openmoa.classifiers
Complete API reference for the 10 original classifiers introduced in OpenMOA. These algorithms address challenging real-world stream learning scenarios including dynamic feature spaces, sparse data, mixed data types, semi-supervised learning, and deep lifelong learning.
§0 Quick Overview
| Classifier | Algorithm | Task | Input | PyTorch |
|---|---|---|---|---|
| FESLClassifier | Feature Evolvable Streaming Learning (NeurIPS 2017) | Binary | Sparse | No |
| OASFClassifier | Online Active Sparse Feature Learning (2024) | Binary | Sparse | No |
| RSOLClassifier | Robust Sparse Online Learning (SDM 2024) | Binary | Sparse | No |
| FOBOSClassifier | Forward-Backward Splitting (JMLR 2009) | Binary / Multi | Sparse | No |
| FTRLClassifier | Follow the Regularized Leader (AISTATS 2011) | Binary / Multi | Sparse | No |
| OVFMClassifier | Variable Feature Spaces, Mixed Data (ICDM 2021) | Binary | Dense | No |
| OSLMFClassifier | Semi-supervised Mixed Features (AAAI 2023) | Binary | Dense | No |
| ORF3VClassifier | Online Random Feature Forests (AAAI 2023) | Multi-class | Dense / Sparse | No |
| OLD3SClassifier | Deep Lifelong Learning (TKDE 2024) | Binary / Multi | Dense | Yes |
| OWSSClassifier | Open-World Soft Sensing (ICDM 2024) | Multi-class | Dense | Yes |
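All ten classifiers expose the same train / predict / predict_proba interface. A minimal prequential (test-then-train) loop, assuming a CapyMOA-style stream object with get_schema(), has_more_instances(), and next_instance() — the stream construction itself is up to your data source:

```python
from openmoa.classifiers import FTRLClassifier

stream = ...  # any OpenMOA/CapyMOA-style stream, e.g. an OpenFeatureStream (see §8)
learner = FTRLClassifier(schema=stream.get_schema())

correct = seen = 0
while stream.has_more_instances():
    instance = stream.next_instance()
    if learner.predict(instance) == instance.y_index:  # test first...
        correct += 1
    learner.train(instance)                            # ...then train
    seen += 1
print(f"prequential accuracy: {correct / seen:.3f}")
```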
§1 FESLClassifier
Full name: Feature Evolvable Streaming Learning · Hou et al., NeurIPS 2017
FESL detects feature-space shifts by measuring the Jaccard similarity between consecutive active feature index sets (a shift is flagged when similarity falls below 0.8). It maintains two sparse linear models (w_curr, w_old). During a transition, it accumulates instances in an overlap buffer and learns a linear mapping matrix M (via ridge regression) that projects new features onto the old feature space. The final prediction is a weighted ensemble: y = μ_curr · f_curr(x) + μ_old · f_old(M·x).
FESLClassifier(
schema: Schema,
alpha: float = 0.1,
lambda_: float = 0.1,
window_size: int = 100,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| alpha | float | 0.1 | Learning rate for logistic gradient updates |
| lambda_ | float | 0.1 | L2 regularization strength for mapping matrix Ridge Regression |
| window_size | int | 100 | Overlap buffer size — instances to accumulate before learning the mapping matrix |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| w_curr | dict | Sparse current model weights {feature_id: weight} |
| w_old | dict | Sparse previous model weights |
| M_struct | dict or None | Mapping matrix: {'matrix': ndarray(D_new, D_old), 'new_map': {...}, 'old_ids': [...]} |
| current_indices_set | set | Active feature indices in current stage |
| overlap_buffer | list | Buffered sparse feature dicts during transition |
| mu_curr | float | Ensemble weight for current model (initialized 0.5) |
| mu_old | float | Ensemble weight for old model (initialized 0.5) |
| t | int | Instance counter |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | Detects feature shift via Jaccard similarity; buffers instances during overlap; learns M when buffer is full; updates w_curr via logistic gradient; updates ensemble weights via log-loss |
| predict | (instance: Instance) → int | int | Returns 0 or 1; threshold at predict_proba()[1] > 0.5 |
| predict_proba | (instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)] via logistic sigmoid of ensemble logit |
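The ensemble logic above is compact enough to state directly. A NumPy sketch using dense vectors in place of the class's sparse weight dicts, with M in the (D_new, D_old) shape documented for M_struct — illustrative, not the class internals:

```python
import numpy as np

def fesl_ensemble_proba(x_new, w_curr, w_old, M, mu_curr, mu_old):
    """Weighted two-model ensemble. x_new: (D_new,) dense feature vector;
    M: (D_new, D_old) ridge-regression mapping, so x_new @ M projects the
    new feature space onto the old one."""
    logit = mu_curr * (w_curr @ x_new) + mu_old * (w_old @ (x_new @ M))
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return np.array([1.0 - p1, p1])

def update_mu(mu_curr, mu_old, loss_curr, loss_old, eta=1.0):
    # Exponential down-weighting by per-model log-loss, then renormalize
    # (illustrative; FESLClassifier's exact rate may differ).
    mu_curr *= np.exp(-eta * loss_curr)
    mu_old *= np.exp(-eta * loss_old)
    s = mu_curr + mu_old
    return mu_curr / s, mu_old / s
```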
§2 OASFClassifier
Full name: Online Active Sparse Feature Learning · Chen et al., 2024
OASF uses Passive-Aggressive (PA) updates for incremental and decremental feature spaces (Theorems 1 & 2). It maintains a ring-buffer weight matrix W of size (n_features, L), applying L₁,₂-norm group sparsity shrinkage on each update. The ring buffer avoids expensive full-matrix copies. W auto-expands to handle arbitrarily high-dimensional streams.
OASFClassifier(
schema: Schema,
lambda_param: float = 0.01,
mu: float = 1.0,
L: int = 100,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| lambda_param | float | 0.01 | L₁,₂ shrinkage threshold — controls sparsity level |
| mu | float | 1.0 | PA margin parameter — aggressiveness of weight updates |
| L | int | 100 | Sliding window length — number of weight vector columns |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| W | ndarray (n_features, L) | Ring buffer of weight vectors |
| _ptr | int | Ring-buffer write pointer (0 to L-1) |
| current_dim | int | Current feature dimensionality |
| t | int | Instance counter |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | PA update w_new = w_s + γ·y·x_t; applies L₁,₂ sparsity shrinkage; advances ring pointer; auto-expands W if needed |
| predict | (instance: Instance) → int | int | Returns 0 or 1 |
| predict_proba | (instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)] using latest weight column + logistic sigmoid |
| get_sparsity | () → float | float | Fraction of near-zero values in latest weight column; range [0, 1] |
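The L₁,₂ shrinkage named above is group soft-thresholding over the feature rows of W. A NumPy sketch of that operator (the class applies it to the ring buffer in place after each PA update):

```python
import numpy as np

def l12_group_shrink(W, lam):
    """Proximal operator of the L1,2 norm: shrink each feature's row of W
    (its weights across the L window columns) toward zero; rows whose L2
    norm falls below lam collapse to exactly zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)          # (n_features, 1)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale
```

The preceding PA step typically uses a step size γ = max(0, μ − y·(w·x)) / ‖x‖²; the exact closed forms for incremental and decremental feature spaces follow Theorems 1 & 2 of the paper.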
§3 RSOLClassifier
Full name: Robust Sparse Online Learning · Chen et al., SDM 2024
RSOL shares the same ring-buffer and L₁,₂ sparsity architecture as OASF but uses a significantly larger default lambda_param (50.0 vs. 0.01) for stronger sparsity, and a larger default window L=1000. Designed for robustness to feature-space changes with emphasis on aggressive feature pruning.
RSOLClassifier(
schema: Schema,
lambda_param: float = 50.0,
mu: float = 1.0,
L: int = 1000,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| lambda_param | float | 50.0 | L₁,₂ shrinkage threshold (much higher than OASF — stronger sparsity) |
| mu | float | 1.0 | PA margin parameter |
| L | int | 1000 | Sliding window length |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| W | ndarray (n_features, L) | Ring buffer of weight vectors |
| _ptr | int | Ring-buffer write pointer |
| current_dim | int | Current feature dimensionality |
| t | int | Instance counter |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | PA update; L₁,₂ sparsity shrinkage; auto-expands W |
| predict | (instance: Instance) → int | int | Returns 0 or 1 |
| predict_proba | (instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)] |
| get_sparsity | () → float | float | Fraction of near-zero weights in latest column |
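Because RSOL shares OASF's interface and differs mainly in defaults, a side-by-side sparsity check is a quick way to see the effect of the much larger lambda_param. Sketch, with schema and instances supplied by your stream as in §0:

```python
from openmoa.classifiers import OASFClassifier, RSOLClassifier

oasf = OASFClassifier(schema=schema)   # lambda_param=0.01, L=100
rsol = RSOLClassifier(schema=schema)   # lambda_param=50.0, L=1000
for instance in instances:             # any iterable of Instances
    oasf.train(instance)
    rsol.train(instance)
# RSOL should report a substantially higher fraction of near-zero weights.
print(oasf.get_sparsity(), rsol.get_sparsity())
```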
§4 FOBOSClassifier
Full name: Forward-Backward Splitting · Duchi & Singer, JMLR 2009
FOBOS combines a forward gradient step with a proximal operator for online regularization. Supports L1 (sparse), L2 (ridge), L1+L2 (elastic net), or no regularization. Supports binary (logistic loss) and multi-class (softmax loss). Weight matrix auto-expands for dynamic feature streams. Learning rate decays as α/√t (sqrt schedule) or α/t (linear schedule).
FOBOSClassifier(
schema: Schema,
alpha: float = 1.0,
lambda_: float = 0.001,
regularization: str = "l1",
step_schedule: str = "sqrt",
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| alpha | float | 1.0 | Base learning rate |
| lambda_ | float | 0.001 | Regularization strength |
| regularization | str | "l1" | Regularization type: "l1" (L1 soft-threshold), "l2" (scaled), "l1_l2" (group), "none" |
| step_schedule | str | "sqrt" | Learning rate decay: "sqrt" → η = α/√t; "linear" → η = α/t |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| W | ndarray (n_features, n_outputs) | Weight matrix; n_outputs=1 for binary, n_classes for multi-class |
| n_classes | int | Number of classes |
| task_type | str | "binary" or "multiclass" |
| n_features | int | Current feature count |
| t | int | Instance counter |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | Gradient step (logistic/softmax); applies proximal operator; auto-expands W |
| predict | (instance: Instance) → int | int | Returns 0/1 (binary) or argmax (multi-class) |
| predict_proba | (instance: Instance) → ndarray | ndarray | Binary: (2,); Multi-class: (n_classes,) softmax probabilities |
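The two-phase update is compact enough to write out. A NumPy sketch of one FOBOS step with L1 regularization and the sqrt schedule (mirroring, not reusing, the class internals):

```python
import numpy as np

def fobos_step(w, grad, t, alpha=1.0, lambda_=0.001):
    """Forward gradient step, then the backward L1 proximal step."""
    eta = alpha / np.sqrt(t)     # "sqrt" schedule; alpha / t for "linear"
    w = w - eta * grad           # forward step
    # backward step: soft-thresholding = prox of eta * lambda_ * ||.||_1
    return np.sign(w) * np.maximum(np.abs(w) - eta * lambda_, 0.0)
```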
§5 FTRLClassifier
Full name: Follow the Regularized Leader — Proximal · McMahan, AISTATS 2011
FTRL-Proximal maintains per-coordinate adaptive learning rates via three arrays: z (gradient accumulator), n (squared-gradient sum), and w (weights). The proximal step divides by (β + √n[i])/α + l2, so rarely updated coordinates (small n[i]) receive larger effective steps. L1 sparsification: any feature with |z[i]| ≤ l1 is zeroed. All three arrays auto-expand for dynamic feature streams.
FTRLClassifier(
schema: Schema,
alpha: float = 0.1,
beta: float = 1.0,
l1: float = 1.0,
l2: float = 1.0,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| alpha | float | 0.1 | FTRL learning rate coefficient |
| beta | float | 1.0 | Per-coordinate adaptation term (smoothing) |
| l1 | float | 1.0 | L1 threshold — features with \|z\| ≤ l1 are set to zero |
| l2 | float | 1.0 | L2 regularization coefficient |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| z | ndarray (n_features, n_outputs) | Gradient accumulator (float64) |
| n | ndarray (n_features, n_outputs) | Squared-gradient sum (float64) |
| w | ndarray (n_features, n_outputs) | Current weight matrix (float64) |
| n_classes | int | Number of classes |
| task_type | str | "binary" or "multiclass" |
| n_features | int | Current feature count |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | FTRL core update: n_new = n + grad²; σ = (√n_new − √n)/α; z_new = z + grad − σ·w; proximal step: w[i] = −(z[i]−sign(z[i])·l1)/((β+√n[i])/α+l2) if \|z[i]\| > l1, else 0 |
| predict | (instance: Instance) → int | int | Binary: 1 if linear_pred > 0 else 0; Multi-class: argmax |
| predict_proba | (instance: Instance) → ndarray | ndarray | Binary: (2,); Multi-class: softmax (n_classes,) |
| get_sparsity | () → float | float | Fraction of near-zero weights (\|w\| < 1e-10) |
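Unrolled into NumPy, the train() row above reads as follows — the standard FTRL-Proximal recursion, sketched per coordinate over whole arrays:

```python
import numpy as np

def ftrl_weights(z, n, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    """Proximal closed form: coordinates with |z| <= l1 are exactly zero."""
    w = -(z - np.sign(z) * l1) / ((beta + np.sqrt(n)) / alpha + l2)
    return np.where(np.abs(z) > l1, w, 0.0)

def ftrl_update(z, n, grad, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    """One FTRL-Proximal step: accumulate grad into z and n, correcting z
    by sigma * w so rarely-seen coordinates keep larger effective steps."""
    w = ftrl_weights(z, n, alpha, beta, l1, l2)      # current weights
    n_new = n + grad ** 2
    sigma = (np.sqrt(n_new) - np.sqrt(n)) / alpha
    z_new = z + grad - sigma * w
    return z_new, n_new, ftrl_weights(z_new, n_new, alpha, beta, l1, l2)
```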
§6 OVFMClassifier
Full name: Online Learning in Variable Feature Spaces with Mixed Data · He et al., ICDM 2021
OVFM handles streams with mixed feature types (continuous + ordinal) and missing values caused by evolving feature spaces. It uses a Gaussian Copula EM algorithm to transform mixed data into a shared latent space, then trains dual classifiers: w_obs (observed space) and w_lat (latent space). Batch-based EM updates run every batch_size instances. The ensemble weight between the two classifiers is updated exponentially based on cumulative loss.
OVFMClassifier(
schema: Schema,
window_size: int = 200,
batch_size: int = 50,
evolution_pattern: str = "vfs",
decay_coef: float = 0.5,
num_ord_updates: int = 2,
max_ord_levels: int = 14,
ensemble_weight: float = 0.5,
learning_rate: float = 0.01,
l1_lambda: float = 0.0,
l2_lambda: float = 0.01,
sparsity_threshold: float = 0.01,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| window_size | int | 200 | Gaussian Copula ECDF sliding window size |
| batch_size | int | 50 | Instances to accumulate before each EM + SGD update |
| evolution_pattern | str | "vfs" | Feature evolution type: "vfs", "tds", "cds", "eds" |
| decay_coef | float | 0.5 | Exponential decay factor for online covariance update |
| num_ord_updates | int | 2 | Ordinal imputation iterations per E-step |
| max_ord_levels | int | 14 | Features with ≤ max_ord_levels unique values are treated as ordinal |
| ensemble_weight | float | 0.5 | Initial ensemble weight for observed-space classifier |
| learning_rate | float | 0.01 | SGD learning rate for both classifiers |
| l1_lambda | float | 0.0 | L1 regularization strength |
| l2_lambda | float | 0.01 | L2 regularization strength |
| sparsity_threshold | float | 0.01 | Weights below this are zeroed (sparsification) |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| _w_obs | ndarray (d+1,) | Observed-space weights with bias |
| _w_lat | ndarray (d+1,) | Latent-space weights with bias |
| _sigma | ndarray (d, d) | Correlation matrix from Gaussian Copula |
| _cont_indices | ndarray bool | Mask for continuous features |
| _ord_indices | ndarray bool | Mask for ordinal features |
| _transform_function | OnlineTransformFunction | Marginal ECDF/quantile helper |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | Accumulates instances; every batch_size: E-step (impute latent Z) → M-step (update Σ) → SGD on both classifiers; updates feature type masks; updates ensemble weight; sparsifies weights |
| predict | (instance: Instance) → int | int | Returns 1 if predict_proba()[1] > 0.5 |
| predict_proba | (instance: Instance) → ndarray | ndarray (2,) | Transforms x to latent z via copula; ensemble prediction: w_ens·score_obs + (1−w_ens)·score_lat; applies logistic sigmoid |
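train() is buffered: per-instance calls are cheap, and the copula EM plus SGD run only on batch boundaries. A usage sketch, with schema and instances supplied by your stream as in §0:

```python
from openmoa.classifiers import OVFMClassifier

learner = OVFMClassifier(
    schema=schema,       # from your stream
    batch_size=50,       # EM + SGD fire every 50 instances
    max_ord_levels=14,   # <= 14 unique values => feature treated as ordinal
)
for instance in instances:
    proba = learner.predict_proba(instance)  # copula transform + dual-model ensemble
    learner.train(instance)                  # buffered until the next batch boundary
```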
§7 OSLMFClassifier
Full name: Online Semi-supervised Learning with Mix-Typed Streaming Features · Wu et al., AAAI 2023
OSLMF extends OVFM with semi-supervised label propagation. It uses a Gaussian Copula for mixed-data imputation and a Density Peak Clustering algorithm to propagate labels from labeled to unlabeled instances within the buffer. Dual classifiers (w_obs, w_lat) are trained with pseudo-labels from density-peak assignment.
OSLMFClassifier(
schema: Schema,
window_size: int = 200,
buffer_size: int = 200,
batch_size: int = 50,
learning_rate: float = 0.01,
decay_coef: float = 0.5,
max_ord_levels: int = 14,
ensemble_weight: float = 0.5,
l2_lambda: float = 0.001,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| window_size | int | 200 | Gaussian Copula ECDF window size |
| buffer_size | int | 200 | Density-Peak buffer size (instances retained for label propagation) |
| batch_size | int | 50 | Batch size for EM updates |
| learning_rate | float | 0.01 | SGD learning rate |
| decay_coef | float | 0.5 | Covariance exponential decay factor |
| max_ord_levels | int | 14 | Ordinal detection threshold |
| ensemble_weight | float | 0.5 | Initial ensemble weight |
| l2_lambda | float | 0.001 | L2 regularization strength |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| _copula | GaussianCopula | Gaussian Copula transformer for ECDF normalization |
| _density_peaks | DensityPeakClustering | Label propagation via density-peak structure |
| _w_obs | ndarray (d+1,) | Observed-space weights with bias |
| _w_lat | ndarray (d+1,) | Latent-space weights with bias |
| _cont_indices | ndarray bool | Continuous feature mask |
| _ord_indices | ndarray bool | Ordinal feature mask |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | Updates Copula window (per-instance); accumulates batch; when full: copula EM + density-peak label propagation + SGD on both classifiers |
| predict | (instance: Instance) → int | int | Returns 1 if predict_proba()[1] > 0.5 |
| predict_proba | (instance: Instance) → ndarray | ndarray (2,) | Transforms to latent z, reconstructs missing features; ensemble prediction |
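A semi-supervised loop looks identical; the difference is that some instances arrive unlabeled. How OSLMF flags missing labels is not specified here, so the sentinel below is an assumption made purely for illustration:

```python
from openmoa.classifiers import OSLMFClassifier

learner = OSLMFClassifier(schema=schema, buffer_size=200)
for instance in instances:
    pred = learner.predict(instance)
    # ASSUMPTION: unlabeled instances carry a sentinel label (e.g. y_index == -1).
    # OSLMF buffers them and assigns pseudo-labels via density-peak propagation
    # before the SGD step, so train() is called on every instance regardless.
    learner.train(instance)
```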
§8 ORF3VClassifier
Full name: Online Random Feature Forests for Varying Feature Spaces · Schreckenberger et al., AAAI 2023
ORF3V builds an independent decision stump forest for each feature. Per-feature forests predict class distributions, weighted by each feature's accuracy history. Handles dynamic feature spaces naturally — forests are initialized lazily after grace_period instances and new feature forests are created when new features appear. Stumps are periodically replaced by newly generated candidates with better Gini gain.
ORF3VClassifier(
schema: Schema,
n_stumps: int = 10,
alpha: float = 0.1,
grace_period: int = 100,
replacement_interval: int = 100,
replacement_strategy: str = "oldest",
window_size: int = 200,
delta: float = 0.001,
compression: float = 1000,
d_max: int = 1000,
enable_pruning: bool = False,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| n_stumps | int | 10 | Number of decision stumps per feature forest |
| alpha | float | 0.1 | Weight decay rate for per-forest accuracy updates |
| grace_period | int | 100 | Instances to observe before initializing forests |
| replacement_interval | int | 100 | Instances between stump replacement rounds |
| replacement_strategy | str | "oldest" | Stump replacement strategy: "oldest" or "random" |
| window_size | int | 200 | Sliding window for feature presence tracking |
| delta | float | 0.001 | Hoeffding bound confidence parameter (used if enable_pruning=True) |
| compression | float | 1000 | t-digest compression parameter (reserved for future use) |
| d_max | int | 1000 | Maximum expected number of features |
| enable_pruning | bool | False | Enable Hoeffding-bound-based feature pruning |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| feature_forests | dict | {feature_id: FeatureForest} — one forest per observed feature |
| weights | dict | {feature_id: float} — per-forest prediction weight |
| feature_stats | dict | {feature_id: FeatureStats} — per-feature statistics for stump generation |
| first_occurrence | dict | {feature_id: int} — time step of first appearance |
| t | int | Instance counter |
| n_classes | int | Number of classes |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | Updates FeatureStats; initializes forests after grace period; replaces stumps every replacement_interval; creates new forests for newly observed features; updates per-forest weights |
| predict | (instance: Instance) → int | int | Returns class with highest aggregate score |
| predict_proba | (instance: Instance) → ndarray | ndarray (n_classes,) | Normalizes aggregate class scores: score = Σ weight[f] · forest[f].predict(x[f]) |
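The score rule quoted in predict_proba is a straight weighted sum over per-feature forests. A sketch, treating x as a {feature_id: value} dict and assuming each FeatureForest returns a class distribution as documented:

```python
import numpy as np

def orf3v_proba(x, feature_forests, weights, n_classes):
    """Aggregate per-feature forest votes, weighted by each forest's
    accuracy-based weight, then normalize into a distribution."""
    scores = np.zeros(n_classes)
    for f, value in x.items():
        if f in feature_forests:   # unseen features contribute nothing yet
            scores += weights[f] * feature_forests[f].predict(value)
    total = scores.sum()
    return scores / total if total > 0 else np.full(n_classes, 1.0 / n_classes)
```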
Active features are read from the instance.feature_indices attribute if present (set by OpenFeatureStream).
§9 OLD3SClassifier
Full name: Online Learning Deep models from Data of Double Streams · Lian et al., IEEE TKDE 2024
OLD3S supports lifelong learning across multiple sequential feature-space transitions. It maintains a Variational Autoencoder (VAE) for feature extraction and a Hedge Backpropagation (HBP) MLP for classification. A reactive state machine detects three phases — STABLE, OVERLAP, STABLE_NEW — and applies knowledge distillation during OVERLAP to align new and old latent spaces. Ensemble weights between current and previous models are updated exponentially based on prediction loss.
OLD3SClassifier(
schema: Schema,
latent_dim: int = 20,
hidden_dim: int = 128,
num_hbp_layers: int = 3,
learning_rate: float = 0.001,
beta: float = 0.99,
eta: float = 0.01,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| latent_dim | int | 20 | VAE bottleneck dimensionality |
| hidden_dim | int | 128 | Hidden units in VAE encoder/decoder and HBP MLP |
| num_hbp_layers | int | 3 | Number of HBP MLP layers (each has its own exit classifier) |
| learning_rate | float | 0.001 | Adam optimizer learning rate |
| beta | float | 0.99 | HBP weight decay rate — controls contribution of later layers |
| eta | float | 0.01 | Ensemble update rate — speed of weight shift between current and previous models |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| model_curr | dict or None | Current model bundle: {vae, clf, opt_vae, opt_clf, hbp_weights, dim} |
| model_prev | dict or None | Previous model bundle (active during OVERLAP phase) |
| w_curr | float | Ensemble weight for current model |
| w_prev | float | Ensemble weight for previous model |
| curr_indices | ndarray or None | Feature indices for current space |
| prev_indices | ndarray or None | Feature indices for previous space |
| is_overlap | bool | True during OVERLAP transition phase |
| device | torch.device | 'cuda' if available, else 'cpu' |
| Component | Architecture |
|---|---|
| VAE_Shallow | Encoder: Linear(d, hidden) + ReLU → two Linear(hidden, latent) heads for μ and logvar · Decoder: Linear(latent, hidden) + ReLU → Linear(hidden, d) + Sigmoid |
| HBPMLP | Multi-layer ReLU network; exit classifier at each layer; forward returns list of logits |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | Detects phase (STABLE / OVERLAP / STABLE_NEW); trains appropriate models; applies alignment loss during OVERLAP; updates ensemble weights |
| predict | (instance: Instance) → int | int | Returns argmax of predict_proba() |
| predict_proba | (instance: Instance) → ndarray | ndarray (n_classes,) | Encodes features via VAE → HBP prediction; ensemble with model_prev during OVERLAP |
| State | Trigger | Behavior |
|---|---|---|
| STABLE | No previous model or stable feature set | Train model_curr only (VAE + classifier) |
| OVERLAP | New feature set is superset of current | model_prev ← model_curr; create new model_curr; train both; apply KD alignment loss |
| STABLE_NEW | Feature set shrinks below prev+curr size | model_prev ← None; return to STABLE |
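The hbp_weights entry in the model bundle carries Hedge-style weights over the per-layer exit classifiers. A sketch of that scheme in its usual form (OLD3S may add smoothing or minimum-weight floors):

```python
import numpy as np

def hbp_combine_and_update(layer_probas, hbp_weights, layer_losses, beta=0.99):
    """Combine exit-classifier predictions weighted by hbp_weights (which
    sum to 1), then decay each layer's weight by beta**loss and renormalize
    so layers that predict poorly lose influence over time."""
    proba = sum(a * p for a, p in zip(hbp_weights, layer_probas))
    new_w = hbp_weights * np.power(beta, layer_losses)  # larger loss => stronger decay
    return proba, new_w / new_w.sum()
```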
§10 OWSSClassifier
Full name: Online Learning from Open-World Soft Sensing · Lian et al., ICDM 2024
OWSS uses a bipartite graph neural network (GNN) where nodes represent features and instances, and edges encode feature-instance relationships. Learnable feature embeddings capture universal feature representations shared across instances. A Feature Reconstruction Loss (Eq. 2) aligns the latent GNN output back toward the initial embedding, preventing representational drift. Updates are batch-based using a sliding window buffer.
OWSSClassifier(
schema: Schema,
window_size: int = 100,
hidden_dim: int = 32,
learning_rate: float = 0.01,
rec_weight: float = 0.1,
sparsity_threshold: float = 0.05,
random_seed: int = 1,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | Schema | required | Stream schema |
| window_size | int | 100 | Buffer size before each GNN update |
| hidden_dim | int | 32 | Feature embedding dimensionality (GNN hidden size) |
| learning_rate | float | 0.01 | Adam optimizer learning rate |
| rec_weight | float | 0.1 | β: weight of reconstruction loss relative to classification loss |
| sparsity_threshold | float | 0.05 | Edge pruning threshold — feature-instance edges where \|X_ij\| < threshold are removed from the graph |
| random_seed | int | 1 | RNG seed |
| Attribute | Type | Description |
|---|---|---|
| model | OWSSNetwork | Bipartite GNN model |
| optimizer | optim.Adam | Model optimizer |
| n_features | int | Current feature count |
| max_features | int | Allocated embedding pool size: max(1000, 2 · n_features) |
| device | torch.device | 'cuda' or 'cpu' |
| stats_min | ndarray or None | Online per-feature minimum (for normalization) |
| stats_max | ndarray or None | Online per-feature maximum |
| Component | Architecture |
|---|---|
| feature_embeddings | nn.Parameter shape (max_features, hidden_dim) — learnable universal feature representations |
| input_projector | nn.Linear(1, hidden_dim) — projects scalar feature values to embedding space |
| BipartiteGraphConv | Θ ∈ (hidden, hidden); forward: ReLU(adj @ x @ Θ) + dropout |
| classifier | Linear(hidden, hidden//2) + ReLU + Linear(hidden//2, n_classes) |
| Method | Signature | Returns | Description |
|---|---|---|---|
| train | (instance: Instance) → None | None | Accumulates in buffer; when buffer ≥ window_size: normalizes, builds bipartite adjacency, forward pass, computes loss_cls + rec_weight · loss_rec, backpropagates |
| predict | (instance: Instance) → int | int | Returns argmax of predict_proba() |
| predict_proba | (instance: Instance) → ndarray | ndarray (n_classes,) | Normalizes input; builds adjacency; GNN forward pass; softmax |
L_total = CrossEntropyLoss(logits, y) + rec_weight · MSELoss(latent_z, initial_z.detach())
where initial_z is the instance-node representation before GNN refinement and latent_z the representation after it. The reconstruction term (Eq. 2) keeps the feature embeddings from drifting arbitrarily.
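Spelled out in PyTorch, that objective is simply (a sketch of the loss only; OWSSNetwork's forward signature is internal to the package):

```python
import torch.nn.functional as F

def owss_loss(logits, y, latent_z, initial_z, rec_weight=0.1):
    """Cross-entropy on the class logits plus the Eq. 2 reconstruction
    penalty. Detaching initial_z anchors latent_z to the pre-GNN embedding
    without pulling the embeddings toward the GNN output."""
    loss_cls = F.cross_entropy(logits, y)
    loss_rec = F.mse_loss(latent_z, initial_z.detach())
    return loss_cls + rec_weight * loss_rec
```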