API Reference: 10 Original Algorithms

openmoa.classifiers

Complete API reference for the 10 original classifiers introduced in OpenMOA. These algorithms address challenging real-world stream learning scenarios including dynamic feature spaces, sparse data, mixed data types, semi-supervised learning, and deep lifelong learning.

§0   Quick Overview

Classifier | Algorithm | Venue | Task | Input | PyTorch
FESLClassifier | Feature Evolvable SL | NeurIPS 2017 | Binary | Sparse | No
OASFClassifier | Online Active Sparse Features | 2024 | Binary | Sparse | No
RSOLClassifier | Robust Sparse Online Learning | SDM 2024 | Binary | Sparse | No
FOBOSClassifier | Forward-Backward Splitting | JMLR 2009 | Binary / Multi | Sparse | No
FTRLClassifier | Follow the Regularized Leader | AISTATS 2011 | Binary / Multi | Sparse | No
OVFMClassifier | Variable Feature Spaces, Mixed Data | ICDM 2021 | Binary | Dense | No
OSLMFClassifier | Semi-supervised Mixed Features | AAAI 2023 | Binary | Dense | No
ORF3VClassifier | Online Random Feature Forests | AAAI 2023 | Multi-class | Dense / Sparse | No
OLD3SClassifier | Deep Lifelong Learning | TKDE 2024 | Binary / Multi | Dense | Yes
OWSSClassifier | Open-World Soft Sensing | ICDM 2024 | Multi-class | Dense | Yes
CapyMOA classifiers: OpenMOA also exposes all CapyMOA classifiers (HoeffdingTree, NaiveBayes, KNN, SAMkNN, ARF, etc.) via the same interface. See the CapyMOA API Reference for their full documentation.
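All ten classifiers follow the same train / predict / predict_proba interface documented below, so any of them can be dropped into a standard test-then-train loop. A minimal prequential sketch, assuming a CapyMOA-style stream (the CapyMOA Electricity dataset is used here purely for illustration) and the openmoa.classifiers import path from this page:

from capymoa.datasets import Electricity           # any CapyMOA stream works here
from openmoa.classifiers import ORF3VClassifier    # import path assumed from this page

stream = Electricity()
learner = ORF3VClassifier(schema=stream.get_schema())

correct = total = 0
while stream.has_more_instances() and total < 10_000:
    instance = stream.next_instance()
    correct += int(learner.predict(instance) == instance.y_index)  # test first...
    learner.train(instance)                                        # ...then train
    total += 1

print(f"prequential accuracy: {correct / total:.3f}")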

§1   FESLClassifier

class FESLClassifier [binary · sparse] · src/openmoa/classifier/_fesl_classifier.py

Full name: Feature Evolvable Streaming Learning  ·  Hou et al., NeurIPS 2017

FESL detects feature-space shifts by measuring the Jaccard similarity between consecutive active feature index sets (a shift is flagged when the similarity drops below 0.8). It maintains two sparse linear models (w_curr, w_old). During a transition, it accumulates instances in an overlap buffer and learns a linear mapping matrix M (via Ridge Regression) that projects new features onto old ones. The final prediction is a weighted ensemble: y = μ_curr · f_curr(x) + μ_old · f_old(M·x).

Constructor
FESLClassifier(
    schema: Schema,
    alpha: float = 0.1,
    lambda_: float = 0.1,
    window_size: int = 100,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
alpha | float | 0.1 | Learning rate for logistic gradient updates
lambda_ | float | 0.1 | L2 regularization strength for the mapping-matrix Ridge Regression
window_size | int | 100 | Overlap buffer size — instances to accumulate before learning the mapping matrix
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
w_curr | dict | Sparse current model weights {feature_id: weight}
w_old | dict | Sparse previous model weights
M_struct | dict or None | Mapping matrix: {'matrix': ndarray(D_new, D_old), 'new_map': {...}, 'old_ids': [...]}
current_indices_set | set | Active feature indices in the current stage
overlap_buffer | list | Buffered sparse feature dicts during transition
mu_curr | float | Ensemble weight for the current model (initialized to 0.5)
mu_old | float | Ensemble weight for the old model (initialized to 0.5)
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Detects feature shift via Jaccard similarity; buffers instances during overlap; learns M when the buffer is full; updates w_curr via logistic gradient; updates ensemble weights via log-loss
predict(instance: Instance) → int | int | Returns 0 or 1; threshold at predict_proba()[1] > 0.5
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)] via logistic sigmoid of the ensemble logit
Constraints: Binary classification only. Sparse input required. Computing M requires a dense (D_new × D_old) matrix — may cause MemoryError on very high-dimensional streams.
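The ensemble rule above can be written out in a few lines of NumPy. This is an illustrative sketch of the documented formula, not the library's code: dense vectors stand in for the sparse weight dicts, and M is taken here with shape (D_old, D_new) so that M @ x_new lands in the old feature space.

import numpy as np

def fesl_ensemble_proba(x_new, w_curr, w_old, M, mu_curr, mu_old):
    # Weighted ensemble of the current model and the old model applied to mapped features.
    logit = mu_curr * (w_curr @ x_new) + mu_old * (w_old @ (M @ x_new))
    p1 = 1.0 / (1.0 + np.exp(-logit))   # logistic sigmoid
    return np.array([1.0 - p1, p1])     # [P(y=0), P(y=1)]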

§2   OASFClassifier

class OASFClassifier [binary · sparse] · src/openmoa/classifier/_oasf_classifier.py

Full name: Online Active Sparse Feature Learning  ·  Chen et al., 2024

OASF uses Passive-Aggressive (PA) updates for incremental and decremental feature spaces (Theorems 1 & 2). It maintains a ring-buffer weight matrix W of size (n_features, L), applying L₁,₂-norm group sparsity shrinkage on each update. The ring buffer avoids expensive full-matrix copies. W auto-expands to handle arbitrarily high-dimensional streams.

Constructor
OASFClassifier(
    schema: Schema,
    lambda_param: float = 0.01,
    mu: float = 1.0,
    L: int = 100,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
lambda_param | float | 0.01 | L₁,₂ shrinkage threshold — controls sparsity level
mu | float | 1.0 | PA margin parameter — aggressiveness of weight updates
L | int | 100 | Sliding window length — number of weight vector columns
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
W | ndarray (n_features, L) | Ring buffer of weight vectors
_ptr | int | Ring-buffer write pointer (0 to L-1)
current_dim | int | Current feature dimensionality
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | PA update w_new = w_s + γ·y·x_t; applies L₁,₂ sparsity shrinkage; advances the ring pointer; auto-expands W if needed
predict(instance: Instance) → int | int | Returns 0 or 1
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)] using the latest weight column + logistic sigmoid
get_sparsity() → float | float | Fraction of near-zero values in the latest weight column; range [0, 1]
Constraints: Binary classification only. Sparse input required.
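The L₁,₂ group-shrinkage step lends itself to a standalone NumPy sketch (an illustration of the idea, not the library routine): each feature's row of W is soft-thresholded by lambda_param in its L2 norm, so weak features are zeroed as a group.

import numpy as np

def l12_shrink(W, lambda_param):
    # Group soft-threshold: shrink each feature row of W by lambda_param in L2 norm.
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lambda_param / np.maximum(row_norms, 1e-12))
    return W * scale   # rows whose norm is <= lambda_param become exactly zero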

§3   RSOLClassifier

class RSOLClassifier [binary · sparse] · src/openmoa/classifier/_rsol_classifier.py

Full name: Robust Sparse Online Learning  ·  Chen et al., SDM 2024

RSOL shares the same ring-buffer and L₁,₂ sparsity architecture as OASF but uses a significantly larger default lambda_param (50.0 vs. 0.01) for stronger sparsity, and a larger default window L=1000. Designed for robustness to feature-space changes with emphasis on aggressive feature pruning.

Constructor
RSOLClassifier(
    schema: Schema,
    lambda_param: float = 50.0,
    mu: float = 1.0,
    L: int = 1000,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
lambda_param | float | 50.0 | L₁,₂ shrinkage threshold (much higher than OASF — stronger sparsity)
mu | float | 1.0 | PA margin parameter
L | int | 1000 | Sliding window length
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
W | ndarray (n_features, L) | Ring buffer of weight vectors
_ptr | int | Ring-buffer write pointer
current_dim | int | Current feature dimensionality
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | PA update; L₁,₂ sparsity shrinkage; auto-expands W
predict(instance: Instance) → int | int | Returns 0 or 1
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)]
get_sparsity() → float | float | Fraction of near-zero weights in the latest column
Constraints: Binary classification only. Sparse input required.
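Because RSOL and OASF expose the same interface, the effect of the much larger default lambda_param can be checked directly with get_sparsity(). A usage sketch; sparse_stream is a placeholder for any sparse, feature-evolving OpenMOA stream, and only documented constructor arguments are used.

from openmoa.classifiers import OASFClassifier, RSOLClassifier

schema = sparse_stream.get_schema()        # sparse_stream construction omitted here
oasf = OASFClassifier(schema=schema)       # defaults: lambda_param=0.01, L=100
rsol = RSOLClassifier(schema=schema)       # defaults: lambda_param=50.0, L=1000

for _ in range(5_000):
    instance = sparse_stream.next_instance()
    oasf.train(instance)
    rsol.train(instance)

print("OASF sparsity:", oasf.get_sparsity())   # fraction of near-zero weights
print("RSOL sparsity:", rsol.get_sparsity())   # expected to be much closer to 1.0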

§4   FOBOSClassifier

class FOBOSClassifier [binary / multi · sparse] · src/openmoa/classifier/_fobos_classifier.py

Full name: Forward-Backward Splitting  ·  Duchi & Singer, JMLR 2009

FOBOS combines a forward gradient step with a proximal operator for online regularization. Supports L1 (sparse), L2 (ridge), L1+L2 (elastic net), or no regularization. Supports binary (logistic loss) and multi-class (softmax loss). Weight matrix auto-expands for dynamic feature streams. Learning rate decays as α/√t (sqrt schedule) or α/t (linear schedule).

Constructor
FOBOSClassifier(
    schema: Schema,
    alpha: float = 1.0,
    lambda_: float = 0.001,
    regularization: str = "l1",
    step_schedule: str = "sqrt",
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
alpha | float | 1.0 | Base learning rate
lambda_ | float | 0.001 | Regularization strength
regularization | str | "l1" | Regularization type: "l1" (L1 soft-threshold), "l2" (scaled), "l1_l2" (group), "none"
step_schedule | str | "sqrt" | Learning rate decay: "sqrt" → η = α/√t; "linear" → η = α/t
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
W | ndarray (n_features, n_outputs) | Weight matrix; n_outputs = 1 for binary, n_classes for multi-class
n_classes | int | Number of classes
task_type | str | "binary" or "multiclass"
n_features | int | Current feature count
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Gradient step (logistic/softmax); applies the proximal operator; auto-expands W
predict(instance: Instance) → int | int | Returns 0/1 (binary) or argmax (multi-class)
predict_proba(instance: Instance) → ndarray | ndarray | Binary: (2,); multi-class: (n_classes,) softmax probabilities
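A standalone NumPy sketch of one binary FOBOS step with regularization="l1" and the "sqrt" schedule, transcribed from the description above rather than from the library source:

import numpy as np

def fobos_step(w, x, y, t, alpha=1.0, lambda_=0.001):
    # Forward gradient step on the logistic loss, then the L1 proximal (soft-threshold) step.
    eta = alpha / np.sqrt(t)                 # step_schedule="sqrt"
    p = 1.0 / (1.0 + np.exp(-(w @ x)))       # logistic prediction, y in {0, 1}
    w_half = w - eta * (p - y) * x           # forward step
    return np.sign(w_half) * np.maximum(np.abs(w_half) - eta * lambda_, 0.0)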

§5   FTRLClassifier

class FTRLClassifier [binary / multi · sparse] · src/openmoa/classifier/_ftrl_classifier.py

Full name: Follow the Regularized Leader — Proximal  ·  McMahan, AISTATS 2011

FTRL-Proximal maintains per-coordinate adaptive learning rates via three arrays: z (gradient accumulator), n (squared-gradient sum), and w (weights). The proximal weight solve divides by (β + √n[i])/α + l2, giving coordinate i an effective learning rate of roughly α/(β + √n[i]), so rarely updated features receive larger steps. L1 sparsification: coordinates with |z[i]| ≤ l1 are set to zero. All arrays auto-expand for dynamic feature streams.

Constructor
FTRLClassifier(
    schema: Schema,
    alpha: float = 0.1,
    beta: float = 1.0,
    l1: float = 1.0,
    l2: float = 1.0,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
alpha | float | 0.1 | FTRL learning rate coefficient
beta | float | 1.0 | Per-coordinate adaptation term (smoothing)
l1 | float | 1.0 | L1 threshold — features with |z| ≤ l1 are set to zero
l2 | float | 1.0 | L2 regularization coefficient
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
z | ndarray (n_features, n_outputs) | Gradient accumulator (float64)
n | ndarray (n_features, n_outputs) | Squared-gradient sum (float64)
w | ndarray (n_features, n_outputs) | Current weight matrix (float64)
n_classes | int | Number of classes
task_type | str | "binary" or "multiclass"
n_features | int | Current feature count
Methods
Method | Returns | Description
train(instance: Instance) → None | None | FTRL core update: n_new = n + grad²; σ = (√n_new − √n)/α; z_new = z + grad − σ·w; proximal step: w[i] = −(z[i] − sign(z[i])·l1) / ((β + √n[i])/α + l2) if |z[i]| > l1, else 0
predict(instance: Instance) → int | int | Binary: 1 if linear_pred > 0 else 0; multi-class: argmax
predict_proba(instance: Instance) → ndarray | ndarray | Binary: (2,); multi-class: softmax (n_classes,)
get_sparsity() → float | float | Fraction of near-zero weights (|w| < 1e-10)
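The train() update above translates almost line for line into NumPy. A standalone sketch for the binary, single-output case:

import numpy as np

def ftrl_update(z, n, w, x, y, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    # One FTRL-Proximal step: accumulate gradients, then solve the proximal weight update.
    p = 1.0 / (1.0 + np.exp(-(w @ x)))       # logistic prediction, y in {0, 1}
    grad = (p - y) * x
    n_new = n + grad ** 2
    sigma = (np.sqrt(n_new) - np.sqrt(n)) / alpha
    z_new = z + grad - sigma * w
    w_new = np.where(
        np.abs(z_new) > l1,
        -(z_new - np.sign(z_new) * l1) / ((beta + np.sqrt(n_new)) / alpha + l2),
        0.0,
    )
    return z_new, n_new, w_new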

§6   OVFMClassifier

class OVFMClassifier [binary · dense] · src/openmoa/classifier/_ovfm_classifier.py

Full name: Online Learning in Variable Feature Spaces with Mixed Data  ·  He et al., ICDM 2021

OVFM handles streams with mixed feature types (continuous + ordinal) and missing values caused by evolving feature spaces. It uses a Gaussian Copula EM algorithm to transform mixed data into a shared latent space, then trains dual classifiers: w_obs (observed space) and w_lat (latent space). Batch-based EM updates run every batch_size instances. The ensemble weight between the two classifiers is updated exponentially based on cumulative loss.

Constructor
OVFMClassifier(
    schema: Schema,
    window_size: int = 200,
    batch_size: int = 50,
    evolution_pattern: str = "vfs",
    decay_coef: float = 0.5,
    num_ord_updates: int = 2,
    max_ord_levels: int = 14,
    ensemble_weight: float = 0.5,
    learning_rate: float = 0.01,
    l1_lambda: float = 0.0,
    l2_lambda: float = 0.01,
    sparsity_threshold: float = 0.01,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
window_size | int | 200 | Gaussian Copula ECDF sliding window size
batch_size | int | 50 | Instances to accumulate before each EM + SGD update
evolution_pattern | str | "vfs" | Feature evolution type: "vfs", "tds", "cds", "eds"
decay_coef | float | 0.5 | Exponential decay factor for the online covariance update
num_ord_updates | int | 2 | Ordinal imputation iterations per E-step
max_ord_levels | int | 14 | Features with ≤ max_ord_levels unique values are treated as ordinal
ensemble_weight | float | 0.5 | Initial ensemble weight for the observed-space classifier
learning_rate | float | 0.01 | SGD learning rate for both classifiers
l1_lambda | float | 0.0 | L1 regularization strength
l2_lambda | float | 0.01 | L2 regularization strength
sparsity_threshold | float | 0.01 | Weights below this are zeroed (sparsification)
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
_w_obs | ndarray (d+1,) | Observed-space weights with bias
_w_lat | ndarray (d+1,) | Latent-space weights with bias
_sigma | ndarray (d, d) | Correlation matrix from the Gaussian Copula
_cont_indices | ndarray (bool) | Mask for continuous features
_ord_indices | ndarray (bool) | Mask for ordinal features
_transform_function | OnlineTransformFunction | Marginal ECDF/quantile helper
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Accumulates instances; every batch_size: E-step (impute latent Z) → M-step (update Σ) → SGD on both classifiers; updates feature type masks; updates the ensemble weight; sparsifies weights
predict(instance: Instance) → int | int | Returns 1 if predict_proba()[1] > 0.5
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Transforms x to latent z via the copula; ensemble prediction: w_ens·score_obs + (1 − w_ens)·score_lat; applies logistic sigmoid
Constraints: Binary classification only. Dense input. Covariance matrix is O(d²) — unsuitable for very high-dimensional streams.
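A usage sketch for a dense, mixed-type stream. Only constructor arguments documented above are used; mixed_stream is a placeholder for any dense CapyMOA-style stream with both continuous and low-cardinality (ordinal) columns.

from openmoa.classifiers import OVFMClassifier

ovfm = OVFMClassifier(
    schema=mixed_stream.get_schema(),
    batch_size=50,         # EM + SGD update every 50 instances
    max_ord_levels=14,     # <= 14 distinct values => treated as ordinal
    window_size=200,       # ECDF window for the Gaussian Copula marginals
)

for _ in range(2_000):
    instance = mixed_stream.next_instance()
    proba = ovfm.predict_proba(instance)   # [P(y=0), P(y=1)], test-then-train
    ovfm.train(instance)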

§7   OSLMFClassifier

class OSLMFClassifier [binary · dense] · src/openmoa/classifier/_oslmf_classifier.py

Full name: Online Semi-supervised Learning with Mix-Typed Streaming Features  ·  Wu et al., AAAI 2023

OSLMF extends OVFM with semi-supervised label propagation. It uses a Gaussian Copula for mixed-data imputation and a Density Peak Clustering algorithm to propagate labels from labeled to unlabeled instances within the buffer. Dual classifiers (w_obs, w_lat) are trained with pseudo-labels from density-peak assignment.

Constructor
OSLMFClassifier(
    schema: Schema,
    window_size: int = 200,
    buffer_size: int = 200,
    batch_size: int = 50,
    learning_rate: float = 0.01,
    decay_coef: float = 0.5,
    max_ord_levels: int = 14,
    ensemble_weight: float = 0.5,
    l2_lambda: float = 0.001,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
window_size | int | 200 | Gaussian Copula ECDF window size
buffer_size | int | 200 | Density-Peak buffer size (instances retained for label propagation)
batch_size | int | 50 | Batch size for EM updates
learning_rate | float | 0.01 | SGD learning rate
decay_coef | float | 0.5 | Covariance exponential decay factor
max_ord_levels | int | 14 | Ordinal detection threshold
ensemble_weight | float | 0.5 | Initial ensemble weight
l2_lambda | float | 0.001 | L2 regularization strength
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
_copula | GaussianCopula | Gaussian Copula transformer for ECDF normalization
_density_peaks | DensityPeakClustering | Label propagation via density-peak structure
_w_obs | ndarray (d+1,) | Observed-space weights with bias
_w_lat | ndarray (d+1,) | Latent-space weights with bias
_cont_indices | ndarray (bool) | Continuous feature mask
_ord_indices | ndarray (bool) | Ordinal feature mask
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Updates the Copula window (per instance); accumulates a batch; when full: copula EM + density-peak label propagation + SGD on both classifiers
predict(instance: Instance) → int | int | Returns 1 if predict_proba()[1] > 0.5
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Transforms to latent z, reconstructs missing features; ensemble prediction
Constraints: Binary classification only. Dense input. Distance matrix is O(buffer²) — large buffer_size increases compute cost.
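The density-peak propagation step can be illustrated independently of the library. A simplified sketch (not OSLMF's actual routine): every unlabeled instance in the buffer inherits the label of its nearest labeled neighbour of higher local density.

import numpy as np

def propagate_labels(X, y):
    # y uses -1 for unlabeled instances; returns pseudo-labels for the whole buffer.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # O(buffer^2) distance matrix
    density = (D < np.median(D)).sum(axis=1)                     # cutoff-kernel local density
    y = y.copy()
    for i in np.argsort(-density):                               # visit high-density points first
        if y[i] == -1:
            donors = [j for j in range(len(y)) if density[j] > density[i] and y[j] != -1]
            if donors:
                y[i] = y[min(donors, key=lambda j: D[i, j])]     # nearest higher-density labeled point
    return y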

§8   ORF3VClassifier

class ORF3VClassifier [multi-class · dense / sparse] · src/openmoa/classifier/_orf3v_classifier.py

Full name: Online Random Feature Forests for Varying Feature Spaces  ·  Schreckenberger et al., AAAI 2023

ORF3V builds an independent decision stump forest for each feature. Per-feature forests predict class distributions, weighted by each feature's accuracy history. Handles dynamic feature spaces naturally — forests are initialized lazily after grace_period instances and new feature forests are created when new features appear. Stumps are periodically replaced by newly generated candidates with better Gini gain.

Constructor
ORF3VClassifier(
    schema: Schema,
    n_stumps: int = 10,
    alpha: float = 0.1,
    grace_period: int = 100,
    replacement_interval: int = 100,
    replacement_strategy: str = "oldest",
    window_size: int = 200,
    delta: float = 0.001,
    compression: float = 1000,
    d_max: int = 1000,
    enable_pruning: bool = False,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
n_stumps | int | 10 | Number of decision stumps per feature forest
alpha | float | 0.1 | Weight decay rate for per-forest accuracy updates
grace_period | int | 100 | Instances to observe before initializing forests
replacement_interval | int | 100 | Instances between stump replacement rounds
replacement_strategy | str | "oldest" | Stump replacement strategy: "oldest" or "random"
window_size | int | 200 | Sliding window for feature presence tracking
delta | float | 0.001 | Hoeffding bound confidence parameter (used if enable_pruning=True)
compression | float | 1000 | t-digest compression parameter (reserved for future use)
d_max | int | 1000 | Maximum expected number of features
enable_pruning | bool | False | Enable Hoeffding-bound-based feature pruning
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
feature_forests | dict | {feature_id: FeatureForest} — one forest per observed feature
weights | dict | {feature_id: float} — per-forest prediction weight
feature_stats | dict | {feature_id: FeatureStats} — per-feature statistics for stump generation
first_occurrence | dict | {feature_id: int} — time step of first appearance
t | int | Instance counter
n_classes | int | Number of classes
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Updates FeatureStats; initializes forests after the grace period; replaces stumps every replacement_interval; creates new forests for newly observed features; updates per-forest weights
predict(instance: Instance) → int | int | Returns the class with the highest aggregate score
predict_proba(instance: Instance) → ndarray | ndarray (n_classes,) | Normalizes aggregate class scores: score = Σ weight[f] · forest[f].predict(x[f])
Note: Supports multi-class. Dense or sparse input — uses instance.feature_indices attribute if present (set by OpenFeatureStream).
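The scoring rule in predict_proba() can be sketched directly from the formula above; feature_forests, weights, and the per-forest predict() follow the attribute and method descriptions in this section.

import numpy as np

def aggregate_scores(feature_forests, weights, x, n_classes):
    # x maps feature_id -> value for the features present in the current instance.
    scores = np.zeros(n_classes)
    for f, forest in feature_forests.items():
        if f in x:
            scores += weights[f] * forest.predict(x[f])   # each forest returns a class distribution
    total = scores.sum()
    return scores / total if total > 0 else np.full(n_classes, 1.0 / n_classes)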

§9   OLD3SClassifier

class OLD3SClassifier [binary / multi · dense · PyTorch] · src/openmoa/classifier/_old3s_classifier.py

Full name: Online Learning Deep models from Data of Double Streams  ·  Lian et al., IEEE TKDE 2024

OLD3S supports lifelong learning across multiple sequential feature-space transitions. It maintains a Variational Autoencoder (VAE) for feature extraction and a Hedge Backpropagation (HBP) MLP for classification. A reactive state machine detects three phases — STABLE, OVERLAP, STABLE_NEW — and applies knowledge distillation during OVERLAP to align new and old latent spaces. Ensemble weights between current and previous models are updated exponentially based on prediction loss.

Constructor
OLD3SClassifier(
    schema: Schema,
    latent_dim: int = 20,
    hidden_dim: int = 128,
    num_hbp_layers: int = 3,
    learning_rate: float = 0.001,
    beta: float = 0.99,
    eta: float = 0.01,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
latent_dim | int | 20 | VAE bottleneck dimensionality
hidden_dim | int | 128 | Hidden units in the VAE encoder/decoder and the HBP MLP
num_hbp_layers | int | 3 | Number of HBP MLP layers (each has its own exit classifier)
learning_rate | float | 0.001 | Adam optimizer learning rate
beta | float | 0.99 | HBP weight decay rate — controls the contribution of later layers
eta | float | 0.01 | Ensemble update rate — speed of weight shift between current and previous models
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
model_curr | dict or None | Current model bundle: {vae, clf, opt_vae, opt_clf, hbp_weights, dim}
model_prev | dict or None | Previous model bundle (active during the OVERLAP phase)
w_curr | float | Ensemble weight for the current model
w_prev | float | Ensemble weight for the previous model
curr_indices | ndarray or None | Feature indices for the current space
prev_indices | ndarray or None | Feature indices for the previous space
is_overlap | bool | True during the OVERLAP transition phase
device | torch.device | 'cuda' if available, else 'cpu'
Internal model components
Component | Architecture
VAE_Shallow | Encoder: Linear(d, hidden) + ReLU → μ, logvar; Linear(hidden, latent) · Decoder: Linear(latent, hidden) + ReLU + Linear(hidden, d) + Sigmoid
HBPMLP | Multi-layer ReLU network; exit classifier at each layer; forward returns a list of logits
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Detects the phase (STABLE / OVERLAP / STABLE_NEW); trains the appropriate models; applies alignment loss during OVERLAP; updates ensemble weights
predict(instance: Instance) → int | int | Returns argmax of predict_proba()
predict_proba(instance: Instance) → ndarray | ndarray (n_classes,) | Encodes features via the VAE → HBP prediction; ensembled with model_prev during OVERLAP
Phase state machine
State | Trigger | Behavior
STABLE | No previous model, or a stable feature set | Train model_curr only (VAE + classifier)
OVERLAP | New feature set is a superset of the current one | model_prev ← model_curr; create a new model_curr; train both; apply KD alignment loss
STABLE_NEW | Feature set shrinks below the prev+curr size | model_prev ← None; return to STABLE
Constraints: Requires PyTorch. Binary and multi-class classification supported. Dense input only.
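A usage sketch that watches the documented attributes as the stream crosses a feature-space transition; evolving_stream is a placeholder for any dense, feature-evolving OpenMOA stream, and only documented constructor arguments are used.

from openmoa.classifiers import OLD3SClassifier

old3s = OLD3SClassifier(schema=evolving_stream.get_schema(), latent_dim=20, num_hbp_layers=3)

for i in range(20_000):
    instance = evolving_stream.next_instance()
    old3s.train(instance)
    if i % 5_000 == 0:       # inspect the phase machine and the ensemble weights
        print(i, "overlap:", old3s.is_overlap,
              "w_curr:", round(old3s.w_curr, 3), "w_prev:", round(old3s.w_prev, 3))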

§10   OWSSClassifier

class OWSSClassifier [multi-class · dense · PyTorch] · src/openmoa/classifier/_owss_classifier.py

Full name: Online Learning from Open-World Soft Sensing  ·  Lian et al., ICDM 2024

OWSS uses a bipartite graph neural network (GNN) where nodes represent features and instances, and edges encode feature-instance relationships. Learnable feature embeddings capture universal feature representations shared across instances. A Feature Reconstruction Loss (Eq. 2) aligns the latent GNN output back toward the initial embedding, preventing representational drift. Updates are batch-based using a sliding window buffer.

Constructor
OWSSClassifier(
    schema: Schema,
    window_size: int = 100,
    hidden_dim: int = 32,
    learning_rate: float = 0.01,
    rec_weight: float = 0.1,
    sparsity_threshold: float = 0.05,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
window_size | int | 100 | Buffer size before each GNN update
hidden_dim | int | 32 | Feature embedding dimensionality (GNN hidden size)
learning_rate | float | 0.01 | Adam optimizer learning rate
rec_weight | float | 0.1 | β: weight of the reconstruction loss relative to the classification loss
sparsity_threshold | float | 0.05 | Edge pruning threshold — feature-instance edges where |X_ij| < threshold are removed from the graph
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
model | OWSSNetwork | Bipartite GNN model
optimizer | optim.Adam | Model optimizer
n_features | int | Current feature count
max_features | int | Allocated embedding pool size: max(1000, 2 · n_features)
device | torch.device | 'cuda' or 'cpu'
stats_min | ndarray or None | Online per-feature minimum (for normalization)
stats_max | ndarray or None | Online per-feature maximum
Internal model components
Component | Architecture
feature_embeddings | nn.Parameter, shape (max_features, hidden_dim) — learnable universal feature representations
input_projector | nn.Linear(1, hidden_dim) — projects scalar feature values into the embedding space
BipartiteGraphConv | Θ ∈ (hidden, hidden); forward: ReLU(adj @ x @ Θ) + dropout
classifier | Linear(hidden, hidden//2) + ReLU + Linear(hidden//2, n_classes)
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Accumulates in the buffer; when buffer ≥ window_size: normalizes, builds the bipartite adjacency, forward pass, computes loss_cls + rec_weight · loss_rec, backpropagates
predict(instance: Instance) → int | int | Returns argmax of predict_proba()
predict_proba(instance: Instance) → ndarray | ndarray (n_classes,) | Normalizes the input; builds the adjacency; GNN forward pass; softmax
Loss function
L_total = CrossEntropyLoss(logits, y) + rec_weight · MSELoss(latent_z, initial_z.detach())

Here initial_z is the instance-node embedding before GNN refinement and latent_z is the embedding after refinement. The reconstruction term (Eq. 2) prevents the feature embeddings from drifting arbitrarily.
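A PyTorch transcription of this objective (a sketch; the tensor names are taken from the description above):

import torch.nn.functional as F

def owss_loss(logits, y, latent_z, initial_z, rec_weight=0.1):
    # Classification term plus the Eq. 2 anchor toward the pre-refinement embedding.
    loss_cls = F.cross_entropy(logits, y)                 # logits: (batch, n_classes), y: (batch,)
    loss_rec = F.mse_loss(latent_z, initial_z.detach())   # stop-gradient on the initial embedding
    return loss_cls + rec_weight * loss_rec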

Constraints: Requires PyTorch. Multi-class classification. Dense input only.