API Reference: 10 Original Algorithms

openmoa.classifiers

Complete API reference for the 10 original classifiers introduced in OpenMOA. These algorithms address challenging real-world stream learning scenarios including dynamic feature spaces, sparse data, mixed data types, semi-supervised learning, and deep lifelong learning.

§0   Quick Overview

Classifier | Algorithm | Venue | Task | Input | PyTorch
FESLClassifier | Feature Evolvable SL | NeurIPS 2017 | Binary | Sparse | No
OASFClassifier | Online Active Sparse Features | 2024 | Binary | Sparse | No
RSOLClassifier | Robust Sparse Online Learning | SDM 2024 | Binary | Sparse | No
FOBOSClassifier | Forward-Backward Splitting | JMLR 2009 | Binary / Multi | Sparse | No
FTRLClassifier | Follow the Regularized Leader | AISTATS 2011 | Binary / Multi | Sparse | No
OVFMClassifier | Variable Feature Spaces, Mixed Data | ICDM 2021 | Binary | Dense | No
OSLMFClassifier | Semi-supervised Mixed Features | AAAI 2023 | Binary | Dense | No
ORF3VClassifier | Online Random Feature Forests | AAAI 2023 | Multi-class | Dense / Sparse | No
OLD3SClassifier | Deep Lifelong Learning | TKDE 2024 | Binary / Multi | Dense | Yes
OWSSClassifier | Open-World Soft Sensing | ICDM 2024 | Multi-class | Dense | Yes
CapyMOA classifiers: OpenMOA also exposes all CapyMOA classifiers (HoeffdingTree, NaiveBayes, KNN, SAMkNN, ARF, etc.) via the same interface. See the CapyMOA API Reference for their full documentation.
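All ten classifiers follow the same train / predict / predict_proba interface documented below, so any of them can be dropped into a standard test-then-train loop. A minimal prequential sketch, assuming a CapyMOA-style stream (the CapyMOA Electricity dataset is used here purely for illustration) and the openmoa.classifiers import path from this page:

from capymoa.datasets import Electricity           # any CapyMOA stream works here
from openmoa.classifiers import ORF3VClassifier    # import path assumed from this page

stream = Electricity()
learner = ORF3VClassifier(schema=stream.get_schema())

correct = total = 0
while stream.has_more_instances() and total < 10_000:
    instance = stream.next_instance()
    correct += int(learner.predict(instance) == instance.y_index)  # test first...
    learner.train(instance)                                        # ...then train
    total += 1

print(f"prequential accuracy: {correct / total:.3f}")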

§1   FESLClassifier

class FESLClassifier [binary · sparse] · src/openmoa/classifier/_fesl_classifier.py

Full name: Feature Evolvable Streaming Learning  ·  Hou et al., NeurIPS 2017

FESL detects feature-space shifts by measuring the Jaccard similarity between consecutive active feature index sets (a shift is flagged when the similarity drops below 0.8). It maintains two sparse linear models (w_curr, w_old). During a transition, it accumulates instances in an overlap buffer and learns a linear mapping matrix M (via Ridge Regression) that projects new features onto old ones. The final prediction is a weighted ensemble: y = μ_curr · f_curr(x) + μ_old · f_old(M·x).

Constructor
FESLClassifier(
    schema: Schema,
    alpha: float = 0.1,
    lambda_: float = 0.1,
    window_size: int = 100,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
alpha | float | 0.1 | Learning rate for logistic gradient updates
lambda_ | float | 0.1 | L2 regularization strength for the mapping-matrix Ridge Regression
window_size | int | 100 | Overlap buffer size — instances to accumulate before learning the mapping matrix
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
w_curr | dict | Sparse current model weights {feature_id: weight}
w_old | dict | Sparse previous model weights
M_struct | dict or None | Mapping matrix: {'matrix': ndarray(D_new, D_old), 'new_map': {...}, 'old_ids': [...]}
current_indices_set | set | Active feature indices in the current stage
overlap_buffer | list | Buffered sparse feature dicts during transition
mu_curr | float | Ensemble weight for the current model (initialized to 0.5)
mu_old | float | Ensemble weight for the old model (initialized to 0.5)
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Detects feature shift via Jaccard similarity; buffers instances during overlap; learns M when the buffer is full; updates w_curr via logistic gradient; updates ensemble weights via log-loss
predict(instance: Instance) → int | int | Returns 0 or 1; threshold at predict_proba()[1] > 0.5
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)] via logistic sigmoid of the ensemble logit
Constraints: Binary classification only. Sparse input required. Computing M requires a dense (D_new × D_old) matrix — may cause MemoryError on very high-dimensional streams.
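The ensemble rule above can be written out in a few lines of NumPy. This is an illustrative sketch of the documented formula, not the library's code: dense vectors stand in for the sparse weight dicts, and M is taken here with shape (D_old, D_new) so that M @ x_new lands in the old feature space.

import numpy as np

def fesl_ensemble_proba(x_new, w_curr, w_old, M, mu_curr, mu_old):
    # Weighted ensemble of the current model and the old model applied to mapped features.
    logit = mu_curr * (w_curr @ x_new) + mu_old * (w_old @ (M @ x_new))
    p1 = 1.0 / (1.0 + np.exp(-logit))   # logistic sigmoid
    return np.array([1.0 - p1, p1])     # [P(y=0), P(y=1)]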

§2   OASFClassifier

class OASFClassifier [binary · sparse] · src/openmoa/classifier/_oasf_classifier.py

Full name: Online Active Sparse Feature Learning  ·  Chen et al., 2024

OASF uses Passive-Aggressive (PA) updates for incremental and decremental feature spaces (Theorems 1 & 2). It maintains a ring-buffer weight matrix W of size (n_features, L), applying L₁,₂-norm group sparsity shrinkage on each update. The ring buffer avoids expensive full-matrix copies. W auto-expands to handle arbitrarily high-dimensional streams.

Constructor
OASFClassifier(
    schema: Schema,
    lambda_param: float = 0.01,
    mu: float = 1.0,
    L: int = 100,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
lambda_param | float | 0.01 | L₁,₂ shrinkage threshold — controls sparsity level
mu | float | 1.0 | PA margin parameter — aggressiveness of weight updates
L | int | 100 | Sliding window length — number of weight vector columns
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
W | ndarray (n_features, L) | Ring buffer of weight vectors
_ptr | int | Ring-buffer write pointer (0 to L-1)
current_dim | int | Current feature dimensionality
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | PA update w_new = w_s + γ·y·x_t; applies L₁,₂ sparsity shrinkage; advances the ring pointer; auto-expands W if needed
predict(instance: Instance) → int | int | Returns 0 or 1
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)] using the latest weight column + logistic sigmoid
get_sparsity() → float | float | Fraction of near-zero values in the latest weight column; range [0, 1]
Constraints: Binary classification only. Sparse input required.
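The L₁,₂ group-shrinkage step lends itself to a standalone NumPy sketch (an illustration of the idea, not the library routine): each feature's row of W is soft-thresholded by lambda_param in its L2 norm, so weak features are zeroed as a group.

import numpy as np

def l12_shrink(W, lambda_param):
    # Group soft-threshold: shrink each feature row of W by lambda_param in L2 norm.
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lambda_param / np.maximum(row_norms, 1e-12))
    return W * scale   # rows whose norm is <= lambda_param become exactly zero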

§3   RSOLClassifier

class RSOLClassifier [binary · sparse] · src/openmoa/classifier/_rsol_classifier.py

Full name: Robust Sparse Online Learning  ·  Chen et al., SDM 2024

RSOL shares the same ring-buffer and L₁,₂ sparsity architecture as OASF but uses a significantly larger default lambda_param (50.0 vs. 0.01) for stronger sparsity, and a larger default window L=1000. Designed for robustness to feature-space changes with emphasis on aggressive feature pruning.

Constructor
RSOLClassifier(
    schema: Schema,
    lambda_param: float = 50.0,
    mu: float = 1.0,
    L: int = 1000,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
lambda_param | float | 50.0 | L₁,₂ shrinkage threshold (much higher than OASF — stronger sparsity)
mu | float | 1.0 | PA margin parameter
L | int | 1000 | Sliding window length
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
W | ndarray (n_features, L) | Ring buffer of weight vectors
_ptr | int | Ring-buffer write pointer
current_dim | int | Current feature dimensionality
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | PA update; L₁,₂ sparsity shrinkage; auto-expands W
predict(instance: Instance) → int | int | Returns 0 or 1
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Returns [P(y=0), P(y=1)]
get_sparsity() → float | float | Fraction of near-zero weights in the latest column
Constraints: Binary classification only. Sparse input required.
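Because RSOL and OASF expose the same interface, the effect of the much larger default lambda_param can be checked directly with get_sparsity(). A usage sketch; sparse_stream is a placeholder for any sparse, feature-evolving OpenMOA stream, and only documented constructor arguments are used.

from openmoa.classifiers import OASFClassifier, RSOLClassifier

schema = sparse_stream.get_schema()        # sparse_stream construction omitted here
oasf = OASFClassifier(schema=schema)       # defaults: lambda_param=0.01, L=100
rsol = RSOLClassifier(schema=schema)       # defaults: lambda_param=50.0, L=1000

for _ in range(5_000):
    instance = sparse_stream.next_instance()
    oasf.train(instance)
    rsol.train(instance)

print("OASF sparsity:", oasf.get_sparsity())   # fraction of near-zero weights
print("RSOL sparsity:", rsol.get_sparsity())   # expected to be much closer to 1.0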

§4   FOBOSClassifier

class FOBOSClassifier [binary / multi · sparse] · src/openmoa/classifier/_fobos_classifier.py

Full name: Forward-Backward Splitting  ·  Duchi & Singer, JMLR 2009

FOBOS combines a forward gradient step with a proximal operator for online regularization. Supports L1 (sparse), L2 (ridge), L1+L2 (elastic net), or no regularization. Supports binary (logistic loss) and multi-class (softmax loss). Weight matrix auto-expands for dynamic feature streams. Learning rate decays as α/√t (sqrt schedule) or α/t (linear schedule).

Constructor
FOBOSClassifier(
    schema: Schema,
    alpha: float = 1.0,
    lambda_: float = 0.001,
    regularization: str = "l1",
    step_schedule: str = "sqrt",
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
alpha | float | 1.0 | Base learning rate
lambda_ | float | 0.001 | Regularization strength
regularization | str | "l1" | Regularization type: "l1" (L1 soft-threshold), "l2" (scaled), "l1_l2" (group), "none"
step_schedule | str | "sqrt" | Learning rate decay: "sqrt" → η = α/√t; "linear" → η = α/t
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
W | ndarray (n_features, n_outputs) | Weight matrix; n_outputs = 1 for binary, n_classes for multi-class
n_classes | int | Number of classes
task_type | str | "binary" or "multiclass"
n_features | int | Current feature count
t | int | Instance counter
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Gradient step (logistic/softmax); applies the proximal operator; auto-expands W
predict(instance: Instance) → int | int | Returns 0/1 (binary) or argmax (multi-class)
predict_proba(instance: Instance) → ndarray | ndarray | Binary: (2,); multi-class: (n_classes,) softmax probabilities
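A standalone NumPy sketch of one binary FOBOS step with regularization="l1" and the "sqrt" schedule, transcribed from the description above rather than from the library source:

import numpy as np

def fobos_step(w, x, y, t, alpha=1.0, lambda_=0.001):
    # Forward gradient step on the logistic loss, then the L1 proximal (soft-threshold) step.
    eta = alpha / np.sqrt(t)                 # step_schedule="sqrt"
    p = 1.0 / (1.0 + np.exp(-(w @ x)))       # logistic prediction, y in {0, 1}
    w_half = w - eta * (p - y) * x           # forward step
    return np.sign(w_half) * np.maximum(np.abs(w_half) - eta * lambda_, 0.0)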

§5   FTRLClassifier

class FTRLClassifier [binary / multi · sparse] · src/openmoa/classifier/_ftrl_classifier.py

Full name: Follow the Regularized Leader — Proximal  ·  McMahan, AISTATS 2011

FTRL-Proximal maintains per-coordinate adaptive learning rates via three arrays: z (gradient accumulator), n (squared-gradient sum), and w (weights). The proximal weight solve divides by (β + √n[i])/α + l2, giving coordinate i an effective learning rate of roughly α/(β + √n[i]), so rarely updated features receive larger steps. L1 sparsification: coordinates with |z[i]| ≤ l1 are set to zero. All arrays auto-expand for dynamic feature streams.

Constructor
FTRLClassifier(
    schema: Schema,
    alpha: float = 0.1,
    beta: float = 1.0,
    l1: float = 1.0,
    l2: float = 1.0,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
alpha | float | 0.1 | FTRL learning rate coefficient
beta | float | 1.0 | Per-coordinate adaptation term (smoothing)
l1 | float | 1.0 | L1 threshold — features with |z| ≤ l1 are set to zero
l2 | float | 1.0 | L2 regularization coefficient
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
z | ndarray (n_features, n_outputs) | Gradient accumulator (float64)
n | ndarray (n_features, n_outputs) | Squared-gradient sum (float64)
w | ndarray (n_features, n_outputs) | Current weight matrix (float64)
n_classes | int | Number of classes
task_type | str | "binary" or "multiclass"
n_features | int | Current feature count
Methods
Method | Returns | Description
train(instance: Instance) → None | None | FTRL core update: n_new = n + grad²; σ = (√n_new − √n)/α; z_new = z + grad − σ·w; proximal step: w[i] = −(z[i] − sign(z[i])·l1) / ((β + √n[i])/α + l2) if |z[i]| > l1, else 0
predict(instance: Instance) → int | int | Binary: 1 if linear_pred > 0 else 0; multi-class: argmax
predict_proba(instance: Instance) → ndarray | ndarray | Binary: (2,); multi-class: softmax (n_classes,)
get_sparsity() → float | float | Fraction of near-zero weights (|w| < 1e-10)
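The train() update above translates almost line for line into NumPy. A standalone sketch for the binary, single-output case:

import numpy as np

def ftrl_update(z, n, w, x, y, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    # One FTRL-Proximal step: accumulate gradients, then solve the proximal weight update.
    p = 1.0 / (1.0 + np.exp(-(w @ x)))       # logistic prediction, y in {0, 1}
    grad = (p - y) * x
    n_new = n + grad ** 2
    sigma = (np.sqrt(n_new) - np.sqrt(n)) / alpha
    z_new = z + grad - sigma * w
    w_new = np.where(
        np.abs(z_new) > l1,
        -(z_new - np.sign(z_new) * l1) / ((beta + np.sqrt(n_new)) / alpha + l2),
        0.0,
    )
    return z_new, n_new, w_new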

§6   OVFMClassifier

class OVFMClassifier [binary · dense] · src/openmoa/classifier/_ovfm_classifier.py

Full name: Online Learning in Variable Feature Spaces with Mixed Data  ·  He et al., ICDM 2021

OVFM handles streams with mixed feature types (continuous + ordinal) and missing values caused by evolving feature spaces. It uses a Gaussian Copula EM algorithm to transform mixed data into a shared latent space, then trains dual classifiers: w_obs (observed space) and w_lat (latent space). Batch-based EM updates run every batch_size instances. The ensemble weight between the two classifiers is updated exponentially based on cumulative loss.

Constructor
OVFMClassifier(
    schema: Schema,
    window_size: int = 200,
    batch_size: int = 50,
    evolution_pattern: str = "vfs",
    decay_coef: float = 0.5,
    num_ord_updates: int = 2,
    max_ord_levels: int = 14,
    ensemble_weight: float = 0.5,
    learning_rate: float = 0.01,
    l1_lambda: float = 0.0,
    l2_lambda: float = 0.01,
    sparsity_threshold: float = 0.01,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
window_size | int | 200 | Gaussian Copula ECDF sliding window size
batch_size | int | 50 | Instances to accumulate before each EM + SGD update
evolution_pattern | str | "vfs" | Feature evolution type: "vfs", "tds", "cds", "eds"
decay_coef | float | 0.5 | Exponential decay factor for the online covariance update
num_ord_updates | int | 2 | Ordinal imputation iterations per E-step
max_ord_levels | int | 14 | Features with ≤ max_ord_levels unique values are treated as ordinal
ensemble_weight | float | 0.5 | Initial ensemble weight for the observed-space classifier
learning_rate | float | 0.01 | SGD learning rate for both classifiers
l1_lambda | float | 0.0 | L1 regularization strength
l2_lambda | float | 0.01 | L2 regularization strength
sparsity_threshold | float | 0.01 | Weights below this are zeroed (sparsification)
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
_w_obs | ndarray (d+1,) | Observed-space weights with bias
_w_lat | ndarray (d+1,) | Latent-space weights with bias
_sigma | ndarray (d, d) | Correlation matrix from the Gaussian Copula
_cont_indices | ndarray (bool) | Mask for continuous features
_ord_indices | ndarray (bool) | Mask for ordinal features
_transform_function | OnlineTransformFunction | Marginal ECDF/quantile helper
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Accumulates instances; every batch_size: E-step (impute latent Z) → M-step (update Σ) → SGD on both classifiers; updates feature type masks; updates the ensemble weight; sparsifies weights
predict(instance: Instance) → int | int | Returns 1 if predict_proba()[1] > 0.5
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Transforms x to latent z via the copula; ensemble prediction: w_ens·score_obs + (1 − w_ens)·score_lat; applies logistic sigmoid
Constraints: Binary classification only. Dense input. Covariance matrix is O(d²) — unsuitable for very high-dimensional streams.
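A usage sketch for a dense, mixed-type stream. Only constructor arguments documented above are used; mixed_stream is a placeholder for any dense CapyMOA-style stream with both continuous and low-cardinality (ordinal) columns.

from openmoa.classifiers import OVFMClassifier

ovfm = OVFMClassifier(
    schema=mixed_stream.get_schema(),
    batch_size=50,         # EM + SGD update every 50 instances
    max_ord_levels=14,     # <= 14 distinct values => treated as ordinal
    window_size=200,       # ECDF window for the Gaussian Copula marginals
)

for _ in range(2_000):
    instance = mixed_stream.next_instance()
    proba = ovfm.predict_proba(instance)   # [P(y=0), P(y=1)], test-then-train
    ovfm.train(instance)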

§7   OSLMFClassifier

class OSLMFClassifier [binary · dense] · src/openmoa/classifier/_oslmf_classifier.py

Full name: Online Semi-supervised Learning with Mix-Typed Streaming Features  ·  Wu et al., AAAI 2023

OSLMF extends OVFM with semi-supervised label propagation. It uses a Gaussian Copula for mixed-data imputation and a Density Peak Clustering algorithm to propagate labels from labeled to unlabeled instances within the buffer. Dual classifiers (w_obs, w_lat) are trained with pseudo-labels from density-peak assignment.

Constructor
OSLMFClassifier(
    schema: Schema,
    window_size: int = 200,
    buffer_size: int = 200,
    batch_size: int = 50,
    learning_rate: float = 0.01,
    decay_coef: float = 0.5,
    max_ord_levels: int = 14,
    ensemble_weight: float = 0.5,
    l2_lambda: float = 0.001,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
window_size | int | 200 | Gaussian Copula ECDF window size
buffer_size | int | 200 | Density-Peak buffer size (instances retained for label propagation)
batch_size | int | 50 | Batch size for EM updates
learning_rate | float | 0.01 | SGD learning rate
decay_coef | float | 0.5 | Covariance exponential decay factor
max_ord_levels | int | 14 | Ordinal detection threshold
ensemble_weight | float | 0.5 | Initial ensemble weight
l2_lambda | float | 0.001 | L2 regularization strength
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
_copula | GaussianCopula | Gaussian Copula transformer for ECDF normalization
_density_peaks | DensityPeakClustering | Label propagation via density-peak structure
_w_obs | ndarray (d+1,) | Observed-space weights with bias
_w_lat | ndarray (d+1,) | Latent-space weights with bias
_cont_indices | ndarray (bool) | Continuous feature mask
_ord_indices | ndarray (bool) | Ordinal feature mask
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Updates the Copula window (per instance); accumulates a batch; when full: copula EM + density-peak label propagation + SGD on both classifiers
predict(instance: Instance) → int | int | Returns 1 if predict_proba()[1] > 0.5
predict_proba(instance: Instance) → ndarray | ndarray (2,) | Transforms to latent z, reconstructs missing features; ensemble prediction
Constraints: Binary classification only. Dense input. Distance matrix is O(buffer²) — large buffer_size increases compute cost.
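The density-peak propagation step can be illustrated independently of the library. A simplified sketch (not OSLMF's actual routine): every unlabeled instance in the buffer inherits the label of its nearest labeled neighbour of higher local density.

import numpy as np

def propagate_labels(X, y):
    # y uses -1 for unlabeled instances; returns pseudo-labels for the whole buffer.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # O(buffer^2) distance matrix
    density = (D < np.median(D)).sum(axis=1)                     # cutoff-kernel local density
    y = y.copy()
    for i in np.argsort(-density):                               # visit high-density points first
        if y[i] == -1:
            donors = [j for j in range(len(y)) if density[j] > density[i] and y[j] != -1]
            if donors:
                y[i] = y[min(donors, key=lambda j: D[i, j])]     # nearest higher-density labeled point
    return y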

§8   ORF3VClassifier

class ORF3VClassifier [multi-class · dense / sparse] · src/openmoa/classifier/_orf3v_classifier.py

Full name: Online Random Feature Forests for Varying Feature Spaces  ·  Schreckenberger et al., AAAI 2023

ORF3V builds an independent decision stump forest for each feature. Per-feature forests predict class distributions, weighted by each feature's accuracy history. Handles dynamic feature spaces naturally — forests are initialized lazily after grace_period instances and new feature forests are created when new features appear. Stumps are periodically replaced by newly generated candidates with better Gini gain.

Constructor
ORF3VClassifier(
    schema: Schema,
    n_stumps: int = 10,
    alpha: float = 0.1,
    grace_period: int = 100,
    replacement_interval: int = 100,
    replacement_strategy: str = "oldest",
    window_size: int = 200,
    delta: float = 0.001,
    compression: float = 1000,
    d_max: int = 1000,
    enable_pruning: bool = False,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
n_stumps | int | 10 | Number of decision stumps per feature forest
alpha | float | 0.1 | Weight decay rate for per-forest accuracy updates
grace_period | int | 100 | Instances to observe before initializing forests
replacement_interval | int | 100 | Instances between stump replacement rounds
replacement_strategy | str | "oldest" | Stump replacement strategy: "oldest" or "random"
window_size | int | 200 | Sliding window for feature presence tracking
delta | float | 0.001 | Hoeffding bound confidence parameter (used if enable_pruning=True)
compression | float | 1000 | t-digest compression parameter (reserved for future use)
d_max | int | 1000 | Maximum expected number of features
enable_pruning | bool | False | Enable Hoeffding-bound-based feature pruning
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
feature_forests | dict | {feature_id: FeatureForest} — one forest per observed feature
weights | dict | {feature_id: float} — per-forest prediction weight
feature_stats | dict | {feature_id: FeatureStats} — per-feature statistics for stump generation
first_occurrence | dict | {feature_id: int} — time step of first appearance
t | int | Instance counter
n_classes | int | Number of classes
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Updates FeatureStats; initializes forests after the grace period; replaces stumps every replacement_interval; creates new forests for newly observed features; updates per-forest weights
predict(instance: Instance) → int | int | Returns the class with the highest aggregate score
predict_proba(instance: Instance) → ndarray | ndarray (n_classes,) | Normalizes aggregate class scores: score = Σ weight[f] · forest[f].predict(x[f])
Note: Supports multi-class. Dense or sparse input — uses instance.feature_indices attribute if present (set by OpenFeatureStream).
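The scoring rule in predict_proba() can be sketched directly from the formula above; feature_forests, weights, and the per-forest predict() follow the attribute and method descriptions in this section.

import numpy as np

def aggregate_scores(feature_forests, weights, x, n_classes):
    # x maps feature_id -> value for the features present in the current instance.
    scores = np.zeros(n_classes)
    for f, forest in feature_forests.items():
        if f in x:
            scores += weights[f] * forest.predict(x[f])   # each forest returns a class distribution
    total = scores.sum()
    return scores / total if total > 0 else np.full(n_classes, 1.0 / n_classes)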

§9   OLD3SClassifier

class OLD3SClassifier [binary / multi · dense · PyTorch] · src/openmoa/classifier/_old3s_classifier.py

Full name: Online Learning Deep models from Data of Double Streams  ·  Lian et al., IEEE TKDE 2024

OLD3S supports lifelong learning across multiple sequential feature-space transitions. It maintains a Variational Autoencoder (VAE) for feature extraction and a Hedge Backpropagation (HBP) MLP for classification. A reactive state machine detects three phases — STABLE, OVERLAP, STABLE_NEW — and applies knowledge distillation during OVERLAP to align new and old latent spaces. Ensemble weights between current and previous models are updated exponentially based on prediction loss.

Constructor
OLD3SClassifier(
    schema: Schema,
    latent_dim: int = 20,
    hidden_dim: int = 128,
    num_hbp_layers: int = 3,
    learning_rate: float = 0.001,
    beta: float = 0.99,
    eta: float = 0.01,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
latent_dim | int | 20 | VAE bottleneck dimensionality
hidden_dim | int | 128 | Hidden units in the VAE encoder/decoder and the HBP MLP
num_hbp_layers | int | 3 | Number of HBP MLP layers (each has its own exit classifier)
learning_rate | float | 0.001 | Adam optimizer learning rate
beta | float | 0.99 | HBP weight decay rate — controls the contribution of later layers
eta | float | 0.01 | Ensemble update rate — speed of weight shift between current and previous models
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
model_curr | dict or None | Current model bundle: {vae, clf, opt_vae, opt_clf, hbp_weights, dim}
model_prev | dict or None | Previous model bundle (active during the OVERLAP phase)
w_curr | float | Ensemble weight for the current model
w_prev | float | Ensemble weight for the previous model
curr_indices | ndarray or None | Feature indices for the current space
prev_indices | ndarray or None | Feature indices for the previous space
is_overlap | bool | True during the OVERLAP transition phase
device | torch.device | 'cuda' if available, else 'cpu'
Internal model components
Component | Architecture
VAE_Shallow | Encoder: Linear(d, hidden) + ReLU → μ, logvar; Linear(hidden, latent) · Decoder: Linear(latent, hidden) + ReLU + Linear(hidden, d) + Sigmoid
HBPMLP | Multi-layer ReLU network; exit classifier at each layer; forward returns a list of logits
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Detects the phase (STABLE / OVERLAP / STABLE_NEW); trains the appropriate models; applies alignment loss during OVERLAP; updates ensemble weights
predict(instance: Instance) → int | int | Returns argmax of predict_proba()
predict_proba(instance: Instance) → ndarray | ndarray (n_classes,) | Encodes features via the VAE → HBP prediction; ensembled with model_prev during OVERLAP
Phase state machine
State | Trigger | Behavior
STABLE | No previous model, or a stable feature set | Train model_curr only (VAE + classifier)
OVERLAP | New feature set is a superset of the current one | model_prev ← model_curr; create a new model_curr; train both; apply KD alignment loss
STABLE_NEW | Feature set shrinks below the prev+curr size | model_prev ← None; return to STABLE
Constraints: Requires PyTorch. Binary and multi-class classification supported. Dense input only.
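A usage sketch that watches the documented attributes as the stream crosses a feature-space transition; evolving_stream is a placeholder for any dense, feature-evolving OpenMOA stream, and only documented constructor arguments are used.

from openmoa.classifiers import OLD3SClassifier

old3s = OLD3SClassifier(schema=evolving_stream.get_schema(), latent_dim=20, num_hbp_layers=3)

for i in range(20_000):
    instance = evolving_stream.next_instance()
    old3s.train(instance)
    if i % 5_000 == 0:       # inspect the phase machine and the ensemble weights
        print(i, "overlap:", old3s.is_overlap,
              "w_curr:", round(old3s.w_curr, 3), "w_prev:", round(old3s.w_prev, 3))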

§10   OWSSClassifier

class OWSSClassifier [multi-class · dense · PyTorch] · src/openmoa/classifier/_owss_classifier.py

Full name: Online Learning from Open-World Soft Sensing  ·  Lian et al., ICDM 2024

OWSS uses a bipartite graph neural network (GNN) where nodes represent features and instances, and edges encode feature-instance relationships. Learnable feature embeddings capture universal feature representations shared across instances. A Feature Reconstruction Loss (Eq. 2) aligns the latent GNN output back toward the initial embedding, preventing representational drift. Updates are batch-based using a sliding window buffer.

Constructor
OWSSClassifier(
    schema: Schema,
    window_size: int = 100,
    hidden_dim: int = 32,
    learning_rate: float = 0.01,
    rec_weight: float = 0.1,
    sparsity_threshold: float = 0.05,
    random_seed: int = 1,
)
Parameter | Type | Default | Description
schema | Schema | required | Stream schema
window_size | int | 100 | Buffer size before each GNN update
hidden_dim | int | 32 | Feature embedding dimensionality (GNN hidden size)
learning_rate | float | 0.01 | Adam optimizer learning rate
rec_weight | float | 0.1 | β: weight of the reconstruction loss relative to the classification loss
sparsity_threshold | float | 0.05 | Edge pruning threshold — feature-instance edges where |X_ij| < threshold are removed from the graph
random_seed | int | 1 | RNG seed
Attributes
Attribute | Type | Description
model | OWSSNetwork | Bipartite GNN model
optimizer | optim.Adam | Model optimizer
n_features | int | Current feature count
max_features | int | Allocated embedding pool size: max(1000, 2 · n_features)
device | torch.device | 'cuda' or 'cpu'
stats_min | ndarray or None | Online per-feature minimum (for normalization)
stats_max | ndarray or None | Online per-feature maximum
Internal model components
Component | Architecture
feature_embeddings | nn.Parameter, shape (max_features, hidden_dim) — learnable universal feature representations
input_projector | nn.Linear(1, hidden_dim) — projects scalar feature values into the embedding space
BipartiteGraphConv | Θ ∈ (hidden, hidden); forward: ReLU(adj @ x @ Θ) + dropout
classifier | Linear(hidden, hidden//2) + ReLU + Linear(hidden//2, n_classes)
Methods
Method | Returns | Description
train(instance: Instance) → None | None | Accumulates in the buffer; when buffer ≥ window_size: normalizes, builds the bipartite adjacency, forward pass, computes loss_cls + rec_weight · loss_rec, backpropagates
predict(instance: Instance) → int | int | Returns argmax of predict_proba()
predict_proba(instance: Instance) → ndarray | ndarray (n_classes,) | Normalizes the input; builds the adjacency; GNN forward pass; softmax
Loss function
L_total = CrossEntropyLoss(logits, y) + rec_weight · MSELoss(latent_z, initial_z.detach())

Here initial_z is the instance-node embedding before GNN refinement and latent_z is the embedding after refinement. The reconstruction term (Eq. 2) prevents the feature embeddings from drifting arbitrarily.
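A PyTorch transcription of this objective (a sketch; the tensor names are taken from the description above):

import torch.nn.functional as F

def owss_loss(logits, y, latent_z, initial_z, rec_weight=0.1):
    # Classification term plus the Eq. 2 anchor toward the pre-refinement embedding.
    loss_cls = F.cross_entropy(logits, y)                 # logits: (batch, n_classes), y: (batch,)
    loss_rec = F.mse_loss(latent_z, initial_z.detach())   # stop-gradient on the initial embedding
    return loss_cls + rec_weight * loss_rec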

Constraints: Requires PyTorch. Multi-class classification. Dense input only.