Classification Algorithms for Evolving Feature Spaces
A systematic guide to OpenMOA's 10 specialized algorithms for Utilitarian Online Learning (UOL) — learning from data streams where the feature space itself changes over time.
OpenMOA introduces 10 classification algorithms specifically designed for Utilitarian Online Learning (UOL). While traditional online learning handles concept drift (distribution changes), UOL addresses the more complex challenge of feature evolution (features appearing and disappearing). These algorithms are the core contribution of OpenMOA, built on top of CapyMOA's existing library of traditional online classifiers.
1. Taxonomy & Reference
Algorithm Taxonomy
| Category | Algorithms | Core Technique |
|---|---|---|
| Feature Space Mapping | FESL | Learns a linear mapping between old and new feature spaces |
| Copula-based Imputation | OVFM, OSLMF | Uses a Gaussian Copula to model missing features statistically |
| Sparse Online Optimization | OASF, RSOL, FOBOS, FTRL | Online convex optimization with sparsity-inducing regularization |
| Deep / Ensemble Models | OLD3S, ORF3V, OWSS | Neural networks or ensemble methods for evolving spaces |
Quick Reference
| Algorithm | Multi-class | Sparse-Aware | NaN-Aware | Deep Learning | Scalable (RCV1) |
|---|---|---|---|---|---|
| FESL | No | Yes | No | No | No (OOM) |
| OVFM | No | No | Yes | No | No (OOM) |
| OSLMF | No | No | Yes | No | No (OOM) |
| OASF | No | Yes | Yes | No | Yes |
| RSOL | No | Yes | Yes | No | Yes |
| FOBOS | Yes | Yes | Yes | No | Yes |
| FTRL | Yes | Yes | Yes | No | Yes |
| OLD3S | Yes | Yes | No | Yes (PyTorch) | Yes |
| ORF3V | Yes | Yes | No | No | Yes |
| OWSS | Yes | No | No | Yes (PyTorch) | Moderate |
2. FESL — Feature Evolvable Streaming Learning
Background (Feature Mapping): FESL is the foundational algorithm for learning in evolving feature spaces, proposed at NeurIPS 2017. It addresses the scenario where the entire feature space shifts: the old feature set $S_{old}$ is gradually replaced by a new set $S_{new}$, with a brief overlap period during which both are observable.
How It Works
FESL maintains two linear models and learns a mapping between them:
- Phase 1 (Stable): Only $S_{old}$ features visible → Train $w_{old}$.
- Phase 2 (Overlap): Both $S_{old}$ and $S_{new}$ visible → Train $w_{curr}$ on $S_{new}$ AND Learn mapping $M: S_{new} \to S_{old}$ via Ridge Regression.
- Phase 3 (New): Only $S_{new}$ features visible → Ensemble prediction: $y = \mu_1 \cdot f_{curr}(x) + \mu_2 \cdot f_{old}(M \cdot x)$.
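The three phases can be condensed into a toy NumPy sketch (our own illustration, not the library internals; the ridge solve for $M$ and the exponential re-weighting of the ensemble coefficients $\mu_1, \mu_2$ follow the formulas above):

```python
import numpy as np

# Toy sketch of FESL's overlap phase: learn the mapping M by ridge regression
# on the buffered overlap instances, then ensemble the two linear models.
rng = np.random.default_rng(0)
d_old, d_new, B, lam = 5, 8, 100, 0.1

X_new = rng.normal(size=(B, d_new))          # overlap buffer in the new space
M_true = rng.normal(size=(d_new, d_old))
X_old = X_new @ M_true                       # paired observations in the old space

# Ridge regression: M = (X_new'X_new + lam*I)^-1 X_new'X_old
M = np.linalg.solve(X_new.T @ X_new + lam * np.eye(d_new), X_new.T @ X_old)

w_old = rng.normal(size=d_old)               # Phase 1 model
w_curr = rng.normal(size=d_new)              # Phase 2/3 model
mu = np.array([0.5, 0.5])                    # ensemble weights (mu_1, mu_2)

def predict(x_new):
    scores = np.array([w_curr @ x_new, w_old @ (x_new @ M)])
    return float(mu @ scores)

def reweight(x_new, y, eta=0.1):
    """Shrink the weight of whichever model incurred more squared loss."""
    global mu
    scores = np.array([w_curr @ x_new, w_old @ (x_new @ M)])
    mu = mu * np.exp(-eta * (scores - y) ** 2)
    mu /= mu.sum()

x = rng.normal(size=d_new)
reweight(x, y=1.0)
```

With a noiseless synthetic mapping, the ridge solve recovers `M_true` almost exactly; the re-weighting step keeps `mu` a probability vector.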
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 0.1 | Learning rate for SGD weight updates |
| lambda_ | float | 0.1 | Ridge regularization strength for the mapping matrix M |
| window_size | int | 100 | Buffer size B of overlap instances used to learn M |
Usage
from openmoa.classifier import FESLClassifier

learner = FESLClassifier(
    schema=stream.get_schema(),
    alpha=0.1,
    lambda_=0.1,
    window_size=100
)
3. OVFM — Online Variable Feature Spaces with Mixed Data
Background (Gaussian Copula): OVFM addresses the challenge of learning when features randomly appear and disappear and the data contains both continuous and ordinal types. It uses a Gaussian Copula model to statistically impute missing features, then trains an ensemble of two classifiers.
How It Works
- Detect Feature Types (Continuous vs Ordinal).
- Update Marginal Distributions via sliding window ECDF.
- EM Algorithm: Impute missing latent values $Z$ (E-step) and update correlation matrix $\Sigma$ (M-step).
- Train two classifiers: $w_{obs}$ on observed features and $w_{lat}$ on latent (copula-transformed) features.
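The heart of the E-step is a conditional-Gaussian fill-in. A minimal sketch, assuming a known correlation matrix $\Sigma$ and ignoring the ordinal marginal handling (function and variable names are ours):

```python
import numpy as np

def impute_latent(z, observed_mask, Sigma):
    """Fill NaN entries of latent vector z with E[z_mis | z_obs] under N(0, Sigma)."""
    obs = np.where(observed_mask)[0]
    mis = np.where(~observed_mask)[0]
    if mis.size == 0:
        return z.copy()
    S_oo = Sigma[np.ix_(obs, obs)]     # observed-observed block
    S_mo = Sigma[np.ix_(mis, obs)]     # missing-observed block
    z_filled = z.copy()
    # Conditional mean of a multivariate Gaussian: S_mo @ S_oo^-1 @ z_obs
    z_filled[mis] = S_mo @ np.linalg.solve(S_oo, z[obs])
    return z_filled

Sigma = np.array([[1.0, 0.8, 0.2],
                  [0.8, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
z = np.array([1.0, np.nan, -0.5])      # feature 1 is missing this round
z_hat = impute_latent(z, ~np.isnan(z), Sigma)
```

The missing coordinate is pulled mostly toward the strongly correlated first feature (correlation 0.8) and only slightly toward the third.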
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| window_size | int | 200 | Sliding window for marginal ECDF estimation |
| batch_size | int | 50 | Mini-batch size for EM updates |
| decay_coef | float | 0.5 | Exponential decay for the $\Sigma$ update |
Usage
from openmoa.classifier import OVFMClassifier

learner = OVFMClassifier(
    schema=stream.get_schema(),
    window_size=200,
    batch_size=50,
    max_ord_levels=14
)
Note: requires statsmodels for empirical CDF (ECDF) calculation.
4. OSLMF — Online Semi-supervised Learning
Background (Copula + Semi-Supervised): OSLMF extends the Copula-based approach of OVFM with semi-supervised learning capabilities. When labeled data is scarce, it uses Density Peak Clustering to propagate labels from labeled to unlabeled instances in the latent space.
Key Components
- Gaussian Copula: Handles mixed data types and missing features (same as OVFM).
- Density Peak Clustering: Propagates labels to unlabeled instances by finding nearest higher-density neighbors in the latent space.
- Dual Classifier Ensemble: Combines observed-space and latent-space predictions.
Usage
from openmoa.classifier import OSLMFClassifier

learner = OSLMFClassifier(
    schema=stream.get_schema(),
    window_size=200,
    buffer_size=200,
    ensemble_weight=0.5
)
5. OASF — Online Active Sparse Feature Learning
Background (Sparse Optimization): OASF is designed for streams where features incrementally appear or disappear. It uses Passive-Aggressive (PA) updates with group-sparsity regularization via the $L_{1,2}$-norm, automatically selecting important features over time.
How It Works
OASF uses a sliding window $W$ storing the last $L$ weight vectors. The $L_{1,2}$-norm operates on rows of $W$ (one row per feature), zeroing out features whose weights have been consistently weak across the window.
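The row-wise zeroing is the proximal operator of the $L_{1,2}$-norm, which can be sketched directly (an illustrative stand-alone function, not the library code):

```python
import numpy as np

def group_soft_threshold(W, lam):
    """Proximal operator of lam * sum_j ||W[j, :]||_2: shrink each row, zero weak ones."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[0.90, 1.10, 1.00],     # feature consistently strong over the window
              [0.05, -0.02, 0.01]])   # feature consistently weak -> whole row zeroed
W_sparse = group_soft_threshold(W, lam=0.1)
```

Rows whose norm stays below `lam` across the window are removed entirely, which is exactly the feature-selection effect described above.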
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| lambda_param | float | 0.01 | $L_{1,2}$-norm sparsity regularization |
| mu | float | 1.0 | PA smoothness parameter |
| L | int | 100 | Sliding window size |
Usage
from openmoa.classifier import OASFClassifier

learner = OASFClassifier(
    schema=stream.get_schema(),
    lambda_param=0.01,
    mu=1.0,
    L=100
)
6. RSOL — Robust Sparse Online Learning
Background (Optimized Sparse Optimization): RSOL improves upon OASF with two key optimizations: a ring buffer for the sliding window (avoiding expensive array shifts) and auto-expanding weights for dynamic feature dimensions. This makes it significantly faster on large datasets.
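The ring-buffer trick can be illustrated in a few lines (a hypothetical `RingWindow` class of ours, not the library's internal structure):

```python
import numpy as np

class RingWindow:
    """Last L weight vectors in a ring: overwrite one slot, advance a pointer."""
    def __init__(self, L, d):
        self.W = np.zeros((L, d))
        self.head = 0

    def push(self, w):
        w = np.asarray(w, dtype=float)
        if w.size > self.W.shape[1]:                     # feature space grew
            pad = w.size - self.W.shape[1]
            self.W = np.pad(self.W, ((0, 0), (0, pad)))  # new features start at 0
        row = np.zeros(self.W.shape[1])
        row[:w.size] = w                                 # shorter vectors zero-padded
        self.W[self.head] = row
        self.head = (self.head + 1) % self.W.shape[0]    # O(1) advance, no np.roll

win = RingWindow(L=3, d=2)
win.push([1.0, 2.0])
win.push([3.0, 4.0, 5.0])    # dimension auto-expands from 2 to 3
```

Each `push` costs O(d) regardless of the window length L, whereas shifting the whole window with `np.roll` costs O(L·d).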
Usage
from openmoa.classifier import RSOLClassifier

learner = RSOLClassifier(
    schema=stream.get_schema(),
    lambda_param=50.0,
    mu=1.0,
    L=1000
)
Note: replaces np.roll() with O(1) pointer advancement.
7. FOBOS — Forward-Backward Splitting
Background (General Optimization): FOBOS separates the gradient-descent step (forward) from the regularization step (backward proximal operator), enabling clean L1/L2/Group Lasso sparsity. Unlike FESL/OASF/RSOL, FOBOS supports multi-class classification.
How It Works
- Step 1 (Forward): Standard Gradient Descent (Logistic or Softmax).
- Step 2 (Backward): Proximal Operator. For L1, this is soft thresholding: $w \leftarrow \text{sign}(w) \cdot \max(0, |w| - \eta\lambda)$.
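One full FOBOS round with L1 regularization fits in two lines (an illustrative sketch of the update rule above, not the library internals):

```python
import numpy as np

def fobos_step(w, grad, eta, lam):
    w_half = w - eta * grad                 # forward: plain gradient descent
    # backward: soft thresholding, the closed-form proximal operator for L1
    return np.sign(w_half) * np.maximum(0.0, np.abs(w_half) - eta * lam)

w = np.array([0.5, -0.2, 0.0])
grad = np.array([0.1, -0.1, 0.02])
w_new = fobos_step(w, grad, eta=1.0, lam=0.05)   # third coordinate is zeroed
```

Coordinates whose magnitude after the gradient step falls below `eta * lam` are set exactly to zero, which is where the sparsity comes from.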
Usage
from openmoa.classifier import FOBOSClassifier

# Supports "l1", "l2", "l1_l2" regularization
learner = FOBOSClassifier(
    schema=stream.get_schema(),
    alpha=1.0,
    lambda_=0.001,
    regularization="l1",
    step_schedule="sqrt"
)
8. FTRL — Follow-The-Regularized-Leader
Background (Industry Standard): FTRL-Proximal is a state-of-the-art algorithm used widely in industry (e.g., for ad click prediction). It achieves superior sparsity compared to SGD by accumulating gradient information and applying L1 thresholding in a principled manner.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 0.1 | Learning rate parameter |
| beta | float | 1.0 | Smoothing parameter |
| l1 | float | 1.0 | L1 regularization (controls sparsity) |
| l2 | float | 1.0 | L2 regularization (controls smoothness) |
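The per-coordinate FTRL-Proximal update can be sketched as follows (based on McMahan et al.'s published formulation, reusing the parameter names above; illustrative, not the library internals):

```python
import numpy as np

class FTRLProx:
    def __init__(self, d, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(d)   # accumulated (adjusted) gradients
        self.n = np.zeros(d)   # accumulated squared gradients

    def weights(self):
        """Closed-form weights: coordinates with |z| <= l1 are exactly zero."""
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1
        denom = (self.beta + np.sqrt(self.n)) / self.alpha + self.l2
        w[active] = -(self.z - np.sign(self.z) * self.l1)[active] / denom[active]
        return w

    def update(self, g):
        # sigma implements the per-coordinate adaptive learning-rate schedule.
        sigma = (np.sqrt(self.n + g * g) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * self.weights()
        self.n += g * g

model = FTRLProx(d=3)
model.update(np.array([2.0, 0.1, -3.0]))
w = model.weights()          # middle coordinate stays exactly zero
```

Note how sparsity arises: a coordinate only becomes non-zero once its accumulated gradient `z` exceeds the `l1` threshold.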
Usage
from openmoa.classifier import FTRLClassifier

learner = FTRLClassifier(
    schema=stream.get_schema(),
    alpha=0.1,
    beta=1.0,
    l1=1.0,
    l2=1.0
)
print(f"Sparsity: {learner.get_sparsity():.1%}")
9. OLD3S — Online Learning from Data of Double Streams
Background (Deep Learning): OLD3S uses a Variational Autoencoder (VAE) to map each feature space into a shared latent space, and a Hedge Backpropagation (HBP) MLP for classification. When the feature space shifts, knowledge from the old model is distilled into the new one via latent-space alignment.
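The Hedge part of HBP can be illustrated in isolation (a toy sketch: each MLP depth has its own output head, and the heads are mixed by weights that decay exponentially with each head's loss; `beta` and the smoothing floor are our assumed values):

```python
import numpy as np

def hedge_update(alphas, losses, beta=0.99, smooth=1e-3):
    """One Hedge step: decay each head's weight by beta**loss, floor, renormalize."""
    alphas = alphas * np.power(beta, losses)
    alphas = np.maximum(alphas, smooth / len(alphas))   # floor keeps every head alive
    return alphas / alphas.sum()

alphas = np.full(3, 1 / 3)          # one mixing weight per MLP depth
losses = np.array([0.9, 0.4, 0.1])  # pretend the deeper heads fit better
for _ in range(50):
    alphas = hedge_update(alphas, losses)
```

Over time the mixture shifts toward the depths that predict well, which is how HBP adapts its effective network depth online.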
Usage
from openmoa.classifier import OLD3SClassifier

learner = OLD3SClassifier(
    schema=stream.get_schema(),
    latent_dim=20,
    hidden_dim=128,
    num_hbp_layers=3,
    learning_rate=0.001
)
Note: requires a torch installation. More computationally intensive than linear methods.
10. ORF3V — Online Random Feature Forests
Background (Ensemble): ORF3V builds independent decision-stump ensembles ("feature forests") for each individual feature. This naturally handles varying feature spaces: when a feature is absent, its forest is simply excluded from the vote. No imputation is required.
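The per-feature voting scheme can be sketched as follows (illustrative; `vote` and the stump lambdas are our own toy stand-ins for the real stump forests):

```python
import numpy as np

def vote(x, forests):
    """forests: dict feature_index -> list of stumps (callables returning a class)."""
    votes = {}
    for j, stumps in forests.items():
        if j >= len(x) or np.isnan(x[j]):
            continue                           # absent feature: its forest sits out
        for stump in stumps:
            c = stump(x[j])
            votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get) if votes else None

forests = {
    0: [lambda v: int(v > 0.5)] * 3,           # three identical stumps on feature 0
    1: [lambda v: int(v > 2.0)] * 2,           # two stumps on feature 1
}
pred = vote(np.array([0.9, np.nan]), forests)  # feature 1 missing this round
```

Because each forest only ever sees its own feature, a missing value costs nothing: the remaining forests still produce a valid majority vote.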
Usage
from openmoa.classifier import ORF3VClassifier

learner = ORF3VClassifier(
    schema=stream.get_schema(),
    n_stumps=10,
    grace_period=100,
    replacement_interval=100
)
11. OWSS — Online World Soft Sensing
Background (GNN): OWSS uses a Graph Neural Network (GNN) approach with a bipartite graph connecting instances to features. It learns universal feature embeddings that persist across changing feature spaces.
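The core intuition can be sketched with a heavy simplification of our own (not the OWSS architecture): give each feature a persistent embedding and represent an instance by aggregating the embeddings of whichever features are present, so the representation has a fixed size regardless of the feature space.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4                                  # embedding dimension (assumed)
E = rng.normal(size=(6, k))            # one learnable embedding per known feature

def embed(x):
    """x: 1-D array with NaN for absent features; returns a fixed-size vector."""
    present = ~np.isnan(x)
    # Value-weighted mean of the present features' embeddings.
    return (x[present] @ E[present]) / max(present.sum(), 1)

h_full = embed(np.array([1.0, 0.5, -0.2, 0.3, 0.0, 1.2]))
h_part = embed(np.array([1.0, 0.5, np.nan, np.nan, np.nan, np.nan]))
```

Both calls yield a k-dimensional representation, so a downstream classifier head never sees the feature-space change.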
Usage
from openmoa.classifier import OWSSClassifier

learner = OWSSClassifier(
    schema=stream.get_schema(),
    hidden_dim=32,
    learning_rate=0.01,
    rec_weight=0.1
)
12. Decision Guide
By Feature Evolution Pattern
| Pattern | Best Algorithms | Why |
|---|---|---|
| EDS (Overlap shifts) | FESL, OLD3S | Designed for explicit $S_{old} \to S_{new}$ transitions |
| TDS (Growth) | OVFM, OASF, FOBOS, FTRL | Handle monotonically growing feature space |
| CDS (Random Missing) | OVFM, OSLMF, ORF3V | Imputation (OVFM) or independence (ORF3V) |
| General / Large Scale | FOBOS, FTRL, RSOL | Robust sparse methods that work everywhere |
Complete Benchmark Example
from openmoa.datasets import Electricity
from openmoa.stream import OpenFeatureStream
from openmoa.classifier import (
    FESLClassifier, OASFClassifier, RSOLClassifier,
    FOBOSClassifier, FTRLClassifier
)
from openmoa.evaluation import ClassificationEvaluator

base = Electricity()
schema = base.get_schema()

algorithms = {
    "FESL": FESLClassifier(schema=schema, alpha=0.1),
    "OASF": OASFClassifier(schema=schema, lambda_param=0.01),
    "RSOL": RSOLClassifier(schema=schema, lambda_param=50.0),
    "FOBOS": FOBOSClassifier(schema=schema, alpha=1.0, regularization="l1"),
    "FTRL": FTRLClassifier(schema=schema, alpha=0.1, l1=1.0, l2=1.0),
}

for name, learner in algorithms.items():
    base.restart()
    stream = OpenFeatureStream(
        base_stream=base,
        evolution_pattern="pyramid",
        d_min=2,
        total_instances=10000
    )
    evaluator = ClassificationEvaluator(schema=schema)
    while stream.has_more_instances():
        instance = stream.next_instance()
        prediction = learner.predict(instance)
        learner.train(instance)
        evaluator.update(instance.y_index, prediction)
    print(f"{name:8s} → Accuracy: {evaluator.accuracy():.2f}%")