Tutorial ~25 min read

Classification Algorithms for Evolving Feature Spaces

A systematic guide to OpenMOA's 10 specialized algorithms for Utilitarian Online Learning (UOL) — learning from data streams where the feature space itself changes over time.

OpenMOA introduces 10 classification algorithms specifically designed for Utilitarian Online Learning (UOL). While traditional online learning handles concept drift (distribution changes), UOL addresses the more complex challenge of feature evolution (features appearing and disappearing). These algorithms are the core contribution of OpenMOA, built on top of CapyMOA's existing library of traditional online classifiers.

1. Taxonomy & Reference

Algorithm Taxonomy

| Category | Algorithms | Core Technique |
|---|---|---|
| Feature Space Mapping | FESL | Learns a linear mapping between old and new feature spaces |
| Copula-based Imputation | OVFM, OSLMF | Uses a Gaussian Copula to model missing features statistically |
| Sparse Online Optimization | OASF, RSOL, FOBOS, FTRL | Online convex optimization with sparsity-inducing regularization |
| Deep / Ensemble Models | OLD3S, ORF3V, OWSS | Neural networks or ensemble methods for evolving spaces |

Quick Reference

| Algorithm | Multi-class | Sparse-Aware | NaN-Aware | Deep Learning | Scalable (RCV1) |
|---|---|---|---|---|---|
| FESL | No | Yes | No | No | No (OOM) |
| OVFM | No | No | Yes | No | No (OOM) |
| OSLMF | No | No | Yes | No | No (OOM) |
| OASF | No | Yes | Yes | No | Yes |
| RSOL | No | Yes | Yes | No | Yes |
| FOBOS | Yes | Yes | Yes | No | Yes |
| FTRL | Yes | Yes | Yes | No | Yes |
| OLD3S | Yes | Yes | No | Yes (PyTorch) | Yes |
| ORF3V | Yes | Yes | No | No | Yes |
| OWSS | Yes | No | No | Yes (PyTorch) | Moderate |

2. FESL — Feature Evolvable Streaming Learning

Feature Mapping

Background: FESL is the foundational algorithm for learning in evolving feature spaces, proposed at NeurIPS 2017. It addresses the scenario where the entire feature space shifts: the old set of features $S_{old}$ is gradually replaced by a new set $S_{new}$, with a brief overlap period where both are observable.

How It Works

FESL maintains two linear models, one per feature space, and learns a mapping between the spaces during the overlap period:

Key Equation: the mapping matrix $M$ is computed by ridge regression over the overlap buffer: $M = (X_{new}^{\top} X_{new} + \lambda I)^{-1} X_{new}^{\top} X_{old}$. Once $S_{old}$ disappears, $M$ reconstructs old-space features from new-space instances, so the old model can keep contributing predictions alongside the new one.

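The key equation can be checked with a few lines of NumPy (toy dimensions; the variable names are illustrative, not OpenMOA internals):

```python
import numpy as np

rng = np.random.default_rng(0)
B, d_old, d_new = 100, 5, 8          # overlap buffer size and toy dimensions
X_new = rng.normal(size=(B, d_new))  # overlap instances in the new feature space
W = rng.normal(size=(d_new, d_old))
X_old = X_new @ W                    # old-space views (noiseless toy relation)

lam = 0.1                            # ridge strength (lambda_ in FESLClassifier)
# M = (X_new^T X_new + lambda I)^{-1} X_new^T X_old, shape (d_new, d_old)
M = np.linalg.solve(X_new.T @ X_new + lam * np.eye(d_new), X_new.T @ X_old)

x_old_hat = X_new @ M                # reconstructed old-space features
```

Because $M$ is learned in closed form from a single overlap buffer, it maps any later new-space instance to an old-space estimate; its $d_{old} \times d_{new}$ size is also the source of the memory limitation below.
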
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 0.1 | Learning rate for SGD weight updates |
| lambda_ | float | 0.1 | Ridge regularization strength for the mapping matrix $M$ |
| window_size | int | 100 | Buffer size $B$ of overlap instances accumulated for learning $M$ |

Usage

from openmoa.classifier import FESLClassifier

learner = FESLClassifier(
    schema=stream.get_schema(),
    alpha=0.1,
    lambda_=0.1,
    window_size=100
)

Complexity:

- Per-instance training: $O(d)$ SGD update
- Mapping learning: $O(B \cdot d^2 + d^3)$
- Memory: $O(d_{old} \times d_{new})$

Limitation: The mapping matrix $M$ has size $d_{old} \times d_{new}$. For RCV1 (~47k features), this requires ~17 GB RAM, causing OOM on standard machines.

3. OVFM — Online Variable Feature Spaces with Mixed Data

Gaussian Copula

Background: OVFM addresses the challenge of learning when features randomly appear and disappear, and the data contains both continuous and ordinal types. It uses a Gaussian Copula model to statistically impute missing features, then trains an ensemble of two classifiers.

How It Works

OVFM keeps a sliding window of recent instances to estimate each feature's marginal ECDF, transforms observed values into a latent Gaussian space via the copula, and updates the latent correlation matrix $\Sigma$ with mini-batch EM (smoothed by decay_coef). Missing features are imputed from their conditional distribution given the observed ones before the classifiers are trained.

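A hedged sketch of the conditional-Gaussian imputation step (toy NumPy, not OVFM's implementation): once values sit in the latent Gaussian space with correlation $\Sigma$, a missing coordinate is imputed by the standard conditional mean $E[z_m \mid z_o] = \Sigma_{mo} \Sigma_{oo}^{-1} z_o$.

```python
import numpy as np

# Latent correlation over three features; feature 1 is missing.
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
obs, miss = [0, 2], [1]              # indices of observed / missing features
z_obs = np.array([1.2, -0.4])        # latent (copula-transformed) observations

# E[z_miss | z_obs] = Sigma_mo @ Sigma_oo^{-1} @ z_obs
Sigma_oo = Sigma[np.ix_(obs, obs)]
Sigma_mo = Sigma[np.ix_(miss, obs)]
z_miss_hat = Sigma_mo @ np.linalg.solve(Sigma_oo, z_obs)
# mapping z_miss_hat back through the feature's inverse ECDF gives the imputed value
```
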
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| window_size | int | 200 | Sliding window for marginal ECDF estimation |
| batch_size | int | 50 | Mini-batch size for EM updates |
| decay_coef | float | 0.5 | Exponential decay for the $\Sigma$ update |

Usage

from openmoa.classifier import OVFMClassifier

learner = OVFMClassifier(
    schema=stream.get_schema(),
    window_size=200,
    batch_size=50,
    max_ord_levels=14
)
Dependency: Requires statsmodels for Empirical CDF calculation.

4. OSLMF — Online Semi-supervised Learning

Copula + Semi-Supervised

Background: OSLMF extends the Copula-based approach of OVFM with semi-supervised learning capabilities. When labeled data is scarce, it uses Density Peak Clustering to propagate labels from labeled to unlabeled instances in the latent space.

Key Components

- Copula imputation: the same latent-Gaussian machinery as OVFM fills in missing features.
- Density Peak Clustering: in the latent space, labels are propagated from labeled instances to nearby unlabeled ones.
- Weighted ensemble: the component classifiers' predictions are combined via ensemble_weight.

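The label-propagation idea can be sketched as follows (a loose Density-Peak-style toy in NumPy, not OSLMF's actual clustering code): each unlabeled point inherits the label of a nearby denser point, falling back to the nearest labeled seed at local density peaks.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy 1-D stream: two clusters, one labeled seed per cluster
X = np.concatenate([rng.normal(0.0, 0.3, 10), rng.normal(5.0, 0.3, 10)])
labels = np.full(20, -1)             # -1 = unlabeled
seeds = np.array([0, 10])
labels[seeds] = [0, 1]

d_c = 0.5                            # cutoff distance for local density
dist = np.abs(X[:, None] - X[None, :])
rho = (dist < d_c).sum(axis=1)       # local density (neighbor counts)

order = np.argsort(-rho)             # visit points from densest to sparsest
for i in order:
    if labels[i] != -1:
        continue
    denser = np.where(rho > rho[i])[0]
    j = denser[np.argmin(dist[i, denser])] if denser.size else -1
    if j >= 0 and dist[i, j] < d_c:
        labels[i] = labels[j]        # inherit from a nearby denser point
    else:                            # local peak: take the nearest seed's label
        labels[i] = labels[seeds[np.argmin(dist[i, seeds])]]
```
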
Usage

from openmoa.classifier import OSLMFClassifier

learner = OSLMFClassifier(
    schema=stream.get_schema(),
    window_size=200,
    buffer_size=200,
    ensemble_weight=0.5
)

5. OASF — Online Active Sparse Feature Learning

Sparse Opt

Background: OASF is designed for streams where features incrementally appear or disappear. It uses Passive-Aggressive (PA) updates with group sparsity regularization via the $L_{1,2}$-norm, automatically selecting important features over time.

How It Works

OASF uses a sliding window $W$ storing the last $L$ weight vectors. The $L_{1,2}$-norm operates on rows of $W$ (one row per feature), zeroing out features whose weights have been consistently weak across the window.

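A toy NumPy sketch of the row-wise $L_{1,2}$ proximal step (illustrative, not OASF's code): each feature's row in $W$ is group-soft-thresholded by its Euclidean norm, so a feature whose weights have been weak across all $L$ windowed vectors is zeroed as a group.

```python
import numpy as np

# W: one row per feature, one column per windowed weight vector (L columns)
W = np.array([[0.9, 1.1, 1.0],       # consistently strong feature
              [0.01, -0.02, 0.015],  # consistently weak feature
              [0.5, -0.6, 0.55]])
lam = 0.1                            # lambda_param: L_{1,2} regularization

# group soft-thresholding: row_j <- row_j * max(0, 1 - lam / ||row_j||_2)
norms = np.linalg.norm(W, axis=1, keepdims=True)
W_prox = W * np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
```

The weak feature's entire row collapses to zero while strong rows are only mildly shrunk, which is exactly the feature-selection behavior described above.
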
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| lambda_param | float | 0.01 | $L_{1,2}$-norm sparsity regularization |
| mu | float | 1.0 | PA smoothness parameter |
| L | int | 100 | Sliding window size |

Usage

from openmoa.classifier import OASFClassifier

learner = OASFClassifier(
    schema=stream.get_schema(),
    lambda_param=0.01,
    mu=1.0,
    L=100
)

6. RSOL — Robust Sparse Online Learning

Sparse Opt + Optimized

Background: RSOL improves upon OASF with two key optimizations: a ring buffer for the sliding window (avoiding expensive array shifts) and auto-expanding weights for dynamic feature dimensions. This makes it significantly faster on large datasets.

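The ring-buffer idea can be sketched with a minimal class (hypothetical names, not RSOL's actual implementation): instead of shifting the whole window on every insert, a write pointer advances modulo the capacity.

```python
import numpy as np

class RingWindow:
    """Fixed-size window with O(1) insertion via a moving write pointer."""
    def __init__(self, capacity, dim):
        self.buf = np.zeros((capacity, dim))
        self.head = 0                # next slot to overwrite
        self.count = 0               # number of valid rows

    def push(self, w):
        self.buf[self.head] = w      # overwrite the oldest slot; no np.roll()
        self.head = (self.head + 1) % len(self.buf)
        self.count = min(self.count + 1, len(self.buf))

win = RingWindow(capacity=3, dim=2)
for t in range(5):
    win.push(np.array([t, t]))
# after 5 pushes into capacity 3, only the last 3 weight vectors survive
```
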
Usage

from openmoa.classifier import RSOLClassifier

learner = RSOLClassifier(
    schema=stream.get_schema(),
    lambda_param=50.0,
    mu=1.0,
    L=1000
)
Optimization: The ring buffer replaces np.roll() with O(1) pointer advancement.

7. FOBOS — Forward-Backward Splitting

General Opt

Background: FOBOS separates the gradient descent step (forward) from the regularization step (backward proximal operator), enabling clean L1/L2/Group Lasso sparsity. Unlike FESL/OASF/RSOL, FOBOS supports multi-class classification.

How It Works

Each round splits into two steps: a forward gradient step $w_{t+\frac{1}{2}} = w_t - \eta_t \nabla \ell_t(w_t)$, followed by a backward proximal step $w_{t+1} = \arg\min_w \frac{1}{2}\|w - w_{t+\frac{1}{2}}\|^2 + \eta_t \lambda\, r(w)$. With $r(w) = \|w\|_1$, the proximal step reduces to coordinate-wise soft-thresholding, which sets small weights exactly to zero.

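A minimal NumPy sketch of the forward-backward split with L1 regularization (toy values, not the FOBOSClassifier internals):

```python
import numpy as np

w = np.array([0.5, -0.3, 0.02])
grad = np.array([0.1, -0.2, 0.01])   # gradient of the current loss
eta, lam = 0.5, 0.1                  # step size and L1 strength

w_half = w - eta * grad              # forward: plain gradient step
# backward: the prox of eta*lam*||.||_1 is coordinate-wise soft-thresholding
w_next = np.sign(w_half) * np.maximum(np.abs(w_half) - eta * lam, 0.0)
```

The third coordinate lands inside the threshold and is driven exactly to zero, which is how FOBOS produces sparse models.
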
Usage

from openmoa.classifier import FOBOSClassifier

# Supports "l1", "l2", "l1_l2" regularization
learner = FOBOSClassifier(
    schema=stream.get_schema(),
    alpha=1.0,
    lambda_=0.001,
    regularization="l1",
    step_schedule="sqrt"
)

8. FTRL — Follow-The-Regularized-Leader

Industry Standard

Background: FTRL-Proximal is a state-of-the-art algorithm used widely in industry (e.g., ad click prediction). It achieves superior sparsity compared to SGD by accumulating gradient information and applying L1 thresholding in a principled manner.

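The per-coordinate FTRL-Proximal update (following McMahan et al.'s published pseudocode) can be sketched as below; the names mirror the alpha/beta/l1/l2 parameters, but this is a sketch, not FTRLClassifier's code.

```python
import numpy as np

alpha, beta, l1, l2 = 0.1, 1.0, 1.0, 1.0
d = 3
z, n = np.zeros(d), np.zeros(d)      # accumulated adjusted / squared gradients

def ftrl_weights(z, n):
    # closed-form per-coordinate solution; |z_i| <= l1 gives an exact zero
    w = np.zeros_like(z)
    active = np.abs(z) > l1
    w[active] = -(z[active] - np.sign(z[active]) * l1) / (
        (beta + np.sqrt(n[active])) / alpha + l2)
    return w

g = np.array([0.3, -2.0, 8.0])       # one toy gradient step
w_old = ftrl_weights(z, n)           # all zeros initially
sigma = (np.sqrt(n + g**2) - np.sqrt(n)) / alpha
z += g - sigma * w_old
n += g**2
w = ftrl_weights(z, n)
```

Coordinates whose accumulated gradient stays below the l1 threshold remain exactly zero, which is the principled L1 thresholding referred to above.
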
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 0.1 | Learning rate parameter |
| beta | float | 1.0 | Smoothing parameter |
| l1 | float | 1.0 | L1 regularization (controls sparsity) |
| l2 | float | 1.0 | L2 regularization (controls smoothness) |

Usage

from openmoa.classifier import FTRLClassifier

learner = FTRLClassifier(
    schema=stream.get_schema(),
    alpha=0.1,
    beta=1.0,
    l1=1.0,
    l2=1.0
)
print(f"Sparsity: {learner.get_sparsity():.1%}")

9. OLD3S — Online Learning from Data of Double Streams

Deep Learning

Background: OLD3S uses a Variational Autoencoder (VAE) to map each feature space into a shared latent space, and a Hedge Backpropagation (HBP) MLP for classification. When the feature space shifts, knowledge from the old model is distilled into the new one via latent-space alignment.

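Hedge Backpropagation combines the outputs of the MLP's per-depth exit heads with multiplicative weights; a toy NumPy sketch of the Hedge weighting (not OLD3S's PyTorch code, and with an assumed decay factor):

```python
import numpy as np

n_heads, eta = 3, 0.9                # number of HBP exit heads, Hedge decay factor
w = np.ones(n_heads) / n_heads       # one weight per intermediate classifier

def hedge_step(w, losses):
    # shrink each head's weight by eta**loss, then renormalize
    w = w * eta ** losses
    return w / w.sum()

losses = np.array([1.0, 0.2, 0.6])   # toy per-head losses on one instance
w = hedge_step(w, losses)
# the ensemble prediction is the w-weighted vote of the heads' outputs
```

Heads with lower recent loss accumulate higher weight, letting the effective network depth adapt as the stream (and its feature space) changes.
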
Usage

from openmoa.classifier import OLD3SClassifier

learner = OLD3SClassifier(
    schema=stream.get_schema(),
    latent_dim=20,
    hidden_dim=128,
    num_hbp_layers=3,
    learning_rate=0.001
)
Requirements: Requires torch installation. More computationally intensive than linear methods.

10. ORF3V — Online Random Feature Forests

Ensemble

Background: ORF3V builds independent decision stump ensembles ("feature forests") for each individual feature. This naturally handles varying feature spaces — when a feature is absent, its forest is simply excluded from the vote. No imputation is required.

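The "vote only with present features" idea can be sketched as follows (a toy with single stumps per feature and an assumed NaN encoding for absent features, not ORF3V's forests):

```python
import numpy as np

# per-feature decision stumps: feature -> (threshold, class if value > threshold)
stumps = {0: (0.5, 1), 1: (2.0, 0), 2: (-1.0, 0)}

def predict(x):
    votes = []
    for j, (thr, cls_above) in stumps.items():
        if np.isnan(x[j]):           # feature absent: its forest abstains
            continue
        votes.append(cls_above if x[j] > thr else 1 - cls_above)
    return int(np.round(np.mean(votes)))  # majority vote of present features

pred_missing = predict(np.array([0.9, np.nan, -2.0]))  # feature 1 absent
pred_full = predict(np.array([0.9, 3.0, -2.0]))        # all features present
```

No imputation happens anywhere: an absent feature simply contributes no vote, which is why ORF3V is listed as handling random-missing streams natively.
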
Usage

from openmoa.classifier import ORF3VClassifier

learner = ORF3VClassifier(
    schema=stream.get_schema(),
    n_stumps=10,
    grace_period=100,
    replacement_interval=100
)

11. OWSS — Online World Soft Sensing

GNN

Background: OWSS uses a Graph Neural Network (GNN) approach with a bipartite graph structure connecting instances to features. It learns universal feature embeddings that persist across changing feature spaces.

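One way to picture the bipartite instance-feature structure (a loose NumPy sketch under assumed aggregation, not OWSS's GNN): each instance node is aggregated from the embeddings of the features it currently carries, so embeddings learned under one feature set remain usable when the set changes.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_emb = rng.normal(size=(5, 4))   # one learnable embedding per feature (5 feats, dim 4)

def embed(present, values):
    # instance node = value-weighted mean of its neighboring feature nodes
    return (values[:, None] * feat_emb[present]).mean(axis=0)

h_old = embed(np.array([0, 1, 2]), np.array([1.0, 0.5, -1.0]))  # old feature set
h_new = embed(np.array([2, 3, 4]), np.array([0.3, 2.0, 1.0]))   # after evolution
# both land in the same 4-d latent space, so one classifier head serves both
```
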
Usage

from openmoa.classifier import OWSSClassifier

learner = OWSSClassifier(
    schema=stream.get_schema(),
    hidden_dim=32,
    learning_rate=0.01,
    rec_weight=0.1
)

12. Decision Guide

By Feature Evolution Pattern

| Pattern | Best Algorithms | Why |
|---|---|---|
| EDS (overlap shifts) | FESL, OLD3S | Designed for explicit $S_{old} \to S_{new}$ transitions |
| TDS (growth) | OVFM, OASF, FOBOS, FTRL | Handle a monotonically growing feature space |
| CDS (random missing) | OVFM, OSLMF, ORF3V | Imputation (OVFM, OSLMF) or independence (ORF3V) |
| General / large scale | FOBOS, FTRL, RSOL | Robust sparse methods that work everywhere |

Complete Benchmark Example

benchmark.py
from openmoa.datasets import Electricity
from openmoa.stream import OpenFeatureStream
from openmoa.classifier import (
    FESLClassifier, OASFClassifier, RSOLClassifier,
    FOBOSClassifier, FTRLClassifier
)
from openmoa.evaluation import ClassificationEvaluator

base = Electricity()
schema = base.get_schema()

algorithms = {
    "FESL": FESLClassifier(schema=schema, alpha=0.1),
    "OASF": OASFClassifier(schema=schema, lambda_param=0.01),
    "RSOL": RSOLClassifier(schema=schema, lambda_param=50.0),
    "FOBOS": FOBOSClassifier(schema=schema, alpha=1.0, regularization="l1"),
    "FTRL": FTRLClassifier(schema=schema, alpha=0.1, l1=1.0, l2=1.0),
}

for name, learner in algorithms.items():
    base.restart()
    stream = OpenFeatureStream(
        base_stream=base,
        evolution_pattern="pyramid",
        d_min=2,
        total_instances=10000
    )
    evaluator = ClassificationEvaluator(schema=schema)
    
    while stream.has_more_instances():
        instance = stream.next_instance()
        prediction = learner.predict(instance)
        learner.train(instance)
        evaluator.update(instance.y_index, prediction)
        
    print(f"{name:8s} → Accuracy: {evaluator.accuracy():.2f}%")