Tutorial ~25 min read

Classification Algorithms for Evolving Feature Spaces

A systematic guide to OpenMOA's 10 specialized algorithms for Utilitarian Online Learning (UOL) — learning from data streams where the feature space itself changes over time.

OpenMOA introduces 10 classification algorithms specifically designed for Utilitarian Online Learning (UOL). While traditional online learning handles concept drift (distribution changes), UOL addresses the more complex challenge of feature evolution (features appearing and disappearing). These algorithms are the core contribution of OpenMOA, built on top of CapyMOA's existing library of traditional online classifiers.

1. Taxonomy & Reference

Algorithm Taxonomy

| Category | Algorithms | Core Technique |
|---|---|---|
| Feature Space Mapping | FESL | Learns a linear mapping between old and new feature spaces |
| Copula-based Imputation | OVFM, OSLMF | Uses a Gaussian Copula to model missing features statistically |
| Sparse Online Optimization | OASF, RSOL, FOBOS, FTRL | Online convex optimization with sparsity-inducing regularization |
| Deep / Ensemble Models | OLD3S, ORF3V, OWSS | Neural networks or ensemble methods for evolving spaces |

Quick Reference

| Algorithm | Multi-class | Sparse-Aware | NaN-Aware | Deep Learning | Scalable (RCV1) |
|---|---|---|---|---|---|
| FESL | No | Yes | No | No | No (OOM) |
| OVFM | No | No | Yes | No | No (OOM) |
| OSLMF | No | No | Yes | No | No (OOM) |
| OASF | No | Yes | Yes | No | Yes |
| RSOL | No | Yes | Yes | No | Yes |
| FOBOS | Yes | Yes | Yes | No | Yes |
| FTRL | Yes | Yes | Yes | No | Yes |
| OLD3S | Yes | Yes | No | Yes (PyTorch) | Yes |
| ORF3V | Yes | Yes | No | No | Yes |
| OWSS | Yes | No | No | Yes (PyTorch) | Moderate |

2. FESL — Feature Evolvable Streaming Learning

Feature Mapping

Background: FESL is the foundational algorithm for learning in evolving feature spaces, proposed at NeurIPS 2017. It addresses the scenario where the entire feature space shifts: the old set of features $S_{old}$ is gradually replaced by a new set $S_{new}$, with a brief overlap period where both are observable.

How It Works

FESL maintains two linear models, one per feature space, and learns a mapping between the spaces during the overlap period:

Key Equation: the mapping matrix $M$ is computed by ridge regression over the overlap buffer: $M = (X_{new}^{\top} X_{new} + \lambda I)^{-1} X_{new}^{\top} X_{old}$. Once $S_{old}$ disappears, $M$ reconstructs old-space features from new-space instances, so the old model can keep contributing predictions alongside the new one.

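The key equation can be checked with a few lines of NumPy (toy dimensions; the variable names are illustrative, not OpenMOA internals):

```python
import numpy as np

rng = np.random.default_rng(0)
B, d_old, d_new = 100, 5, 8          # overlap buffer size and toy dimensions
X_new = rng.normal(size=(B, d_new))  # overlap instances in the new feature space
W = rng.normal(size=(d_new, d_old))
X_old = X_new @ W                    # old-space views (noiseless toy relation)

lam = 0.1                            # ridge strength (lambda_ in FESLClassifier)
# M = (X_new^T X_new + lambda I)^{-1} X_new^T X_old, shape (d_new, d_old)
M = np.linalg.solve(X_new.T @ X_new + lam * np.eye(d_new), X_new.T @ X_old)

x_old_hat = X_new @ M                # reconstructed old-space features
```

Because $M$ is learned in closed form from a single overlap buffer, it maps any later new-space instance to an old-space estimate; its $d_{old} \times d_{new}$ size is also the source of the memory limitation below.
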
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 0.1 | Learning rate for SGD weight updates |
| lambda_ | float | 0.1 | Ridge regularization strength for the mapping matrix $M$ |
| window_size | int | 100 | Buffer size $B$ of overlap instances accumulated for learning $M$ |

Usage

from openmoa.classifier import FESLClassifier

learner = FESLClassifier(
    schema=stream.get_schema(),
    alpha=0.1,
    lambda_=0.1,
    window_size=100
)

Complexity:

- Per-instance training: $O(d)$ SGD update
- Mapping learning: $O(B \cdot d^2 + d^3)$
- Memory: $O(d_{old} \times d_{new})$

Limitation: The mapping matrix $M$ has size $d_{old} \times d_{new}$. For RCV1 (~47k features), this requires ~17 GB RAM, causing OOM on standard machines.

3. OVFM — Online Variable Feature Spaces with Mixed Data

Gaussian Copula

Background: OVFM addresses the challenge of learning when features randomly appear and disappear, and the data contains both continuous and ordinal types. It uses a Gaussian Copula model to statistically impute missing features, then trains an ensemble of two classifiers.

How It Works

OVFM keeps a sliding window of recent instances to estimate each feature's marginal ECDF, transforms observed values into a latent Gaussian space via the copula, and updates the latent correlation matrix $\Sigma$ with mini-batch EM (smoothed by decay_coef). Missing features are imputed from their conditional distribution given the observed ones before the classifiers are trained.

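A hedged sketch of the conditional-Gaussian imputation step (toy NumPy, not OVFM's implementation): once values sit in the latent Gaussian space with correlation $\Sigma$, a missing coordinate is imputed by the standard conditional mean $E[z_m \mid z_o] = \Sigma_{mo} \Sigma_{oo}^{-1} z_o$.

```python
import numpy as np

# Latent correlation over three features; feature 1 is missing.
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
obs, miss = [0, 2], [1]              # indices of observed / missing features
z_obs = np.array([1.2, -0.4])        # latent (copula-transformed) observations

# E[z_miss | z_obs] = Sigma_mo @ Sigma_oo^{-1} @ z_obs
Sigma_oo = Sigma[np.ix_(obs, obs)]
Sigma_mo = Sigma[np.ix_(miss, obs)]
z_miss_hat = Sigma_mo @ np.linalg.solve(Sigma_oo, z_obs)
# mapping z_miss_hat back through the feature's inverse ECDF gives the imputed value
```
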
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| window_size | int | 200 | Sliding window for marginal ECDF estimation |
| batch_size | int | 50 | Mini-batch size for EM updates |
| decay_coef | float | 0.5 | Exponential decay for the $\Sigma$ update |

Usage

from openmoa.classifier import OVFMClassifier

learner = OVFMClassifier(
    schema=stream.get_schema(),
    window_size=200,
    batch_size=50,
    max_ord_levels=14
)
Dependency: Requires statsmodels for Empirical CDF calculation.

4. OSLMF — Online Semi-supervised Learning

Copula + Semi-Supervised

Background: OSLMF extends the Copula-based approach of OVFM with semi-supervised learning capabilities. When labeled data is scarce, it uses Density Peak Clustering to propagate labels from labeled to unlabeled instances in the latent space.

Key Components

- Copula imputation: the same latent-Gaussian machinery as OVFM fills in missing features.
- Density Peak Clustering: in the latent space, labels are propagated from labeled instances to nearby unlabeled ones.
- Weighted ensemble: the component classifiers' predictions are combined via ensemble_weight.

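The label-propagation idea can be sketched as follows (a loose Density-Peak-style toy in NumPy, not OSLMF's actual clustering code): each unlabeled point inherits the label of a nearby denser point, falling back to the nearest labeled seed at local density peaks.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy 1-D stream: two clusters, one labeled seed per cluster
X = np.concatenate([rng.normal(0.0, 0.3, 10), rng.normal(5.0, 0.3, 10)])
labels = np.full(20, -1)             # -1 = unlabeled
seeds = np.array([0, 10])
labels[seeds] = [0, 1]

d_c = 0.5                            # cutoff distance for local density
dist = np.abs(X[:, None] - X[None, :])
rho = (dist < d_c).sum(axis=1)       # local density (neighbor counts)

order = np.argsort(-rho)             # visit points from densest to sparsest
for i in order:
    if labels[i] != -1:
        continue
    denser = np.where(rho > rho[i])[0]
    j = denser[np.argmin(dist[i, denser])] if denser.size else -1
    if j >= 0 and dist[i, j] < d_c:
        labels[i] = labels[j]        # inherit from a nearby denser point
    else:                            # local peak: take the nearest seed's label
        labels[i] = labels[seeds[np.argmin(dist[i, seeds])]]
```
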
Usage

from openmoa.classifier import OSLMFClassifier

learner = OSLMFClassifier(
    schema=stream.get_schema(),
    window_size=200,
    buffer_size=200,
    ensemble_weight=0.5
)

5. OASF — Online Active Sparse Feature Learning

Sparse Opt

Background: OASF is designed for streams where features incrementally appear or disappear. It uses Passive-Aggressive (PA) updates with group sparsity regularization via the $L_{1,2}$-norm, automatically selecting important features over time.

How It Works

OASF uses a sliding window $W$ storing the last $L$ weight vectors. The $L_{1,2}$-norm operates on rows of $W$ (one row per feature), zeroing out features whose weights have been consistently weak across the window.

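A toy NumPy sketch of the row-wise $L_{1,2}$ proximal step (illustrative, not OASF's code): each feature's row in $W$ is group-soft-thresholded by its Euclidean norm, so a feature whose weights have been weak across all $L$ windowed vectors is zeroed as a group.

```python
import numpy as np

# W: one row per feature, one column per windowed weight vector (L columns)
W = np.array([[0.9, 1.1, 1.0],       # consistently strong feature
              [0.01, -0.02, 0.015],  # consistently weak feature
              [0.5, -0.6, 0.55]])
lam = 0.1                            # lambda_param: L_{1,2} regularization

# group soft-thresholding: row_j <- row_j * max(0, 1 - lam / ||row_j||_2)
norms = np.linalg.norm(W, axis=1, keepdims=True)
W_prox = W * np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
```

The weak feature's entire row collapses to zero while strong rows are only mildly shrunk, which is exactly the feature-selection behavior described above.
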
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| lambda_param | float | 0.01 | $L_{1,2}$-norm sparsity regularization |
| mu | float | 1.0 | PA smoothness parameter |
| L | int | 100 | Sliding window size |

Usage

from openmoa.classifier import OASFClassifier

learner = OASFClassifier(
    schema=stream.get_schema(),
    lambda_param=0.01,
    mu=1.0,
    L=100
)

6. RSOL — Robust Sparse Online Learning

Sparse Opt + Optimized

Background: RSOL improves upon OASF with two key optimizations: a ring buffer for the sliding window (avoiding expensive array shifts) and auto-expanding weights for dynamic feature dimensions. This makes it significantly faster on large datasets.

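The ring-buffer idea can be sketched with a minimal class (hypothetical names, not RSOL's actual implementation): instead of shifting the whole window on every insert, a write pointer advances modulo the capacity.

```python
import numpy as np

class RingWindow:
    """Fixed-size window with O(1) insertion via a moving write pointer."""
    def __init__(self, capacity, dim):
        self.buf = np.zeros((capacity, dim))
        self.head = 0                # next slot to overwrite
        self.count = 0               # number of valid rows

    def push(self, w):
        self.buf[self.head] = w      # overwrite the oldest slot; no np.roll()
        self.head = (self.head + 1) % len(self.buf)
        self.count = min(self.count + 1, len(self.buf))

win = RingWindow(capacity=3, dim=2)
for t in range(5):
    win.push(np.array([t, t]))
# after 5 pushes into capacity 3, only the last 3 weight vectors survive
```
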
Usage

from openmoa.classifier import RSOLClassifier

learner = RSOLClassifier(
    schema=stream.get_schema(),
    lambda_param=50.0,
    mu=1.0,
    L=1000
)
Optimization: The ring buffer replaces np.roll() with O(1) pointer advancement.

7. FOBOS — Forward-Backward Splitting

General Opt

Background: FOBOS separates the gradient descent step (forward) from the regularization step (backward proximal operator), enabling clean L1/L2/Group Lasso sparsity. Unlike FESL/OASF/RSOL, FOBOS supports multi-class classification.

How It Works

Each round splits into two steps: a forward gradient step $w_{t+\frac{1}{2}} = w_t - \eta_t \nabla \ell_t(w_t)$, followed by a backward proximal step $w_{t+1} = \arg\min_w \frac{1}{2}\|w - w_{t+\frac{1}{2}}\|^2 + \eta_t \lambda\, r(w)$. With $r(w) = \|w\|_1$, the proximal step reduces to coordinate-wise soft-thresholding, which sets small weights exactly to zero.

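A minimal NumPy sketch of the forward-backward split with L1 regularization (toy values, not the FOBOSClassifier internals):

```python
import numpy as np

w = np.array([0.5, -0.3, 0.02])
grad = np.array([0.1, -0.2, 0.01])   # gradient of the current loss
eta, lam = 0.5, 0.1                  # step size and L1 strength

w_half = w - eta * grad              # forward: plain gradient step
# backward: the prox of eta*lam*||.||_1 is coordinate-wise soft-thresholding
w_next = np.sign(w_half) * np.maximum(np.abs(w_half) - eta * lam, 0.0)
```

The third coordinate lands inside the threshold and is driven exactly to zero, which is how FOBOS produces sparse models.
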
Usage

from openmoa.classifier import FOBOSClassifier

# Supports "l1", "l2", "l1_l2" regularization
learner = FOBOSClassifier(
    schema=stream.get_schema(),
    alpha=1.0,
    lambda_=0.001,
    regularization="l1",
    step_schedule="sqrt"
)

8. FTRL — Follow-The-Regularized-Leader

Industry Standard

Background: FTRL-Proximal is a state-of-the-art algorithm used widely in industry (e.g., ad click prediction). It achieves superior sparsity compared to SGD by accumulating gradient information and applying L1 thresholding in a principled manner.

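The per-coordinate FTRL-Proximal update (following McMahan et al.'s published pseudocode) can be sketched as below; the names mirror the alpha/beta/l1/l2 parameters, but this is a sketch, not FTRLClassifier's code.

```python
import numpy as np

alpha, beta, l1, l2 = 0.1, 1.0, 1.0, 1.0
d = 3
z, n = np.zeros(d), np.zeros(d)      # accumulated adjusted / squared gradients

def ftrl_weights(z, n):
    # closed-form per-coordinate solution; |z_i| <= l1 gives an exact zero
    w = np.zeros_like(z)
    active = np.abs(z) > l1
    w[active] = -(z[active] - np.sign(z[active]) * l1) / (
        (beta + np.sqrt(n[active])) / alpha + l2)
    return w

g = np.array([0.3, -2.0, 8.0])       # one toy gradient step
w_old = ftrl_weights(z, n)           # all zeros initially
sigma = (np.sqrt(n + g**2) - np.sqrt(n)) / alpha
z += g - sigma * w_old
n += g**2
w = ftrl_weights(z, n)
```

Coordinates whose accumulated gradient stays below the l1 threshold remain exactly zero, which is the principled L1 thresholding referred to above.
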
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 0.1 | Learning rate parameter |
| beta | float | 1.0 | Smoothing parameter |
| l1 | float | 1.0 | L1 regularization (controls sparsity) |
| l2 | float | 1.0 | L2 regularization (controls smoothness) |

Usage

from openmoa.classifier import FTRLClassifier

learner = FTRLClassifier(
    schema=stream.get_schema(),
    alpha=0.1,
    beta=1.0,
    l1=1.0,
    l2=1.0
)
print(f"Sparsity: {learner.get_sparsity():.1%}")

9. OLD3S — Online Learning from Data of Double Streams

Deep Learning

Background: OLD3S uses a Variational Autoencoder (VAE) to map each feature space into a shared latent space, and a Hedge Backpropagation (HBP) MLP for classification. When the feature space shifts, knowledge from the old model is distilled into the new one via latent-space alignment.

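Hedge Backpropagation combines the outputs of the MLP's per-depth exit heads with multiplicative weights; a toy NumPy sketch of the Hedge weighting (not OLD3S's PyTorch code, and with an assumed decay factor):

```python
import numpy as np

n_heads, eta = 3, 0.9                # number of HBP exit heads, Hedge decay factor
w = np.ones(n_heads) / n_heads       # one weight per intermediate classifier

def hedge_step(w, losses):
    # shrink each head's weight by eta**loss, then renormalize
    w = w * eta ** losses
    return w / w.sum()

losses = np.array([1.0, 0.2, 0.6])   # toy per-head losses on one instance
w = hedge_step(w, losses)
# the ensemble prediction is the w-weighted vote of the heads' outputs
```

Heads with lower recent loss accumulate higher weight, letting the effective network depth adapt as the stream (and its feature space) changes.
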
Usage

from openmoa.classifier import OLD3SClassifier

learner = OLD3SClassifier(
    schema=stream.get_schema(),
    latent_dim=20,
    hidden_dim=128,
    num_hbp_layers=3,
    learning_rate=0.001
)
Requirements: Requires torch installation. More computationally intensive than linear methods.

10. ORF3V — Online Random Feature Forests

Ensemble

Background: ORF3V builds independent decision stump ensembles ("feature forests") for each individual feature. This naturally handles varying feature spaces — when a feature is absent, its forest is simply excluded from the vote. No imputation is required.

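The "vote only with present features" idea can be sketched as follows (a toy with single stumps per feature and an assumed NaN encoding for absent features, not ORF3V's forests):

```python
import numpy as np

# per-feature decision stumps: feature -> (threshold, class if value > threshold)
stumps = {0: (0.5, 1), 1: (2.0, 0), 2: (-1.0, 0)}

def predict(x):
    votes = []
    for j, (thr, cls_above) in stumps.items():
        if np.isnan(x[j]):           # feature absent: its forest abstains
            continue
        votes.append(cls_above if x[j] > thr else 1 - cls_above)
    return int(np.round(np.mean(votes)))  # majority vote of present features

pred_missing = predict(np.array([0.9, np.nan, -2.0]))  # feature 1 absent
pred_full = predict(np.array([0.9, 3.0, -2.0]))        # all features present
```

No imputation happens anywhere: an absent feature simply contributes no vote, which is why ORF3V is listed as handling random-missing streams natively.
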
Usage

from openmoa.classifier import ORF3VClassifier

learner = ORF3VClassifier(
    schema=stream.get_schema(),
    n_stumps=10,
    grace_period=100,
    replacement_interval=100
)

11. OWSS — Online World Soft Sensing

GNN

Background: OWSS uses a Graph Neural Network (GNN) approach with a bipartite graph structure connecting instances to features. It learns universal feature embeddings that persist across changing feature spaces.

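One way to picture the bipartite instance-feature structure (a loose NumPy sketch under assumed aggregation, not OWSS's GNN): each instance node is aggregated from the embeddings of the features it currently carries, so embeddings learned under one feature set remain usable when the set changes.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_emb = rng.normal(size=(5, 4))   # one learnable embedding per feature (5 feats, dim 4)

def embed(present, values):
    # instance node = value-weighted mean of its neighboring feature nodes
    return (values[:, None] * feat_emb[present]).mean(axis=0)

h_old = embed(np.array([0, 1, 2]), np.array([1.0, 0.5, -1.0]))  # old feature set
h_new = embed(np.array([2, 3, 4]), np.array([0.3, 2.0, 1.0]))   # after evolution
# both land in the same 4-d latent space, so one classifier head serves both
```
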
Usage

from openmoa.classifier import OWSSClassifier

learner = OWSSClassifier(
    schema=stream.get_schema(),
    hidden_dim=32,
    learning_rate=0.01,
    rec_weight=0.1
)

12. Decision Guide

By Feature Evolution Pattern

| Pattern | Best Algorithms | Why |
|---|---|---|
| EDS (overlap shifts) | FESL, OLD3S | Designed for explicit $S_{old} \to S_{new}$ transitions |
| TDS (growth) | OVFM, OASF, FOBOS, FTRL | Handle a monotonically growing feature space |
| CDS (random missing) | OVFM, OSLMF, ORF3V | Imputation (OVFM, OSLMF) or independence (ORF3V) |
| General / large scale | FOBOS, FTRL, RSOL | Robust sparse methods that work everywhere |

Complete Benchmark Example

benchmark.py
from openmoa.datasets import Electricity
from openmoa.stream import OpenFeatureStream
from openmoa.classifier import (
    FESLClassifier, OASFClassifier, RSOLClassifier,
    FOBOSClassifier, FTRLClassifier
)
from openmoa.evaluation import ClassificationEvaluator

base = Electricity()
schema = base.get_schema()

algorithms = {
    "FESL": FESLClassifier(schema=schema, alpha=0.1),
    "OASF": OASFClassifier(schema=schema, lambda_param=0.01),
    "RSOL": RSOLClassifier(schema=schema, lambda_param=50.0),
    "FOBOS": FOBOSClassifier(schema=schema, alpha=1.0, regularization="l1"),
    "FTRL": FTRLClassifier(schema=schema, alpha=0.1, l1=1.0, l2=1.0),
}

for name, learner in algorithms.items():
    base.restart()
    stream = OpenFeatureStream(
        base_stream=base,
        evolution_pattern="pyramid",
        d_min=2,
        total_instances=10000
    )
    evaluator = ClassificationEvaluator(schema=schema)
    
    while stream.has_more_instances():
        instance = stream.next_instance()
        prediction = learner.predict(instance)
        learner.train(instance)
        evaluator.update(instance.y_index, prediction)
        
    print(f"{name:8s} → Accuracy: {evaluator.accuracy():.2f}%")