Tutorial ~25 min read

Advanced Topics

This tutorial covers advanced OpenMOA capabilities: semi-supervised stream learning, online continual learning, integration with MOA's Java ecosystem, prediction intervals for regression, and techniques for high-performance evaluation.

1. Semi-Supervised Learning

Semi-supervised stream learning handles the realistic case where labels arrive infrequently. OpenMOA's ClassifierSSL wraps any classifier to work with partially-labeled streams.

ClassifierSSL Interface

All SSL classifiers implement ClassifierSSL, which extends the standard Classifier interface with an is_labeled flag in learn_one.

ClassifierSSL interface
from openmoa.classifier import ClassifierSSL

ssl_clf = SomeSSLClassifier()

# Labeled instance
ssl_clf.learn_one(x, y, is_labeled=True)

# Unlabeled instance — only unsupervised update
ssl_clf.learn_one(x, y=None, is_labeled=False)

# Prediction is unchanged
y_pred = ssl_clf.predict_one(x)

prequential_ssl_evaluation

The dedicated SSL evaluator simulates a configurable labeling rate, measures accuracy only on labeled instances, and tracks how unlabeled instances are used.

prequential_ssl_evaluation
from openmoa.evaluation import prequential_ssl_evaluation

results = prequential_ssl_evaluation(
    stream=stream,
    model=ssl_clf,
    labeling_rate=0.1,    # 10 % of instances are labeled
    n_samples=10000,
    window_size=1000,
)

Parameter      Type           Description
stream         Stream         Any OpenMOA stream object
model          ClassifierSSL  SSL-capable classifier
labeling_rate  float          Fraction of instances that carry labels (0–1)
n_samples      int            Total instances to process
window_size    int            Windowed accuracy reporting interval
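
Conceptually, a prequential SSL loop amounts to revealing each incoming label with probability labeling_rate and scoring only the labeled instances. The hand-rolled sketch below illustrates that mechanism; `prequential_ssl_loop` is a toy function written for this tutorial, not part of the OpenMOA API.

```python
import random

def prequential_ssl_loop(stream, model, labeling_rate, n_samples, seed=42):
    """Toy prequential SSL loop: test-then-train, revealing each label
    only with probability `labeling_rate`."""
    rng = random.Random(seed)
    correct = labeled_seen = 0
    for i, (x, y) in enumerate(stream):
        if i >= n_samples:
            break
        if rng.random() < labeling_rate:
            # Accuracy is measured on labeled instances only.
            if model.predict_one(x) == y:
                correct += 1
            labeled_seen += 1
            model.learn_one(x, y, is_labeled=True)
        else:
            model.learn_one(x, y=None, is_labeled=False)
    return correct / max(labeled_seen, 1)
```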

OSNN — Online Semi-supervised Nearest Neighbor

OSNN maintains an online prototype graph. Labeled nodes propagate labels to nearby unlabeled neighbors via graph-based label propagation, enabling effective learning even at very low labeling rates.

OSNN example
from openmoa.classifier import OSNN

model = OSNN(
    n_neighbors=5,
    window_size=500,
)

results = prequential_ssl_evaluation(
    stream=stream,
    model=model,
    labeling_rate=0.05,   # only 5 % labeled
    n_samples=10000,
)
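
The propagation idea itself can be sketched in a few lines: an unlabeled prototype takes the majority label of its nearest labeled neighbors. This is a toy illustration of graph-based label propagation, not OSNN's actual internals.

```python
import math

def propagate_labels(prototypes, k=3):
    """Toy label propagation: each unlabeled prototype (label None) takes
    the majority label of its k nearest labeled neighbors."""
    labeled = [(p, lab) for p, lab in prototypes if lab is not None]
    out = []
    for p, lab in prototypes:
        if lab is not None:
            out.append((p, lab))
            continue
        # Vote among the k closest labeled prototypes.
        neighbors = sorted(labeled, key=lambda q: math.dist(p, q[0]))[:k]
        votes = {}
        for _, nl in neighbors:
            votes[nl] = votes.get(nl, 0) + 1
        out.append((p, max(votes, key=votes.get)))
    return out
```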

2. Online Continual Learning

Online Continual Learning (OCL) addresses the challenge of learning new tasks sequentially without forgetting previous knowledge. OpenMOA provides a complete OCL pipeline for both classic benchmarks and vision tasks.

Supported Datasets

Dataset         Class             Tasks           Notes
Split-MNIST     SplitMNIST        5 binary tasks  Standard OCL benchmark, digits 0–9 split by pairs
Split-CIFAR10   SplitCIFAR10      5 binary tasks  CIFAR-10 images split into 5 tasks
Split-CIFAR100  SplitCIFAR100     10 or 20 tasks  CIFAR-100 with configurable task count
ViT Features    ViTFeatureStream  Configurable    Pre-extracted Vision Transformer embeddings

Task Interface

Each OCL dataset exposes a task-based iteration interface. Tasks arrive sequentially; each task contains a stream of instances from that task's classes only.

task iteration
from openmoa.datasets import SplitMNIST

dataset = SplitMNIST()

for task_id, task_stream in dataset.tasks():
    print(f"Task {task_id}: classes = {task_stream.classes}")
    for x, y in task_stream:
        ...
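
The class partitioning behind these benchmarks is easy to reproduce: the label set is split into consecutive, equally sized groups, one per task. The helper below is a sketch of the partitioning only (not the dataset loader); `split_classes` is a name invented for this tutorial.

```python
def split_classes(classes, n_tasks):
    """Partition a class list into n_tasks consecutive, equally sized
    groups, e.g. digits 0-9 into the five pairs used by Split-MNIST."""
    per_task = len(classes) // n_tasks
    return [classes[i * per_task:(i + 1) * per_task] for i in range(n_tasks)]
```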

ocl_train_eval_loop

The unified OCL training loop handles task boundaries, calls the strategy's before_task / after_task hooks, and accumulates the accuracy matrix.

ocl_train_eval_loop
from openmoa.evaluation import ocl_train_eval_loop
from openmoa.ocl import ExperienceReplay

strategy = ExperienceReplay(mem_size=200)
metrics = ocl_train_eval_loop(
    dataset=SplitMNIST(),
    strategy=strategy,
)

OCLMetrics

The OCLMetrics object returned by ocl_train_eval_loop provides the standard continual learning metrics derived from the accuracy matrix R, where R[i][j] is accuracy on task j after training on task i.

Metric             Property                   Description
Accuracy Matrix    metrics.accuracy_matrix    Full R[i][j] matrix
Average Accuracy   metrics.avg_accuracy       Mean of final-row accuracies
Backward Transfer  metrics.backward_transfer  Average change in past-task accuracy after learning new tasks
Forward Transfer   metrics.forward_transfer   Average accuracy on future tasks before they are trained
Forgetting         metrics.forgetting         Average accuracy drop on previously-learned tasks
Anytime Accuracy   metrics.anytime_accuracy   Average accuracy over all tasks at every evaluation checkpoint
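
These quantities all derive mechanically from R. The sketch below uses the usual continual-learning definitions; check them against OpenMOA's implementation before relying on exact values, since conventions (especially for forgetting) vary across papers.

```python
import numpy as np

def ocl_metrics(R):
    """Standard continual-learning metrics from an accuracy matrix R,
    where R[i][j] is accuracy on task j after training on task i."""
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    avg_accuracy = R[-1].mean()  # mean of the final row
    # Backward transfer: change on past tasks after all training is done.
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])
    # Forgetting: drop from best-ever accuracy on each past task.
    forgetting = np.mean([max(R[:-1, j]) - R[-1, j] for j in range(T - 1)])
    return avg_accuracy, bwt, forgetting
```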

Strategies

Strategy           Class             Key Parameter  Description
Experience Replay  ExperienceReplay  mem_size       Stores a reservoir of past examples and interleaves them during training
SLDA               SLDA              shrinkage      Streaming Linear Discriminant Analysis; updates class means and shared covariance
NCM                NCM               n_neighbors    Nearest-Class-Mean classifier; classifies by nearest stored class centroid

comparing strategies
from openmoa.ocl import ExperienceReplay, SLDA, NCM

er   = ExperienceReplay(mem_size=200)
slda = SLDA(shrinkage=1e-4)
ncm  = NCM(n_neighbors=1)

for name, strategy in [("ER", er), ("SLDA", slda), ("NCM", ncm)]:
    m = ocl_train_eval_loop(SplitMNIST(), strategy)
    print(f"{name}: avg_acc={m.avg_accuracy:.3f}, forgetting={m.forgetting:.3f}")
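
NCM's core idea fits in a few lines: keep a running mean per class and classify to the nearest centroid. The `TinyNCM` class below is a minimal sketch of that idea, not OpenMOA's NCM implementation.

```python
import numpy as np

class TinyNCM:
    """Minimal Nearest-Class-Mean: incremental per-class means,
    prediction by nearest centroid in Euclidean distance."""
    def __init__(self):
        self.means, self.counts = {}, {}

    def learn_one(self, x, y):
        x = np.asarray(x, dtype=float)
        if y not in self.means:
            self.means[y], self.counts[y] = np.zeros_like(x), 0
        self.counts[y] += 1
        # Incremental mean update: m += (x - m) / n
        self.means[y] += (x - self.means[y]) / self.counts[y]

    def predict_one(self, x):
        x = np.asarray(x, dtype=float)
        return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))
```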

3. MOA Integration

OpenMOA bridges the rich Java-based MOA (Massive Online Analysis) ecosystem through JPype. This gives access to hundreds of MOA classifiers and detectors without leaving Python.

MOAClassifier

MOAClassifier wraps any MOA learner using its CLI string syntax. It implements the standard OpenMOA Classifier interface so it drops into any evaluation loop.

MOAClassifier
from openmoa.classifier import MOAClassifier
from openmoa.evaluation import prequential_evaluation

clf = MOAClassifier(moa_learner="trees.HoeffdingTree -g 200 -c 0.0")

results = prequential_evaluation(stream, clf, n_samples=10000)

MOADriftDetector

MOADriftDetector wraps MOA drift detectors, bridging them into OpenMOA's DriftDetector interface.

MOADriftDetector
from openmoa.drift import MOADriftDetector

detector = MOADriftDetector(moa_detector="drift.ADWIN -d 0.002")

for x, y in stream:
    y_pred = clf.predict_one(x)
    detector.add_element(int(y_pred != y))
    if detector.detected_change():
        print("Drift detected!")

OpenFeatureStream

OpenFeatureStream simulates evolving feature spaces — a key characteristic of open-world streams where new sensors, modalities, or measurement types appear over time.

Evolution patterns

Pattern      Value          Description
Pyramid      "pyramid"      Features added then removed, forming a pyramid shape over time
Incremental  "incremental"  Features only added; feature space grows monotonically
Decremental  "decremental"  Features only removed; feature space shrinks monotonically
TDS          "tds"          Temporally-dependent sequence: features appear in correlated bursts
CDS          "cds"          Concept-dependent sequence: feature space tied to underlying concept
EDS          "eds"          Event-driven sequence: feature evolution triggered by stream events

Feature selection modes

Mode          Value           Description
Intersection  "intersection"  Use only features present in both training and current instance
Union         "union"         Use all known features; fill missing with zero / mean
New-only      "new_only"      Use only newly appeared features for the current window

OpenFeatureStream
from openmoa.stream import OpenFeatureStream

stream = OpenFeatureStream(
    base_stream="generators.SEAGenerator",
    evolution_pattern="pyramid",
    feature_selection="union",
    n_features_start=5,
    n_features_max=20,
    evolution_speed=500,   # features change every 500 instances
)
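
Union mode has to align each incoming instance to the full set of features seen so far. The helper below sketches that alignment for dict-of-feature instances with zero-filling; `align_union` is a name invented for this tutorial, not OpenMOA API.

```python
def align_union(x, known_features, fill=0.0):
    """Union-mode alignment: project instance x (a feature dict) onto all
    features seen so far, filling absent features with `fill`."""
    known_features |= set(x)  # grow the known feature set in place
    return {f: x.get(f, fill) for f in sorted(known_features)}
```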

Lazy Java Loading

JPype and the MOA JVM are loaded on first use — not at import time. Importing OpenMOA is instant even without a JVM; Java initializes only when a MOAClassifier, MOADriftDetector, or OpenFeatureStream is first instantiated.

lazy JVM startup
import openmoa                          # instant — no JVM started

from openmoa.classifier import MOAClassifier
clf = MOAClassifier("trees.HoeffdingTree")  # JVM starts here
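
The lazy-loading pattern itself is simple to replicate for your own heavyweight dependencies. A generic sketch (not OpenMOA's internals): defer the expensive initializer until first use, then cache the result.

```python
class LazyBackend:
    """Defer an expensive initializer (e.g. a JVM start) until first use."""
    def __init__(self, initializer):
        self._init = initializer
        self._backend = None

    def get(self):
        if self._backend is None:  # first call pays the startup cost
            self._backend = self._init()
        return self._backend
```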

4. Prediction Intervals

For regression tasks, OpenMOA supports prediction intervals alongside point estimates. Regressors that implement interval prediction return a 3-tuple instead of a scalar.

Output Format

An interval-capable regressor's predict_one returns (lower, point, upper):

interval output
lower, point, upper = regressor.predict_one(x)
# lower  — lower confidence bound
# point  — point estimate (same as a standard regressor)
# upper  — upper confidence bound

Evaluators

Two evaluators handle interval predictions; both accept the 3-tuple output directly.

Class                                Description
PredictionIntervalEvaluator          Cumulative evaluator: accumulates statistics over the full stream
PredictionIntervalWindowedEvaluator  Windowed evaluator: reports metrics over a sliding window of recent instances

interval evaluation loop
from openmoa.evaluation import PredictionIntervalWindowedEvaluator

evaluator = PredictionIntervalWindowedEvaluator(window_size=1000)

for x, y in stream:
    lower, point, upper = regressor.predict_one(x)
    evaluator.update(y, lower, point, upper)
    regressor.learn_one(x, y)

metrics = evaluator.get_metrics()

Interval Metrics

Metric               Key                 Interpretation
Coverage (PICP)      metrics.picp        Fraction of true values falling within [lower, upper]; target = nominal level (e.g. 0.90)
Mean Interval Width  metrics.mean_width  Average width of prediction intervals; smaller = more precise
PINAW                metrics.pinaw       Interval width normalized by the target range
Point RMSE           metrics.point_rmse  RMSE of the point estimate only
Point MAE            metrics.point_mae   MAE of the point estimate only
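
PICP and PINAW are straightforward to compute by hand. The sketch below assumes equal-weight averaging over the evaluated instances and normalizes PINAW by the observed target range; `picp_pinaw` is a toy function for this tutorial, not the evaluator's internals.

```python
def picp_pinaw(y_true, lowers, uppers):
    """Coverage (PICP) and range-normalized mean width (PINAW)."""
    n = len(y_true)
    # PICP: fraction of true values inside their interval.
    covered = sum(lo <= y <= up for y, lo, up in zip(y_true, lowers, uppers))
    mean_width = sum(up - lo for lo, up in zip(lowers, uppers)) / n
    target_range = max(y_true) - min(y_true)  # normalization constant
    return covered / n, mean_width / target_range
```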

5. Performance Optimization

OpenMOA provides several mechanisms to accelerate evaluation and training on large-scale streams.

Fast Evaluation Mode

Fast mode uses MOA's native EfficientEvaluationLoops — the entire prequential loop runs in Java, avoiding Python overhead per instance. It activates automatically when both conditions are met:

Condition   Requirement
Classifier  Must be a MOAClassifier (backed by a Java MOA learner)
Stream      Must be a MOAStream or any stream with a native MOA generator

fast mode — MOA + MOA
from openmoa.classifier import MOAClassifier
from openmoa.stream import MOAStream
from openmoa.evaluation import prequential_evaluation

clf    = MOAClassifier("trees.HoeffdingTree")
stream = MOAStream("generators.SEAGenerator -f 2")

# Both MOA — fast mode activates automatically
results = prequential_evaluation(
    stream=stream,
    model=clf,
    n_samples=1_000_000,
)

Batch Learning

BatchClassifier wraps any scikit-learn compatible classifier for stream evaluation. It collects instances into mini-batches before calling the underlying learner's partial_fit.

BatchClassifier
from openmoa.classifier import BatchClassifier
from sklearn.linear_model import SGDClassifier

clf = BatchClassifier(
    classifier=SGDClassifier(),
    batch_size=256,
)

results = prequential_evaluation(stream, clf, n_samples=50000)

Parameter   Type               Description
classifier  sklearn estimator  Any estimator with a partial_fit method
batch_size  int                Number of instances accumulated before calling partial_fit
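
The wrapper's mechanics reduce to buffering and flushing. The `TinyBatcher` class below is a minimal sketch of that idea; OpenMOA's BatchClassifier does more (e.g. passing the class list to partial_fit on the first call).

```python
class TinyBatcher:
    """Buffer (x, y) pairs and flush them to the wrapped learner's
    partial_fit once batch_size instances have accumulated."""
    def __init__(self, classifier, batch_size):
        self.clf, self.batch_size = classifier, batch_size
        self._X, self._y = [], []

    def learn_one(self, x, y):
        self._X.append(x)
        self._y.append(y)
        if len(self._X) >= self.batch_size:
            # Flush the full mini-batch, then start a new one.
            self.clf.partial_fit(self._X, self._y)
            self._X, self._y = [], []
```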

Multi-Learner Single-Pass

When benchmarking multiple classifiers, pass a list of models to prequential_evaluation. The stream is iterated only once, with every model tested and trained on each instance, so total evaluation time drops by roughly a factor of N compared with N separate passes.

multi-learner single-pass
from openmoa.classifier import HoeffdingTree, NaiveBayes, KNN

models = [HoeffdingTree(), NaiveBayes(), KNN(n_neighbors=5)]

# Single pass — all models evaluated in one stream iteration
results = prequential_evaluation(
    stream=stream,
    model=models,
    n_samples=100000,
    window_size=1000,
)
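
Internally, the single-pass trick is just one loop that tests and trains every model per instance. A hand-rolled sketch of the idea (toy code, not OpenMOA's evaluator):

```python
def single_pass_eval(stream, models, n_samples):
    """Evaluate several models in one stream iteration: each instance is
    predicted and then learned by every model (test-then-train)."""
    correct = [0] * len(models)
    for i, (x, y) in enumerate(stream):
        if i >= n_samples:
            break
        for m_idx, model in enumerate(models):
            if model.predict_one(x) == y:
                correct[m_idx] += 1
            model.learn_one(x, y)
    return [c / n_samples for c in correct]
```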

Performance Summary

Technique                    When to Use                                  Typical Speedup
Fast Mode (MOA native loop)  MOAClassifier + MOAStream, >100 k instances  10–50×
BatchClassifier              sklearn learners needing mini-batch updates  2–5×
Multi-Learner Single-Pass    Benchmarking N models simultaneously         ~N×