Tutorial ~25 min read

Advanced Topics

This tutorial covers advanced OpenMOA capabilities: semi-supervised stream learning, online continual learning, integration with MOA's Java ecosystem, prediction intervals for regression, and techniques for high-performance evaluation.

1. Semi-Supervised Learning

Semi-supervised stream learning handles the realistic case where labels arrive infrequently. OpenMOA's ClassifierSSL wraps any classifier to work with partially-labeled streams.

ClassifierSSL Interface

All SSL classifiers implement ClassifierSSL, which extends the standard Classifier interface with an is_labeled flag in learn_one.

ClassifierSSL interface
from openmoa.classifier import ClassifierSSL

ssl_clf = SomeSSLClassifier()

# Labeled instance
ssl_clf.learn_one(x, y, is_labeled=True)

# Unlabeled instance — only unsupervised update
ssl_clf.learn_one(x, y=None, is_labeled=False)

# Prediction is unchanged
y_pred = ssl_clf.predict_one(x)

prequential_ssl_evaluation

The dedicated SSL evaluator simulates a configurable labeling rate, measures accuracy only on labeled instances, and tracks how unlabeled instances are used.

prequential_ssl_evaluation
from openmoa.evaluation import prequential_ssl_evaluation

results = prequential_ssl_evaluation(
    stream=stream,
    model=ssl_clf,
    labeling_rate=0.1,    # 10 % of instances are labeled
    n_samples=10000,
    window_size=1000,
)

Parameter      Type           Description
stream         Stream         Any OpenMOA stream object
model          ClassifierSSL  SSL-capable classifier
labeling_rate  float          Fraction of instances that carry labels (0–1)
n_samples      int            Total instances to process
window_size    int            Windowed accuracy reporting interval
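
Conceptually, a prequential SSL loop amounts to revealing each incoming label with probability labeling_rate and scoring only the labeled instances. The hand-rolled sketch below illustrates that mechanism; `prequential_ssl_loop` is a toy function written for this tutorial, not part of the OpenMOA API.

```python
import random

def prequential_ssl_loop(stream, model, labeling_rate, n_samples, seed=42):
    """Toy prequential SSL loop: test-then-train, revealing each label
    only with probability `labeling_rate`."""
    rng = random.Random(seed)
    correct = labeled_seen = 0
    for i, (x, y) in enumerate(stream):
        if i >= n_samples:
            break
        if rng.random() < labeling_rate:
            # Accuracy is measured on labeled instances only.
            if model.predict_one(x) == y:
                correct += 1
            labeled_seen += 1
            model.learn_one(x, y, is_labeled=True)
        else:
            model.learn_one(x, y=None, is_labeled=False)
    return correct / max(labeled_seen, 1)
```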

OSNN — Online Semi-supervised Nearest Neighbor

OSNN maintains an online prototype graph. Labeled nodes propagate labels to nearby unlabeled neighbors via graph-based label propagation, enabling effective learning even at very low labeling rates.

OSNN example
from openmoa.classifier import OSNN

model = OSNN(
    n_neighbors=5,
    window_size=500,
)

results = prequential_ssl_evaluation(
    stream=stream,
    model=model,
    labeling_rate=0.05,   # only 5 % labeled
    n_samples=10000,
)
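
The propagation idea itself can be sketched in a few lines: an unlabeled prototype takes the majority label of its nearest labeled neighbors. This is a toy illustration of graph-based label propagation, not OSNN's actual internals.

```python
import math

def propagate_labels(prototypes, k=3):
    """Toy label propagation: each unlabeled prototype (label None) takes
    the majority label of its k nearest labeled neighbors."""
    labeled = [(p, lab) for p, lab in prototypes if lab is not None]
    out = []
    for p, lab in prototypes:
        if lab is not None:
            out.append((p, lab))
            continue
        # Vote among the k closest labeled prototypes.
        neighbors = sorted(labeled, key=lambda q: math.dist(p, q[0]))[:k]
        votes = {}
        for _, nl in neighbors:
            votes[nl] = votes.get(nl, 0) + 1
        out.append((p, max(votes, key=votes.get)))
    return out
```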

2. Online Continual Learning

Online Continual Learning (OCL) addresses the challenge of learning new tasks sequentially without forgetting previous knowledge. OpenMOA provides a complete OCL pipeline for both classic benchmarks and vision tasks.

Supported Datasets

Dataset         Class             Tasks           Notes
Split-MNIST     SplitMNIST        5 binary tasks  Standard OCL benchmark, digits 0–9 split by pairs
Split-CIFAR10   SplitCIFAR10      5 binary tasks  CIFAR-10 images split into 5 tasks
Split-CIFAR100  SplitCIFAR100     10 or 20 tasks  CIFAR-100 with configurable task count
ViT Features    ViTFeatureStream  Configurable    Pre-extracted Vision Transformer embeddings

Task Interface

Each OCL dataset exposes a task-based iteration interface. Tasks arrive sequentially; each task contains a stream of instances from that task's classes only.

task iteration
from openmoa.datasets import SplitMNIST

dataset = SplitMNIST()

for task_id, task_stream in dataset.tasks():
    print(f"Task {task_id}: classes = {task_stream.classes}")
    for x, y in task_stream:
        ...
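
The class partitioning behind these benchmarks is easy to reproduce: the label set is split into consecutive, equally sized groups, one per task. The helper below is a sketch of the partitioning only (not the dataset loader); `split_classes` is a name invented for this tutorial.

```python
def split_classes(classes, n_tasks):
    """Partition a class list into n_tasks consecutive, equally sized
    groups, e.g. digits 0-9 into the five pairs used by Split-MNIST."""
    per_task = len(classes) // n_tasks
    return [classes[i * per_task:(i + 1) * per_task] for i in range(n_tasks)]
```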

ocl_train_eval_loop

The unified OCL training loop handles task boundaries, calls the strategy's before_task / after_task hooks, and accumulates the accuracy matrix.

ocl_train_eval_loop
from openmoa.evaluation import ocl_train_eval_loop
from openmoa.ocl import ExperienceReplay

strategy = ExperienceReplay(mem_size=200)
metrics = ocl_train_eval_loop(
    dataset=SplitMNIST(),
    strategy=strategy,
)

OCLMetrics

The OCLMetrics object returned by ocl_train_eval_loop provides the standard continual learning metrics derived from the accuracy matrix R, where R[i][j] is accuracy on task j after training on task i.

Metric             Property                   Description
Accuracy Matrix    metrics.accuracy_matrix    Full R[i][j] matrix
Average Accuracy   metrics.avg_accuracy       Mean of final-row accuracies
Backward Transfer  metrics.backward_transfer  Average change in past-task accuracy after learning new tasks
Forward Transfer   metrics.forward_transfer   Average accuracy on future tasks before they are trained
Forgetting         metrics.forgetting         Average accuracy drop on previously-learned tasks
Anytime Accuracy   metrics.anytime_accuracy   Average accuracy over all tasks at every evaluation checkpoint
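
These quantities all derive mechanically from R. The sketch below uses the usual continual-learning definitions; check them against OpenMOA's implementation before relying on exact values, since conventions (especially for forgetting) vary across papers.

```python
import numpy as np

def ocl_metrics(R):
    """Standard continual-learning metrics from an accuracy matrix R,
    where R[i][j] is accuracy on task j after training on task i."""
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    avg_accuracy = R[-1].mean()  # mean of the final row
    # Backward transfer: change on past tasks after all training is done.
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])
    # Forgetting: drop from best-ever accuracy on each past task.
    forgetting = np.mean([max(R[:-1, j]) - R[-1, j] for j in range(T - 1)])
    return avg_accuracy, bwt, forgetting
```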

Strategies

Strategy           Class             Key Parameter  Description
Experience Replay  ExperienceReplay  mem_size       Stores a reservoir of past examples and interleaves them during training
SLDA               SLDA              shrinkage      Streaming Linear Discriminant Analysis; updates class means and shared covariance
NCM                NCM               n_neighbors    Nearest-Class-Mean classifier; classifies by nearest stored class centroid

comparing strategies
from openmoa.ocl import ExperienceReplay, SLDA, NCM

er   = ExperienceReplay(mem_size=200)
slda = SLDA(shrinkage=1e-4)
ncm  = NCM(n_neighbors=1)

for name, strategy in [("ER", er), ("SLDA", slda), ("NCM", ncm)]:
    m = ocl_train_eval_loop(SplitMNIST(), strategy)
    print(f"{name}: avg_acc={m.avg_accuracy:.3f}, forgetting={m.forgetting:.3f}")
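
NCM's core idea fits in a few lines: keep a running mean per class and classify to the nearest centroid. The `TinyNCM` class below is a minimal sketch of that idea, not OpenMOA's NCM implementation.

```python
import numpy as np

class TinyNCM:
    """Minimal Nearest-Class-Mean: incremental per-class means,
    prediction by nearest centroid in Euclidean distance."""
    def __init__(self):
        self.means, self.counts = {}, {}

    def learn_one(self, x, y):
        x = np.asarray(x, dtype=float)
        if y not in self.means:
            self.means[y], self.counts[y] = np.zeros_like(x), 0
        self.counts[y] += 1
        # Incremental mean update: m += (x - m) / n
        self.means[y] += (x - self.means[y]) / self.counts[y]

    def predict_one(self, x):
        x = np.asarray(x, dtype=float)
        return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))
```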

3. MOA Integration

OpenMOA bridges the rich Java-based MOA (Massive Online Analysis) ecosystem through JPype. This gives access to hundreds of MOA classifiers and detectors without leaving Python.

MOAClassifier

MOAClassifier wraps any MOA learner using its CLI string syntax. It implements the standard OpenMOA Classifier interface so it drops into any evaluation loop.

MOAClassifier
from openmoa.classifier import MOAClassifier
from openmoa.evaluation import prequential_evaluation

clf = MOAClassifier(moa_learner="trees.HoeffdingTree -g 200 -c 0.0")

results = prequential_evaluation(stream, clf, n_samples=10000)

MOADriftDetector

MOADriftDetector wraps MOA drift detectors, bridging them into OpenMOA's DriftDetector interface.

MOADriftDetector
from openmoa.drift import MOADriftDetector

detector = MOADriftDetector(moa_detector="drift.ADWIN -d 0.002")

for x, y in stream:
    y_pred = clf.predict_one(x)
    detector.add_element(int(y_pred != y))
    if detector.detected_change():
        print("Drift detected!")

OpenFeatureStream

OpenFeatureStream simulates evolving feature spaces — a key characteristic of open-world streams where new sensors, modalities, or measurement types appear over time.

Evolution patterns

Pattern      Value          Description
Pyramid      "pyramid"      Features added then removed, forming a pyramid shape over time
Incremental  "incremental"  Features only added; feature space grows monotonically
Decremental  "decremental"  Features only removed; feature space shrinks monotonically
TDS          "tds"          Temporally-dependent sequence: features appear in correlated bursts
CDS          "cds"          Concept-dependent sequence: feature space tied to underlying concept
EDS          "eds"          Event-driven sequence: feature evolution triggered by stream events

Feature selection modes

Mode          Value           Description
Intersection  "intersection"  Use only features present in both training and current instance
Union         "union"         Use all known features; fill missing with zero / mean
New-only      "new_only"      Use only newly appeared features for the current window

OpenFeatureStream
from openmoa.stream import OpenFeatureStream

stream = OpenFeatureStream(
    base_stream="generators.SEAGenerator",
    evolution_pattern="pyramid",
    feature_selection="union",
    n_features_start=5,
    n_features_max=20,
    evolution_speed=500,   # features change every 500 instances
)
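
Union mode has to align each incoming instance to the full set of features seen so far. The helper below sketches that alignment for dict-of-feature instances with zero-filling; `align_union` is a name invented for this tutorial, not OpenMOA API.

```python
def align_union(x, known_features, fill=0.0):
    """Union-mode alignment: project instance x (a feature dict) onto all
    features seen so far, filling absent features with `fill`."""
    known_features |= set(x)  # grow the known feature set in place
    return {f: x.get(f, fill) for f in sorted(known_features)}
```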

Lazy Java Loading

JPype and the MOA JVM are loaded on first use — not at import time. Importing OpenMOA is instant even without a JVM; Java initializes only when a MOAClassifier, MOADriftDetector, or OpenFeatureStream is first instantiated.

lazy JVM startup
import openmoa                          # instant — no JVM started

from openmoa.classifier import MOAClassifier
clf = MOAClassifier("trees.HoeffdingTree")  # JVM starts here
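
The lazy-loading pattern itself is simple to replicate for your own heavyweight dependencies. A generic sketch (not OpenMOA's internals): defer the expensive initializer until first use, then cache the result.

```python
class LazyBackend:
    """Defer an expensive initializer (e.g. a JVM start) until first use."""
    def __init__(self, initializer):
        self._init = initializer
        self._backend = None

    def get(self):
        if self._backend is None:  # first call pays the startup cost
            self._backend = self._init()
        return self._backend
```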

4. Prediction Intervals

For regression tasks, OpenMOA supports prediction intervals alongside point estimates. Regressors that implement interval prediction return a 3-tuple instead of a scalar.

Output Format

An interval-capable regressor's predict_one returns (lower, point, upper):

interval output
lower, point, upper = regressor.predict_one(x)
# lower  — lower confidence bound
# point  — point estimate (same as a standard regressor)
# upper  — upper confidence bound

Evaluators

Two evaluators handle interval predictions; both accept the 3-tuple output directly.

Class                                Description
PredictionIntervalEvaluator          Cumulative evaluator: accumulates statistics over the full stream
PredictionIntervalWindowedEvaluator  Windowed evaluator: reports metrics over a sliding window of recent instances

interval evaluation loop
from openmoa.evaluation import PredictionIntervalWindowedEvaluator

evaluator = PredictionIntervalWindowedEvaluator(window_size=1000)

for x, y in stream:
    lower, point, upper = regressor.predict_one(x)
    evaluator.update(y, lower, point, upper)
    regressor.learn_one(x, y)

metrics = evaluator.get_metrics()

Interval Metrics

Metric               Key                 Interpretation
Coverage (PICP)      metrics.picp        Fraction of true values falling within [lower, upper]; target = nominal level (e.g. 0.90)
Mean Interval Width  metrics.mean_width  Average width of prediction intervals; smaller = more precise
PINAW                metrics.pinaw       Interval width normalized by the target range
Point RMSE           metrics.point_rmse  RMSE of the point estimate only
Point MAE            metrics.point_mae   MAE of the point estimate only
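
PICP and PINAW are straightforward to compute by hand. The sketch below assumes equal-weight averaging over the evaluated instances and normalizes PINAW by the observed target range; `picp_pinaw` is a toy function for this tutorial, not the evaluator's internals.

```python
def picp_pinaw(y_true, lowers, uppers):
    """Coverage (PICP) and range-normalized mean width (PINAW)."""
    n = len(y_true)
    # PICP: fraction of true values inside their interval.
    covered = sum(lo <= y <= up for y, lo, up in zip(y_true, lowers, uppers))
    mean_width = sum(up - lo for lo, up in zip(lowers, uppers)) / n
    target_range = max(y_true) - min(y_true)  # normalization constant
    return covered / n, mean_width / target_range
```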

5. Performance Optimization

OpenMOA provides several mechanisms to accelerate evaluation and training on large-scale streams.

Fast Evaluation Mode

Fast mode uses MOA's native EfficientEvaluationLoops — the entire prequential loop runs in Java, avoiding Python overhead per instance. It activates automatically when both conditions are met:

Condition   Requirement
Classifier  Must be a MOAClassifier (backed by a Java MOA learner)
Stream      Must be a MOAStream or any stream with a native MOA generator

fast mode — MOA + MOA
from openmoa.classifier import MOAClassifier
from openmoa.stream import MOAStream
from openmoa.evaluation import prequential_evaluation

clf    = MOAClassifier("trees.HoeffdingTree")
stream = MOAStream("generators.SEAGenerator -f 2")

# Both MOA — fast mode activates automatically
results = prequential_evaluation(
    stream=stream,
    model=clf,
    n_samples=1_000_000,
)

Batch Learning

BatchClassifier wraps any scikit-learn compatible classifier for stream evaluation. It collects instances into mini-batches before calling the underlying learner's partial_fit.

BatchClassifier
from openmoa.classifier import BatchClassifier
from sklearn.linear_model import SGDClassifier

clf = BatchClassifier(
    classifier=SGDClassifier(),
    batch_size=256,
)

results = prequential_evaluation(stream, clf, n_samples=50000)

Parameter   Type               Description
classifier  sklearn estimator  Any estimator with a partial_fit method
batch_size  int                Number of instances accumulated before calling partial_fit
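
The wrapper's mechanics reduce to buffering and flushing. The `TinyBatcher` class below is a minimal sketch of that idea; OpenMOA's BatchClassifier does more (e.g. passing the class list to partial_fit on the first call).

```python
class TinyBatcher:
    """Buffer (x, y) pairs and flush them to the wrapped learner's
    partial_fit once batch_size instances have accumulated."""
    def __init__(self, classifier, batch_size):
        self.clf, self.batch_size = classifier, batch_size
        self._X, self._y = [], []

    def learn_one(self, x, y):
        self._X.append(x)
        self._y.append(y)
        if len(self._X) >= self.batch_size:
            # Flush the full mini-batch, then start a new one.
            self.clf.partial_fit(self._X, self._y)
            self._X, self._y = [], []
```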

Multi-Learner Single-Pass

When benchmarking multiple classifiers, pass a list of models to prequential_evaluation. The stream is iterated only once, with every model tested and trained on each instance, so total evaluation time drops by roughly a factor of N compared with N separate passes.

multi-learner single-pass
from openmoa.classifier import HoeffdingTree, NaiveBayes, KNN

models = [HoeffdingTree(), NaiveBayes(), KNN(n_neighbors=5)]

# Single pass — all models evaluated in one stream iteration
results = prequential_evaluation(
    stream=stream,
    model=models,
    n_samples=100000,
    window_size=1000,
)
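
Internally, the single-pass trick is just one loop that tests and trains every model per instance. A hand-rolled sketch of the idea (toy code, not OpenMOA's evaluator):

```python
def single_pass_eval(stream, models, n_samples):
    """Evaluate several models in one stream iteration: each instance is
    predicted and then learned by every model (test-then-train)."""
    correct = [0] * len(models)
    for i, (x, y) in enumerate(stream):
        if i >= n_samples:
            break
        for m_idx, model in enumerate(models):
            if model.predict_one(x) == y:
                correct[m_idx] += 1
            model.learn_one(x, y)
    return [c / n_samples for c in correct]
```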

Performance Summary

Technique                    When to Use                                  Typical Speedup
Fast Mode (MOA native loop)  MOAClassifier + MOAStream, >100 k instances  10–50×
BatchClassifier              sklearn learners needing mini-batch updates  2–5×
Multi-Learner Single-Pass    Benchmarking N models simultaneously         ~N×