# Advanced Topics
This tutorial covers advanced OpenMOA capabilities: semi-supervised stream learning, online continual learning, integration with MOA's Java ecosystem, prediction intervals for regression, and techniques for high-performance evaluation.
## 1. Semi-Supervised Learning
Semi-supervised stream learning handles the realistic case where labels arrive infrequently.
OpenMOA's ClassifierSSL wraps any classifier to work with partially-labeled streams.
### ClassifierSSL Interface

All SSL classifiers implement ClassifierSSL, which extends the standard Classifier interface with an is_labeled flag in learn_one.

```python
from openmoa.classifier import ClassifierSSL

ssl_clf = SomeSSLClassifier()  # any ClassifierSSL implementation

# Labeled instance
ssl_clf.learn_one(x, y, is_labeled=True)

# Unlabeled instance — only unsupervised update
ssl_clf.learn_one(x, y=None, is_labeled=False)

# Prediction is unchanged
y_pred = ssl_clf.predict_one(x)
```
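The is_labeled contract is easy to satisfy in a custom wrapper. As a hedged illustration only (SelfTrainingWrapper and MajorityBase are hypothetical names, not OpenMOA classes), a toy self-training wrapper might pseudo-label unlabeled instances when the base model is confident:

```python
from collections import Counter

class MajorityBase:
    """Stand-in learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_proba_one(self, x):
        total = sum(self.counts.values())
        return {c: n / total for c, n in self.counts.items()} if total else {}

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

class SelfTrainingWrapper:
    """Toy ClassifierSSL-style wrapper: uses real labels when available,
    otherwise pseudo-labels instances the base model is confident about."""
    def __init__(self, base, threshold=0.9):
        self.base = base
        self.threshold = threshold

    def learn_one(self, x, y=None, is_labeled=True):
        if is_labeled and y is not None:
            self.base.learn_one(x, y)
        else:
            proba = self.base.predict_proba_one(x)
            if proba:
                label, conf = max(proba.items(), key=lambda kv: kv[1])
                if conf >= self.threshold:
                    self.base.learn_one(x, label)  # train on the pseudo-label

    def predict_one(self, x):
        return self.base.predict_one(x)

clf = SelfTrainingWrapper(MajorityBase(), threshold=0.8)
clf.learn_one({"f": 1.0}, "a", is_labeled=True)  # real label
clf.learn_one({"f": 1.1}, is_labeled=False)      # confident, so pseudo-labeled
```

Real SSL classifiers such as OSNN do considerably more, but the labeled/unlabeled branching above is the shape the interface expects.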
### prequential_ssl_evaluation

The dedicated SSL evaluator simulates a configurable labeling rate, measures accuracy only on labeled instances, and tracks how unlabeled instances are used.

```python
from openmoa.evaluation import prequential_ssl_evaluation

results = prequential_ssl_evaluation(
    stream=stream,
    model=ssl_clf,
    labeling_rate=0.1,  # 10% of instances are labeled
    n_samples=10000,
    window_size=1000,
)
```
| Parameter | Type | Description |
|---|---|---|
| stream | Stream | Any OpenMOA stream object |
| model | ClassifierSSL | SSL-capable classifier |
| labeling_rate | float | Fraction of instances that carry labels (0–1) |
| n_samples | int | Total instances to process |
| window_size | int | Windowed accuracy reporting interval |
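The evaluator's core loop can be approximated in a few lines. The sketch below is a simplified stand-in (pure Python, a hypothetical LastLabel model, fixed random seed), masking labels at labeling_rate and scoring accuracy on labeled instances only:

```python
import random

def prequential_ssl(stream, model, labeling_rate, seed=42):
    """Test-then-train loop where only a fraction of instances are labeled."""
    rng = random.Random(seed)
    correct = labeled = 0
    for x, y in stream:
        if rng.random() < labeling_rate:
            correct += int(model.predict_one(x) == y)  # test before training
            labeled += 1
            model.learn_one(x, y, is_labeled=True)
        else:
            model.learn_one(x, y=None, is_labeled=False)
    return {"accuracy": correct / max(labeled, 1), "n_labeled": labeled}

class LastLabel:
    """Stand-in SSL model: predicts the most recent label it was shown."""
    def __init__(self):
        self.last = None
    def learn_one(self, x, y=None, is_labeled=True):
        if is_labeled:
            self.last = y
    def predict_one(self, x):
        return self.last

stream = [({"f": i}, "pos") for i in range(200)]  # constant-label toy stream
res = prequential_ssl(stream, LastLabel(), labeling_rate=0.1)
```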
### OSNN — Online Semi-supervised Nearest Neighbor
OSNN maintains an online prototype graph. Labeled nodes propagate labels to nearby unlabeled neighbors via graph-based label propagation, enabling effective learning even at very low labeling rates.
```python
from openmoa.classifier import OSNN

model = OSNN(
    n_neighbors=5,
    window_size=500,
)

results = prequential_ssl_evaluation(
    stream=stream,
    model=model,
    labeling_rate=0.05,  # only 5% labeled
    n_samples=10000,
)
```
## 2. Online Continual Learning
Online Continual Learning (OCL) addresses the challenge of learning new tasks sequentially without forgetting previous knowledge. OpenMOA provides a complete OCL pipeline for both classic benchmarks and vision tasks.
### Supported Datasets
| Dataset | Class | Tasks | Notes |
|---|---|---|---|
| Split-MNIST | SplitMNIST | 5 binary tasks | Standard OCL benchmark, digits 0–9 split by pairs |
| Split-CIFAR10 | SplitCIFAR10 | 5 binary tasks | CIFAR-10 images split into 5 tasks |
| Split-CIFAR100 | SplitCIFAR100 | 10 or 20 tasks | CIFAR-100 with configurable task count |
| ViT Features | ViTFeatureStream | Configurable | Pre-extracted Vision Transformer embeddings |
### Task Interface

Each OCL dataset exposes a task-based iteration interface. Tasks arrive sequentially; each task contains a stream of instances from that task's classes only.

```python
from openmoa.datasets import SplitMNIST

dataset = SplitMNIST()
for task_id, task_stream in dataset.tasks():
    print(f"Task {task_id}: classes = {task_stream.classes}")
    for x, y in task_stream:
        ...  # train / evaluate on this task's instances
```
### ocl_train_eval_loop

The unified OCL training loop handles task boundaries, calls the strategy's before_task / after_task hooks, and accumulates the accuracy matrix.

```python
from openmoa.datasets import SplitMNIST
from openmoa.evaluation import ocl_train_eval_loop
from openmoa.ocl import ExperienceReplay

strategy = ExperienceReplay(mem_size=200)
metrics = ocl_train_eval_loop(
    dataset=SplitMNIST(),
    strategy=strategy,
)
```
### OCLMetrics
The OCLMetrics object returned by ocl_train_eval_loop provides the
standard continual learning metrics derived from the accuracy matrix R, where
R[i][j] is accuracy on task j after training on task i.
| Metric | Property | Description |
|---|---|---|
| Accuracy Matrix | metrics.accuracy_matrix | Full R[i][j] matrix |
| Average Accuracy | metrics.avg_accuracy | Mean of final-row accuracies |
| Backward Transfer | metrics.backward_transfer | Average change in past-task accuracy after learning new tasks |
| Forward Transfer | metrics.forward_transfer | Average accuracy on future tasks before they are trained |
| Forgetting | metrics.forgetting | Average accuracy drop on previously-learned tasks |
| Anytime Accuracy | metrics.anytime_accuracy | Average accuracy over all tasks at every evaluation checkpoint |
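Given R, the first few metrics are mechanical. A sketch under the usual textbook definitions (the function name and exact formula variants are assumptions, not OpenMOA internals):

```python
def ocl_metrics(R):
    """Continual-learning metrics from an accuracy matrix R, where
    R[i][j] is accuracy on task j after training on tasks 0..i."""
    T = len(R)
    final = R[T - 1]  # accuracies after training on every task
    avg_accuracy = sum(final) / T
    # Backward transfer: how past-task accuracy changed by the end.
    backward_transfer = sum(final[j] - R[j][j] for j in range(T - 1)) / (T - 1)
    # Forgetting: best accuracy ever achieved on a task minus its final accuracy.
    forgetting = sum(
        max(R[i][j] for i in range(j, T - 1)) - final[j] for j in range(T - 1)
    ) / (T - 1)
    return {"avg_accuracy": avg_accuracy,
            "backward_transfer": backward_transfer,
            "forgetting": forgetting}

R = [[0.9, 0.0, 0.0],
     [0.8, 0.9, 0.0],
     [0.7, 0.8, 0.9]]
m = ocl_metrics(R)  # avg 0.8, BWT -0.15, forgetting 0.15
```

Note that forgetting is the negation of backward transfer whenever accuracy only degrades after a task is learned, as in this toy matrix.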
### Strategies
| Strategy | Class | Key Parameter | Description |
|---|---|---|---|
| Experience Replay | ExperienceReplay | mem_size | Stores a reservoir of past examples and interleaves them during training |
| SLDA | SLDA | shrinkage | Streaming Linear Discriminant Analysis; updates class means and shared covariance |
| NCM | NCM | n_neighbors | Nearest-Class-Mean classifier; classifies by nearest stored class centroid |
```python
from openmoa.datasets import SplitMNIST
from openmoa.evaluation import ocl_train_eval_loop
from openmoa.ocl import ExperienceReplay, SLDA, NCM

er = ExperienceReplay(mem_size=200)
slda = SLDA(shrinkage=1e-4)
ncm = NCM(n_neighbors=1)

for name, strategy in [("ER", er), ("SLDA", slda), ("NCM", ncm)]:
    m = ocl_train_eval_loop(SplitMNIST(), strategy)
    print(f"{name}: avg_acc={m.avg_accuracy:.3f}, forgetting={m.forgetting:.3f}")
```
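NCM is compact enough to sketch from first principles. The toy below (a hypothetical TinyNCM over plain feature dicts, squared Euclidean distance, i.e. nearest centroid) shows the idea; OpenMOA's NCM adds more machinery:

```python
import math

class TinyNCM:
    """Toy Nearest-Class-Mean: keep a running centroid per class,
    predict the class whose centroid is closest."""
    def __init__(self):
        self.sums = {}    # class -> per-feature running sums
        self.counts = {}  # class -> number of instances seen

    def learn_one(self, x, y):
        s = self.sums.setdefault(y, {})
        for k, v in x.items():
            s[k] = s.get(k, 0.0) + v
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict_one(self, x):
        best, best_d = None, math.inf
        for c, s in self.sums.items():
            n = self.counts[c]
            d = sum((x.get(k, 0.0) - s[k] / n) ** 2 for k in s)
            if d < best_d:
                best, best_d = c, d
        return best

ncm = TinyNCM()
ncm.learn_one({"a": 0.0}, "low")
ncm.learn_one({"a": 10.0}, "high")
```

Because only per-class sums and counts are stored, updates are O(features) per instance, which is why centroid-based strategies are popular OCL baselines.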
## 3. MOA Integration
OpenMOA bridges the rich Java-based MOA (Massive Online Analysis) ecosystem through JPype. This gives access to hundreds of MOA classifiers and detectors without leaving Python.
### MOAClassifier

MOAClassifier wraps any MOA learner using its CLI string syntax. It implements the standard OpenMOA Classifier interface, so it drops into any evaluation loop.

```python
from openmoa.classifier import MOAClassifier
from openmoa.evaluation import prequential_evaluation

clf = MOAClassifier(moa_learner="trees.HoeffdingTree -g 200 -c 0.0")
results = prequential_evaluation(stream, clf, n_samples=10000)
```
### MOADriftDetector

MOADriftDetector wraps MOA drift detectors, bridging them into OpenMOA's DriftDetector interface.

```python
from openmoa.drift import MOADriftDetector

detector = MOADriftDetector(moa_detector="drift.ADWIN -d 0.002")

for x, y in stream:
    y_pred = clf.predict_one(x)
    detector.add_element(int(y_pred != y))  # monitor the 0/1 error stream
    if detector.detected_change():
        print("Drift detected!")
    clf.learn_one(x, y)
```
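Any object with add_element / detected_change can stand in for a MOA detector. As a hedged sketch of the interface (a simplified DDM-style rule, deliberately not ADWIN), drift fires when the running error rate climbs well above its best observed level:

```python
import math

class TinyDDM:
    """Simplified DDM-style detector over a 0/1 error stream."""
    def __init__(self, warmup=30, factor=3.0):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf  # best (lowest) error rate seen
        self.s_min = math.inf  # its standard deviation
        self.warmup = warmup
        self.factor = factor
        self._drift = False

    def add_element(self, error):
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.warmup:
            return  # not enough data to judge yet
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        self._drift = p + s > self.p_min + self.factor * self.s_min

    def detected_change(self):
        return self._drift

det = TinyDDM()
for i in range(1, 101):          # clean phase: 5% error rate
    det.add_element(1 if i % 20 == 0 else 0)
stable = det.detected_change()   # False: error rate is steady
for _ in range(50):              # degraded phase: constant errors
    det.add_element(1)
drifted = det.detected_change()  # True: error rate jumped
```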
### OpenFeatureStream
OpenFeatureStream simulates evolving feature spaces — a key characteristic of
open-world streams where new sensors, modalities, or measurement types appear over time.
#### Evolution patterns
| Pattern | Value | Description |
|---|---|---|
| Pyramid | "pyramid" | Features added then removed, forming a pyramid shape over time |
| Incremental | "incremental" | Features only added; feature space grows monotonically |
| Decremental | "decremental" | Features only removed; feature space shrinks monotonically |
| TDS | "tds" | Temporally-dependent sequence: features appear in correlated bursts |
| CDS | "cds" | Concept-dependent sequence: feature space tied to underlying concept |
| EDS | "eds" | Event-driven sequence: feature evolution triggered by stream events |
#### Feature selection modes
| Mode | Value | Description |
|---|---|---|
| Intersection | "intersection" | Use only features present in both training and current instance |
| Union | "union" | Use all known features; fill missing with zero / mean |
| New-only | "new_only" | Use only newly appeared features for the current window |
```python
from openmoa.stream import OpenFeatureStream

stream = OpenFeatureStream(
    base_stream="generators.SEAGenerator",
    evolution_pattern="pyramid",
    feature_selection="union",
    n_features_start=5,
    n_features_max=20,
    evolution_speed=500,  # features change every 500 instances
)
```
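Conceptually, the feature selection modes re-key each incoming instance against the model's known feature set. The sketch below is illustrative only (align_features is a hypothetical helper, not OpenMOA API):

```python
def align_features(x, known, mode="union", fill=0.0):
    """Project an incoming instance dict onto the model's feature space.

    x:     dict of feature -> value for the current instance
    known: set of features the model has already been trained on
    """
    if mode == "intersection":
        # only features shared by the model and the instance
        return {k: v for k, v in x.items() if k in known}
    if mode == "union":
        # every known feature, missing ones filled with a default value
        out = {k: fill for k in known}
        out.update(x)
        return out
    if mode == "new_only":
        # only features the model has never seen before
        return {k: v for k, v in x.items() if k not in known}
    raise ValueError(f"unknown mode: {mode}")

known = {"a", "b"}
x = {"b": 1.0, "c": 2.0}  # feature "a" vanished, feature "c" appeared
```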
### Lazy Java Loading

JPype and the MOA JVM are loaded on first use — not at import time. Importing OpenMOA is instant even without a JVM; Java initializes only when a MOAClassifier, MOADriftDetector, or OpenFeatureStream is first instantiated.

```python
import openmoa                                # instant — no JVM started
from openmoa.classifier import MOAClassifier

clf = MOAClassifier("trees.HoeffdingTree")    # JVM starts here
```
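The pattern itself is ordinary lazy initialization: a module-level handle created on first access. A minimal sketch of the idea (a stand-in dict instead of a real JVM; not OpenMOA's actual bootstrap code):

```python
_jvm = None  # module-level handle, stays None until first use

def _get_jvm():
    """Start the (expensive) backend on first call, then reuse it."""
    global _jvm
    if _jvm is None:
        # stand-in for jpype.startJVM(...) plus MOA classpath setup
        _jvm = {"started": True, "calls": 0}
    _jvm["calls"] += 1
    return _jvm

class LazyBackedClassifier:
    """Backend starts when the first instance is created, not at import."""
    def __init__(self, cli):
        self.cli = cli
        self.jvm = _get_jvm()

a = LazyBackedClassifier("trees.HoeffdingTree")
b = LazyBackedClassifier("bayes.NaiveBayes")   # reuses the same backend
```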
## 4. Prediction Intervals
For regression tasks, OpenMOA supports prediction intervals alongside point estimates. Regressors that implement interval prediction return a 3-tuple instead of a scalar.
### Output Format

An interval-capable regressor's predict_one returns a (lower, point, upper) tuple:

```python
lower, point, upper = regressor.predict_one(x)
# lower — lower confidence bound
# point — point estimate (same as a standard regressor)
# upper — upper confidence bound
```
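A toy interval regressor shows how the 3-tuple arises. The class below is hypothetical (running mean plus or minus z times the online standard deviation, via Welford's algorithm), not an OpenMOA regressor:

```python
import math

class RunningIntervalRegressor:
    """Toy interval regressor: predicts the running mean of y,
    bounded by mean +/- z * std (Welford's online variance)."""
    def __init__(self, z=1.645):  # z=1.645 targets ~90% nominal coverage
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean
        self.z = z

    def learn_one(self, x, y):
        self.n += 1
        d = y - self.mean
        self.mean += d / self.n
        self.m2 += d * (y - self.mean)

    def predict_one(self, x):
        std = math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0
        half = self.z * std
        return self.mean - half, self.mean, self.mean + half

r = RunningIntervalRegressor()
for y in [1.0, 2.0, 3.0]:
    r.learn_one({}, y)
lower, point, upper = r.predict_one({})
```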
### Evaluators
Two evaluators handle interval predictions; both accept the 3-tuple output directly.
| Class | Description |
|---|---|
| PredictionIntervalEvaluator | Cumulative evaluator — accumulates statistics over the full stream |
| PredictionIntervalWindowedEvaluator | Windowed evaluator — reports metrics over a sliding window of recent instances |
```python
from openmoa.evaluation import PredictionIntervalWindowedEvaluator

evaluator = PredictionIntervalWindowedEvaluator(window_size=1000)

for x, y in stream:
    lower, point, upper = regressor.predict_one(x)
    evaluator.update(y, lower, point, upper)
    regressor.learn_one(x, y)

metrics = evaluator.get_metrics()
```
### Interval Metrics
| Metric | Property | Interpretation |
|---|---|---|
| Coverage (PICP) | metrics.picp | Fraction of true values falling within [lower, upper]; target = nominal level (e.g. 0.90) |
| Mean Interval Width | metrics.mean_width | Average width of prediction intervals; smaller = more precise |
| PINAW | metrics.pinaw | Interval width normalized by the target range |
| Point RMSE | metrics.point_rmse | RMSE of the point estimate only |
| Point MAE | metrics.point_mae | MAE of the point estimate only |
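Under the usual definitions, the coverage and width metrics reduce to a few lines (interval_metrics is a hypothetical helper; PINAW here normalizes by the observed target range):

```python
def interval_metrics(y_true, lowers, uppers):
    """Coverage and width statistics for a batch of interval predictions."""
    n = len(y_true)
    covered = sum(l <= y <= u for y, l, u in zip(y_true, lowers, uppers))
    widths = [u - l for l, u in zip(lowers, uppers)]
    mean_width = sum(widths) / n
    y_range = max(y_true) - min(y_true)
    return {
        "picp": covered / n,            # fraction of truths inside the interval
        "mean_width": mean_width,
        "pinaw": mean_width / y_range,  # width normalized by target range
    }

y = [1.0, 2.0, 3.0, 4.0]
lo = [0.5, 1.5, 3.5, 3.0]
hi = [1.5, 2.5, 4.5, 5.0]
m = interval_metrics(y, lo, hi)  # third interval misses, so picp = 0.75
```

The tension between PICP and width is the core trade-off: intervals can always reach full coverage by growing arbitrarily wide, so the two metrics should be read together.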
## 5. Performance Optimization
OpenMOA provides several mechanisms to accelerate evaluation and training on large-scale streams.
### Fast Evaluation Mode

Fast mode uses MOA's native EfficientEvaluationLoops — the entire prequential loop runs in Java, avoiding per-instance Python overhead. It activates automatically when both conditions are met:
| Condition | Requirement |
|---|---|
| Classifier | Must be a MOAClassifier (backed by a Java MOA learner) |
| Stream | Must be a MOAStream or any stream with a native MOA generator |
```python
from openmoa.classifier import MOAClassifier
from openmoa.stream import MOAStream
from openmoa.evaluation import prequential_evaluation

clf = MOAClassifier("trees.HoeffdingTree")
stream = MOAStream("generators.SEAGenerator -f 2")

# Both MOA — fast mode activates automatically
results = prequential_evaluation(
    stream=stream,
    model=clf,
    n_samples=1_000_000,
)
```
### Batch Learning

BatchClassifier wraps any scikit-learn compatible classifier for stream evaluation. It collects instances into mini-batches before calling the underlying learner's partial_fit.

```python
from openmoa.classifier import BatchClassifier
from sklearn.linear_model import SGDClassifier

clf = BatchClassifier(
    classifier=SGDClassifier(),
    batch_size=256,
)
results = prequential_evaluation(stream, clf, n_samples=50000)
```
| Parameter | Type | Description |
|---|---|---|
| classifier | sklearn estimator | Any estimator with a partial_fit method |
| batch_size | int | Number of instances accumulated before calling partial_fit |
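The buffering idea fits in a short sketch. MiniBatchWrapper and CountingLearner below are hypothetical stand-ins (no sklearn dependency) used to show the flush-on-full-batch behavior:

```python
class CountingLearner:
    """Stand-in estimator that records how many batches it received."""
    def __init__(self):
        self.batches = 0
        self.last = None
    def partial_fit(self, X, y):
        self.batches += 1
        self.last = y[-1]
    def predict(self, X):
        return [self.last] * len(X)

class MiniBatchWrapper:
    """Buffer instances and flush them to partial_fit in fixed-size batches."""
    def __init__(self, learner, batch_size=256):
        self.learner = learner
        self.batch_size = batch_size
        self.X, self.y = [], []

    def learn_one(self, x, y):
        self.X.append(x)
        self.y.append(y)
        if len(self.X) >= self.batch_size:  # batch full: flush to the learner
            self.learner.partial_fit(self.X, self.y)
            self.X, self.y = [], []

    def predict_one(self, x):
        return self.learner.predict([x])[0]

learner = CountingLearner()
clf = MiniBatchWrapper(learner, batch_size=3)
for i in range(7):
    clf.learn_one({"f": i}, i % 2)
# two full batches flushed, one instance still buffered
```

One design consequence worth noting: instances still sitting in the buffer have not influenced the model yet, so predictions can lag the stream by up to batch_size instances.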
### Multi-Learner Single-Pass

When benchmarking multiple classifiers, pass a list of models to prequential_evaluation. The stream is iterated only once and every model is tested and trained on each instance, cutting total evaluation time by roughly a factor of the number of models compared to running each model separately.
```python
from openmoa.classifier import HoeffdingTree, NaiveBayes, KNN

models = [HoeffdingTree(), NaiveBayes(), KNN(n_neighbors=5)]

# Single pass — all models evaluated in one stream iteration
results = prequential_evaluation(
    stream=stream,
    model=models,
    n_samples=100000,
    window_size=1000,
)
```
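Internally, a single-pass benchmark is just one test-then-train loop that touches every model per instance. A hedged sketch with a stand-in model (AlwaysLast is hypothetical):

```python
def single_pass_eval(stream, models):
    """Evaluate several models in one stream iteration (test-then-train)."""
    correct = [0] * len(models)
    n = 0
    for x, y in stream:
        n += 1
        for i, m in enumerate(models):
            correct[i] += int(m.predict_one(x) == y)  # test first
            m.learn_one(x, y)                         # then train
    return [c / n for c in correct]

class AlwaysLast:
    """Stand-in model: predicts the previous label it saw."""
    def __init__(self):
        self.last = None
    def predict_one(self, x):
        return self.last
    def learn_one(self, x, y):
        self.last = y

stream = [({"f": i}, "A") for i in range(10)]  # constant-label toy stream
accs = single_pass_eval(stream, [AlwaysLast(), AlwaysLast()])
```

The saving comes from reading and decoding each instance once instead of once per model; the per-model learning cost itself is unchanged.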
### Performance Summary
| Technique | When to Use | Typical Speedup |
|---|---|---|
| Fast Mode (MOA native loop) | MOAClassifier + MOAStream, >100 k instances | 10–50× |
| BatchClassifier | sklearn learners needing mini-batch updates | 2–5× |
| Multi-Learner Single-Pass | Benchmarking N models simultaneously | ~N× |