
Evaluation Methods

OpenMOA provides a comprehensive evaluation framework for online / stream learning. It supports multiple evaluation paradigms, task types, and metric sets — all built on MOA's high-performance Java backend with a clean Python API.

1. Evaluation in OpenMOA

OpenMOA's evaluation framework follows the prequential (test-then-train) paradigm, which is the standard approach for evaluating learning algorithms on data streams. In this paradigm, each instance is first used for prediction (testing), then used to update the model (training) — ensuring every instance contributes to both evaluation and learning.
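
Conceptually, every instance goes through the same cycle: predict, then evaluate, then train. The sketch below is purely illustrative (the evaluation functions in the following sections run this loop for you; the manual evaluator API shown here is covered in Section 8):

Prequential cycle — conceptual sketch
for x, y in stream:
    y_pred = learner.predict(x)                               # 1. test: predict before the label is used
    evaluator.update(y_target_index=y, y_pred_index=y_pred)   # 2. evaluate: update the metrics with the outcome
    learner.train(x, y)                                       # 3. train: update the model with the labeled instance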

Evaluations produce two complementary views:

Cumulative Metrics

Aggregate performance over all instances seen so far. Returns a single scalar value — the overall performance across the full stream.

Windowed Metrics

Performance within the most recent fixed-size window (default: 1000 instances). Returns a list of values — one per completed window — capturing how performance evolves over time.
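
As a rough illustration, assuming a run over 10,000 instances with the default window size of 1000 (prequential_evaluation itself is introduced in the next section):

Cumulative vs. windowed views
results = prequential_evaluation(stream, learner, max_instances=10000, window_size=1000)
results.cumulative.accuracy()    # one scalar computed over all 10,000 instances
results.windowed.accuracy()      # roughly 10 values, one per completed 1000-instance window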


2. prequential_evaluation

supervised

Standard evaluation for a single supervised learner (classification or regression).

Signature
from openmoa.evaluation import prequential_evaluation

results = prequential_evaluation(
    stream,
    learner,
    max_instances=None,        # None = full stream
    window_size=1000,
    store_predictions=False,
    store_y=False,
    optimise=True,             # Uses fast MOA evaluation loop when compatible
    restart_stream=True,
    progress_bar=False,
    batch_size=1,
)

Key Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| stream | Stream | required | Data stream to evaluate on |
| learner | Classifier / Regressor | required | The learner to evaluate |
| max_instances | int or None | None | Max instances to process; None = full stream |
| window_size | int | 1000 | Size of evaluation windows |
| store_predictions | bool | False | Whether to store all predictions |
| store_y | bool | False | Whether to store all ground truth labels |
| optimise | bool | True | Use fast MOA native loop when possible |
| restart_stream | bool | True | Reset stream to beginning before evaluation |
| batch_size | int | 1 | Number of instances per training batch |

Returns: PrequentialResults object

Example

prequential_evaluation — basic usage
from openmoa.classifier import NaiveBayes
from openmoa.datasets import Electricity
from openmoa.evaluation import prequential_evaluation

stream = Electricity()
learner = NaiveBayes(schema=stream.get_schema())

results = prequential_evaluation(stream=stream, learner=learner, max_instances=10000)

print(results.cumulative.accuracy())   # e.g., 76.34
print(results.windowed.accuracy())     # list of values per window
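
If you also need the raw predictions and labels (e.g. for custom analysis), enable the store flags from the signature above; a small sketch in the same setting as above:

Storing predictions and ground truth
results = prequential_evaluation(
    stream=stream,
    learner=learner,
    max_instances=10000,
    store_predictions=True,   # keep every prediction made during evaluation
    store_y=True,             # keep every ground-truth label
)
y_pred = results.predictions()       # list of stored predictions
y_true = results.ground_truth_y()    # list of stored ground-truth labels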

3. prequential_evaluation_multiple_learners

multi-learner

Evaluates multiple learners on the same stream simultaneously, iterating the stream only once for efficiency.

Signature
from openmoa.evaluation import prequential_evaluation_multiple_learners

results_dict = prequential_evaluation_multiple_learners(
    stream,
    learners,           # Dict[str, Learner]
    max_instances=None,
    window_size=1000,
    store_predictions=False,
    store_y=False,
    progress_bar=False,
)

Key Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| learners | Dict[str, Learner] | required | Dictionary mapping learner names to learner objects |

Returns: Dict[str, PrequentialResults] — one result per learner name

Example — comparing two classifiers
from openmoa.classifier import NaiveBayes, HoeffdingTree
from openmoa.datasets import Electricity
from openmoa.evaluation import prequential_evaluation_multiple_learners

stream = Electricity()

learners = {
    "NaiveBayes":    NaiveBayes(schema=stream.get_schema()),
    "HoeffdingTree": HoeffdingTree(schema=stream.get_schema()),
}

results = prequential_evaluation_multiple_learners(stream, learners)

for name, res in results.items():
    print(f"{name}: {res.cumulative.accuracy():.2f}%")

4. prequential_ssl_evaluation

semi-supervised

Evaluation for semi-supervised learning, where only a fraction of instances are labeled.

Signature
from openmoa.evaluation import prequential_ssl_evaluation

results = prequential_ssl_evaluation(
    stream,
    learner,
    max_instances=None,
    window_size=1000,
    initial_window_size=0,
    delay_length=0,
    label_probability=0.01,    # 1% of instances are labeled
    random_seed=1,
    store_predictions=False,
    store_y=False,
    optimise=True,
    restart_stream=True,
    progress_bar=False,
    batch_size=1,
)

Key Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| label_probability | float | 0.01 | Fraction of instances that provide a label (0.0–1.0) |
| initial_window_size | int | 0 | Number of fully labeled instances at the start |
| delay_length | int | 0 | Delay (in instances) before a label becomes available |

Returns: PrequentialResults object

Note: label_probability=0.01 means roughly 1 in 100 instances will carry a label during evaluation. Use initial_window_size to warm-start the model with a small fully-labeled window before entering the semi-supervised regime.
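
A minimal usage sketch, assuming the Electricity stream and NaiveBayes learner from the earlier examples (whether a given learner can exploit the unlabeled instances depends on the learner itself):

prequential_ssl_evaluation — basic usage
from openmoa.classifier import NaiveBayes
from openmoa.datasets import Electricity
from openmoa.evaluation import prequential_ssl_evaluation

stream = Electricity()
learner = NaiveBayes(schema=stream.get_schema())

results = prequential_ssl_evaluation(
    stream=stream,
    learner=learner,
    label_probability=0.10,     # ~10% of instances arrive with a label
    initial_window_size=500,    # warm-start on 500 fully labeled instances
    max_instances=10000,
)
print(results.cumulative.accuracy())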

5. prequential_evaluation_anomaly

anomaly detection

Evaluation for anomaly detection tasks, using AUC-based metrics.

Signature
from openmoa.evaluation import prequential_evaluation_anomaly

results = prequential_evaluation_anomaly(
    stream,
    learner,
    max_instances=None,
    window_size=1000,
    optimise=True,
    store_predictions=False,
    store_y=False,
    progress_bar=False,
)

Returns: PrequentialResults object with anomaly detection metrics (AUC, sAUC)
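
A usage sketch, assuming an anomaly-labeled stream and an anomaly detector class; the module path and the HalfSpaceTrees name below are assumptions, substitute whichever detector your installation provides:

prequential_evaluation_anomaly — basic usage
from openmoa.anomaly import HalfSpaceTrees          # assumed module and class name
from openmoa.evaluation import prequential_evaluation_anomaly

detector = HalfSpaceTrees(schema=stream.get_schema())   # stream: an anomaly-labeled stream

results = prequential_evaluation_anomaly(
    stream=stream,
    learner=detector,
    max_instances=10000,
)
print(results.cumulative.auc())     # cumulative AUC
print(results.windowed.auc())       # AUC per window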


6. Reading Evaluation Results

All evaluation functions return a PrequentialResults object. The two primary views are .cumulative (scalar metrics over all instances) and .windowed (list of metrics, one per window).

Attributes & Methods

| Attribute / Method | Returns | Description |
|---|---|---|
| .cumulative | Evaluator | Cumulative evaluator — metrics as scalars over all instances |
| .windowed | Evaluator | Windowed evaluator — metrics as lists, one value per window |
| .wallclock() | float | Total wall-clock time in seconds |
| .cpu_time() | float | Total CPU time in seconds |
| .max_instances() | int | Total number of instances evaluated |
| .predictions() | list | All stored predictions (requires store_predictions=True) |
| .ground_truth_y() | list | All stored ground-truth labels (requires store_y=True) |
| .metrics_per_window() | DataFrame | Pandas DataFrame of all windowed metrics |
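
The windowed view is often most convenient as a DataFrame, for example to inspect how performance evolves over the stream; a small sketch assuming the results object from Section 2:

Windowed metrics as a DataFrame
df = results.metrics_per_window()   # pandas DataFrame with one row per completed window
print(df.columns)                   # which windowed metrics are available
print(df.tail())                    # the most recent windows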

Export Results

Write results to CSV
results.write_to_file(path="./output", directory_name="my_experiment")
# Writes cumulative and windowed metrics to CSV files

7. Metrics by Task Type

Classification Metrics

Available on ClassificationEvaluator and ClassificationWindowedEvaluator:

| Metric | Method | Description |
|---|---|---|
| Accuracy | .accuracy() | Percentage of correct predictions |
| Kappa Statistic | .kappa() | Cohen's Kappa (chance-corrected accuracy) |
| Kappa T | .kappa_t() | Temporal Kappa (compared to a no-change predictor) |
| Kappa M | .kappa_m() | Kappa compared to a majority-class predictor |
| F1 Score | .f1_score() | Harmonic mean of precision and recall |
| Precision | .precision() | Positive predictive value |
| Recall | .recall() | True positive rate |
| Per-class F1 | .f1_score_N() | F1 score for class N |
| Per-class Precision | .precision_N() | Precision for class N |
| Per-class Recall | .recall_N() | Recall for class N |
| Instances | .instances() | Total number of instances processed |
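
These methods are available on both the cumulative and windowed views; a short sketch, assuming the results object from Section 2 and that class index 1 exists in the schema:

Accessing classification metrics
print(results.cumulative.kappa())       # chance-corrected accuracy over the full stream
print(results.cumulative.f1_score())    # overall F1 score
print(results.cumulative.recall_1())    # per-class recall, following the .recall_N() pattern
print(results.windowed.kappa())         # one kappa value per completed window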

Regression Metrics

Available on RegressionEvaluator and RegressionWindowedEvaluator:

| Metric | Method | Description |
|---|---|---|
| MAE | .mae() | Mean Absolute Error |
| RMSE | .rmse() | Root Mean Squared Error |
| RMAE | .rmae() | Relative Mean Absolute Error |
| RRMSE | .rrmse() | Relative Root Mean Squared Error |
| R² | .r2() | Coefficient of determination |
| Adjusted R² | .adjusted_r2() | R² adjusted for the number of features |

Prediction Interval Metrics

Available on PredictionIntervalEvaluator:

| Metric | Method | Description |
|---|---|---|
| Coverage | .coverage() | Percentage of true values falling within the predicted intervals |
| Average Length | .average_length() | Mean width of the prediction intervals |
| NMPIW | .nmpiw() | Normalized Mean Prediction Interval Width |

Anomaly Detection Metrics

Available on AnomalyDetectionEvaluator:

| Metric | Method | Description |
|---|---|---|
| AUC | .auc() | Area Under the ROC Curve |
| sAUC | .s_auc() | Sensitivity-adjusted AUC |

8. Evaluator Classes

For advanced use cases, evaluator classes can be instantiated and updated manually — giving you full control over the evaluation loop.

Classification Evaluator

ClassificationEvaluator
from openmoa.evaluation import ClassificationEvaluator

evaluator = ClassificationEvaluator(schema=stream.get_schema(), window_size=1000)

for x, y in stream:
    y_pred = learner.predict(x)
    evaluator.update(y_target_index=y, y_pred_index=y_pred)
    learner.train(x, y)

print(evaluator.accuracy())         # Cumulative accuracy
print(evaluator.metrics_dict())     # All metrics as dict

Regression Evaluator

RegressionEvaluator
from openmoa.evaluation import RegressionEvaluator

evaluator = RegressionEvaluator(schema=stream.get_schema(), window_size=1000)

# actual_value and predicted_value are placeholders: feed the evaluator from your own
# predict/train loop, analogous to the classification example above
evaluator.update(y=actual_value, y_pred=predicted_value)
print(evaluator.mae(), evaluator.rmse())

Prediction Interval Evaluator

PredictionIntervalEvaluator
from openmoa.evaluation import PredictionIntervalEvaluator

evaluator = PredictionIntervalEvaluator(schema=stream.get_schema())
# y_pred must be a 3-element sequence: [lower_bound, prediction, upper_bound]
# actual, lower, pred and upper are placeholders from your own prediction loop
evaluator.update(y=actual, y_pred=[lower, pred, upper])
print(evaluator.coverage(), evaluator.nmpiw())

Clustering Evaluator

ClusteringEvaluator
from openmoa.evaluation import ClusteringEvaluator

evaluator = ClusteringEvaluator(schema=stream.get_schema())
evaluator.update(clusterer=my_clusterer)      # my_clusterer: the clusterer being evaluated
macro, micro = evaluator.get_measurements()   # macro- and micro-level clustering measurements

9. Specialized Evaluations

Online Continual Learning (OCL) Metrics

Located in openmoa.ocl.evaluation, the OCLMetrics dataclass provides evaluation metrics for multi-task continual learning streams.

Drift Detection Evaluation

Located in openmoa.drift.eval_detector, the EvaluateDetector class evaluates drift detector performance against known ground-truth drift points:

| Metric | Description |
|---|---|
| Mean Time to Detect | Average delay between a true drift and its detection |
| Missed Detection Ratio | Fraction of true drifts that were missed |
| Mean Time Between False Alarms | Average gap between false-positive detections |
EvaluateDetector
from openmoa.drift.eval_detector import EvaluateDetector

evaluator = EvaluateDetector(max_delay=200)   # max_delay: tolerance window (in instances) for matching detections to true drifts
results = evaluator.calc_performance(
    preds=detected_positions,      # positions at which the detector signalled a drift
    trues=true_drift_positions     # ground-truth drift positions
)

10. Fast Evaluation Mode

When optimise=True (default), OpenMOA automatically detects whether a stream–learner pair is compatible with MOA's native EfficientEvaluationLoops. When compatible, this significantly speeds up evaluation by running the full evaluation loop in Java rather than Python.

Fast mode is automatically disabled when:
  • A progress bar is requested (progress_bar=True)
  • A custom MOA evaluator is provided
  • The stream or learner is Python-native (not a MOA wrapper)
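
Fast mode can also be turned off explicitly, for example to get a fully Python-side loop while debugging; a small sketch with a hypothetical Python-native learner:

Disabling fast mode
results = prequential_evaluation(
    stream=stream,
    learner=my_python_learner,   # my_python_learner: a Python-native (non-MOA) learner, named here for illustration
    optimise=False,              # force the pure-Python evaluation loop
)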

11. How Prequential Evaluation Works

Every instance passes through the same three-step cycle before the next instance arrives:

Stream instance → Predict → Evaluate (update metrics) → Train (update model) → next instance

Each cycle feeds two views: Cumulative Metrics (a scalar over all instances seen so far) and Windowed Metrics (a list with one value per window). Both are collected in the PrequentialResults object returned by the evaluation functions, accessible through .cumulative, .windowed, .wallclock(), .cpu_time(), .predictions(), and .ground_truth_y().