Evaluation Methods
OpenMOA provides a comprehensive evaluation framework for online / stream learning. It supports multiple evaluation paradigms, task types, and metric sets — all built on MOA's high-performance Java backend with a clean Python API.
1. Evaluation in OpenMOA
OpenMOA's evaluation framework follows the prequential (test-then-train) paradigm, which is the standard approach for evaluating learning algorithms on data streams. In this paradigm, each instance is first used for prediction (testing), then used to update the model (training) — ensuring every instance contributes to both evaluation and learning.
Evaluations produce two complementary views:
Cumulative Metrics
Aggregate performance over all instances seen so far. Returns a single scalar value — the overall performance across the full stream.
Windowed Metrics
Performance within the most recent fixed-size window (default: 1000 instances). Returns a list of values — one per completed window — capturing how performance evolves over time.
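In code, the two views are read from the same results object. A minimal sketch, assuming results was produced by any of the evaluation functions below:

```python
# Cumulative view: a single scalar over everything seen so far
overall_acc = results.cumulative.accuracy()

# Windowed view: one value per completed window, in stream order
acc_per_window = results.windowed.accuracy()
```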
2. prequential_evaluation
supervised
Standard evaluation for a single supervised learner (classification or regression).
```python
from openmoa.evaluation import prequential_evaluation

results = prequential_evaluation(
    stream,
    learner,
    max_instances=None,       # None = full stream
    window_size=1000,
    store_predictions=False,
    store_y=False,
    optimise=True,            # Uses fast MOA evaluation loop when compatible
    restart_stream=True,
    progress_bar=False,
    batch_size=1,
)
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| stream | Stream | required | Data stream to evaluate on |
| learner | Classifier / Regressor | required | The learner to evaluate |
| max_instances | int or None | None | Max instances to process; None = full stream |
| window_size | int | 1000 | Size of evaluation windows |
| store_predictions | bool | False | Whether to store all predictions |
| store_y | bool | False | Whether to store all ground truth labels |
| optimise | bool | True | Use fast MOA native loop when possible |
| restart_stream | bool | True | Reset stream to beginning before evaluation |
| batch_size | int | 1 | Number of instances per training batch |
Returns: PrequentialResults object
Example
```python
from openmoa.classifier import NaiveBayes
from openmoa.datasets import Electricity
from openmoa.evaluation import prequential_evaluation

stream = Electricity()
learner = NaiveBayes(schema=stream.get_schema())

results = prequential_evaluation(stream=stream, learner=learner, max_instances=10000)
print(results.cumulative.accuracy())  # e.g., 76.34
print(results.windowed.accuracy())    # list of values per window
```
3. prequential_evaluation_multiple_learners
multi-learner
Evaluates multiple learners on the same stream simultaneously, iterating the stream only once for efficiency.
```python
from openmoa.evaluation import prequential_evaluation_multiple_learners

results_dict = prequential_evaluation_multiple_learners(
    stream,
    learners,                 # Dict[str, Learner]
    max_instances=None,
    window_size=1000,
    store_predictions=False,
    store_y=False,
    progress_bar=False,
)
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| learners | Dict[str, Learner] | required | Dictionary mapping learner names to learner objects |
Returns: Dict[str, PrequentialResults] — one result per learner name
```python
from openmoa.classifier import NaiveBayes, HoeffdingTree
from openmoa.datasets import Electricity
from openmoa.evaluation import prequential_evaluation_multiple_learners

stream = Electricity()
learners = {
    "NaiveBayes": NaiveBayes(schema=stream.get_schema()),
    "HoeffdingTree": HoeffdingTree(schema=stream.get_schema()),
}

results = prequential_evaluation_multiple_learners(stream, learners)
for name, res in results.items():
    print(f"{name}: {res.cumulative.accuracy():.2f}%")
```
4. prequential_ssl_evaluation
semi-supervised
Evaluation for semi-supervised learning, where only a fraction of instances are labeled.
```python
from openmoa.evaluation import prequential_ssl_evaluation

results = prequential_ssl_evaluation(
    stream,
    learner,
    max_instances=None,
    window_size=1000,
    initial_window_size=0,
    delay_length=0,
    label_probability=0.01,   # 1% of instances are labeled
    random_seed=1,
    store_predictions=False,
    store_y=False,
    optimise=True,
    restart_stream=True,
    progress_bar=False,
    batch_size=1,
)
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| label_probability | float | 0.01 | Fraction of instances that provide a label (0.0 – 1.0) |
| initial_window_size | int | 0 | Number of fully labeled instances at the start |
| delay_length | int | 0 | Delay (in instances) before a label becomes available |
Returns: PrequentialResults object
label_probability=0.01 means roughly 1 in 100 instances will carry
a label during evaluation. Use initial_window_size to warm-start the model with a small
fully-labeled window before entering the semi-supervised regime.
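A minimal end-to-end sketch, reusing the Electricity stream and NaiveBayes learner from the earlier example (the label budget and warm-start size here are illustrative values):

```python
from openmoa.classifier import NaiveBayes
from openmoa.datasets import Electricity
from openmoa.evaluation import prequential_ssl_evaluation

stream = Electricity()
learner = NaiveBayes(schema=stream.get_schema())

results = prequential_ssl_evaluation(
    stream,
    learner,
    label_probability=0.05,   # 5% of instances arrive labeled
    initial_window_size=200,  # first 200 instances are fully labeled
)
print(results.cumulative.accuracy())
```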
5. prequential_evaluation_anomaly
anomaly detection
Evaluation for anomaly detection tasks, using AUC-based metrics.
```python
from openmoa.evaluation import prequential_evaluation_anomaly

results = prequential_evaluation_anomaly(
    stream,
    learner,
    max_instances=None,
    window_size=1000,
    optimise=True,
    store_predictions=False,
    store_y=False,
    progress_bar=False,
)
```
Returns: PrequentialResults object with anomaly detection metrics (AUC, sAUC)
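A minimal sketch; HalfSpaceTrees and the openmoa.anomaly module path are assumptions here, so substitute whichever anomaly detector your installation provides, along with a stream carrying a binary normal/anomalous label:

```python
from openmoa.anomaly import HalfSpaceTrees  # assumed detector; substitute your own
from openmoa.evaluation import prequential_evaluation_anomaly

detector = HalfSpaceTrees(schema=stream.get_schema())
results = prequential_evaluation_anomaly(stream, detector, max_instances=10000)

print(results.cumulative.auc())  # AUC over the whole run
print(results.windowed.auc())    # AUC per window
```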
6. Reading Evaluation Results
All evaluation functions return a PrequentialResults object. The two primary views are
.cumulative (scalar metrics over all instances) and .windowed (list of metrics,
one per window).
Attributes & Methods
| Attribute / Method | Returns | Description |
|---|---|---|
| .cumulative | Evaluator | Cumulative evaluator — metrics as scalars over all instances |
| .windowed | Evaluator | Windowed evaluator — metrics as lists, one value per window |
| .wallclock() | float | Total wall clock time in seconds |
| .cpu_time() | float | Total CPU time in seconds |
| .max_instances() | int | Total number of instances evaluated |
| .predictions() | list | All stored predictions (requires store_predictions=True) |
| .ground_truth_y() | list | All stored ground truth labels (requires store_y=True) |
| .metrics_per_window() | DataFrame | Pandas DataFrame of all windowed metrics |
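The windowed metrics are easiest to inspect as a DataFrame. A short sketch (column names follow the metric names and may differ by version; "accuracy" is assumed here):

```python
df = results.metrics_per_window()
print(df.head())             # one row per window, one column per metric
print(df["accuracy"].max())  # best-performing window (column name assumed)
```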
Export Results
```python
results.write_to_file(path="./output", directory_name="my_experiment")
# Writes cumulative and windowed metrics to CSV files
```
7. Metrics by Task Type
Classification Metrics
Available on ClassificationEvaluator and ClassificationWindowedEvaluator:
| Metric | Method | Description |
|---|---|---|
| Accuracy | .accuracy() | Percentage of correct predictions |
| Kappa Statistic | .kappa() | Cohen's Kappa (chance-corrected accuracy) |
| Kappa T | .kappa_t() | Temporal Kappa (compared to no-change predictor) |
| Kappa M | .kappa_m() | Kappa relative to a majority-class predictor |
| F1 Score | .f1_score() | Harmonic mean of precision and recall |
| Precision | .precision() | Positive predictive value |
| Recall | .recall() | True positive rate |
| Per-class F1 | .f1_score_N() | F1 score for class N |
| Per-class Precision | .precision_N() | Precision for class N |
| Per-class Recall | .recall_N() | Recall for class N |
| Instances | .instances() | Total number of instances processed |
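For example, with the results from the NaiveBayes run above (the per-class calls substitute the class index for N; the suffix form shown here is assumed, so check your version's API):

```python
ev = results.cumulative
print(ev.accuracy(), ev.kappa())       # overall metrics
print(ev.f1_score_0(), ev.recall_0())  # class-0 metrics (assumed _N naming)
```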
Regression Metrics
Available on RegressionEvaluator and RegressionWindowedEvaluator:
| Metric | Method | Description |
|---|---|---|
| MAE | .mae() | Mean Absolute Error |
| RMSE | .rmse() | Root Mean Squared Error |
| RMAE | .rmae() | Relative Mean Absolute Error |
| RRMSE | .rrmse() | Relative Root Mean Squared Error |
| R² | .r2() | Coefficient of Determination |
| Adjusted R² | .adjusted_r2() | R² adjusted for number of features |
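prequential_evaluation (section 2) handles regression learners as well. A sketch with placeholder names (Fried and KNNRegressor are assumptions; substitute a regression dataset and regressor from your installation):

```python
from openmoa.datasets import Fried          # placeholder regression dataset
from openmoa.regressor import KNNRegressor  # placeholder regressor
from openmoa.evaluation import prequential_evaluation

stream = Fried()
learner = KNNRegressor(schema=stream.get_schema())

results = prequential_evaluation(stream, learner, max_instances=10000)
print(results.cumulative.mae(), results.cumulative.rmse())
```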
Prediction Interval Metrics
Available on PredictionIntervalEvaluator:
| Metric | Method | Description |
|---|---|---|
| Coverage | .coverage() | % of true values falling within predicted intervals |
| Average Length | .average_length() | Mean width of prediction intervals |
| NMPIW | .nmpiw() | Normalized Mean Prediction Interval Width |
Anomaly Detection Metrics
Available on AnomalyDetectionEvaluator:
| Metric | Method | Description |
|---|---|---|
| AUC | .auc() | Area Under the ROC Curve |
| sAUC | .s_auc() | Sensitivity-adjusted AUC |
8. Evaluator Classes
For advanced use cases, evaluator classes can be instantiated and updated manually — giving you full control over the evaluation loop.
Classification Evaluator
```python
from openmoa.classifier import NaiveBayes
from openmoa.datasets import Electricity
from openmoa.evaluation import ClassificationEvaluator

stream = Electricity()
learner = NaiveBayes(schema=stream.get_schema())
evaluator = ClassificationEvaluator(schema=stream.get_schema(), window_size=1000)

for x, y in stream:
    y_pred = learner.predict(x)  # test first...
    evaluator.update(y_target_index=y, y_pred_index=y_pred)
    learner.train(x, y)          # ...then train

print(evaluator.accuracy())      # Cumulative accuracy
print(evaluator.metrics_dict())  # All metrics as dict
```
Regression Evaluator
```python
from openmoa.evaluation import RegressionEvaluator

evaluator = RegressionEvaluator(schema=stream.get_schema(), window_size=1000)

# Inside your test-then-train loop:
evaluator.update(y=actual_value, y_pred=predicted_value)
print(evaluator.mae(), evaluator.rmse())
```
Prediction Interval Evaluator
```python
from openmoa.evaluation import PredictionIntervalEvaluator

evaluator = PredictionIntervalEvaluator(schema=stream.get_schema())

# y_pred must be a 3-element sequence: [lower_bound, prediction, upper_bound]
evaluator.update(y=actual, y_pred=[lower, pred, upper])
print(evaluator.coverage(), evaluator.nmpiw())
```
Clustering Evaluator
```python
from openmoa.evaluation import ClusteringEvaluator

evaluator = ClusteringEvaluator(schema=stream.get_schema())
evaluator.update(clusterer=my_clusterer)
macro, micro = evaluator.get_measurements()
```
9. Specialized Evaluations
Online Continual Learning (OCL) Metrics
Located in openmoa.ocl.evaluation, the OCLMetrics dataclass provides
evaluation for multi-task continual learning streams:
- Anytime Accuracy — measured during training within each task
- Forward Transfer — how learning a task improves future tasks
- Backward Transfer — how learning new tasks affects performance on past tasks
- Per-task and aggregate accuracy matrices
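For reference, the transfer metrics follow the standard continual-learning formulation; the sketch below illustrates the definitions, not OpenMOA's internal code. R is an accuracy matrix where R[i][j] is the accuracy on task j after training on task i, and b holds baseline accuracies:

```python
def backward_transfer(R):
    # Mean change in accuracy on earlier tasks after training on all T tasks;
    # negative values indicate forgetting.
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)

def forward_transfer(R, b):
    # Mean accuracy on each task just before training on it, relative to a
    # random-initialization baseline b[j] for task j.
    T = len(R)
    return sum(R[j - 1][j] - b[j] for j in range(1, T)) / (T - 1)
```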
Drift Detection Evaluation
Located in openmoa.drift.eval_detector, the EvaluateDetector class evaluates
drift detector performance against known ground-truth drift points:
| Metric | Description |
|---|---|
| Mean Time to Detect | Average delay between true drift and detection |
| Missed Detection Ratio | Fraction of true drifts that were missed |
| Mean Time Between False Alarms | Average gap between false positive detections |
```python
from openmoa.drift.eval_detector import EvaluateDetector

# Instance indices where the detector fired, and the ground-truth drift points
detected_positions = [1030, 2510, 4200]    # example values
true_drift_positions = [1000, 2500, 4000]  # example values

evaluator = EvaluateDetector(max_delay=200)
results = evaluator.calc_performance(
    preds=detected_positions,
    trues=true_drift_positions,
)
```
10. Fast Evaluation Mode
When optimise=True (default), OpenMOA automatically detects whether a stream–learner pair
is compatible with MOA's native EfficientEvaluationLoops. When compatible, this
significantly speeds up evaluation by running the full evaluation loop in Java rather than Python.
The fast path is skipped and evaluation falls back to the standard Python loop when any of the following applies:
- A progress bar is requested (progress_bar=True)
- A custom MOA evaluator is provided
- The stream or learner is Python-native (not a MOA wrapper)
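Assuming stream and learner are defined as in the earlier examples, the fast path can be kept or given up explicitly:

```python
from openmoa.evaluation import prequential_evaluation

# Fast Java loop (default) when the stream/learner pair is compatible
results = prequential_evaluation(stream, learner)

# Requesting a progress bar falls back to the Python loop; optimise=False forces it
results = prequential_evaluation(stream, learner, progress_bar=True)
results = prequential_evaluation(stream, learner, optimise=False)
```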
11. How Prequential Evaluation Works
Every instance passes through the same three-step cycle before the next instance arrives:

1. Test: the current model predicts the instance's label.
2. Update metrics: the prediction updates both the cumulative view (scalar, all instances) and the windowed view (list, one value per window).
3. Train: the model is updated with the labeled instance.