Core Concepts

Understand streams, online learning, and drift detection — the foundations of OpenMOA.

OpenMOA is a machine learning framework designed for data streams. Unlike traditional batch learning where all data is available upfront, stream learning processes data that arrives one instance at a time, cannot be revisited, and whose distribution may change over time. OpenMOA is built around four core abstractions: Stream, Instance, Learner, and Evaluator.

1. Stream — Data Stream

The Stream is the fundamental abstraction in OpenMOA. All data sources implement the Stream interface, providing iterator-based access to instances one at a time.

Core Interface Methods:

class Stream(ABC):
    def next_instance(self) -> Instance:
        """Return the next instance from the stream"""
    
    def has_more_instances(self) -> bool:
        """Check if more instances are available"""
    
    def get_schema(self) -> Schema:
        """Return stream schema (num features, classes, etc.)"""
    
    def restart(self):
        """Reset stream to the beginning"""

Stream also implements Python's Iterator protocol, supporting for instance in stream: syntax.

Built-in Stream Types:

Type	Description	Example
`ARFFStream`	Load from ARFF files	`ARFFStream("data.arff")`
`CSVStream`	Load from CSV files	`CSVStream("data.csv")`
`LibsvmStream`	Load from LibSVM format (sparse)	`LibsvmStream("data.libsvm")`
`NumpyStream`	Create from NumPy arrays	`NumpyStream(X, y)`
`MOAStream`	Wrap MOA Java stream objects	For MOA generators
`ConcatStream`	Concatenate multiple streams	`ConcatStream([s1, s2])`

Built-in Datasets (Auto-download):

from openmoa.datasets import Electricity, RCV1, Magic04

stream = Electricity()  # Auto-downloads on first use

Includes binary classification (Australian, Spambase, Adult, RCV1), multi-class (DryBean, Optdigits), and regression (Bike, Fried) tasks.

2. Schema — Stream Structure

The Schema describes metadata about a stream — it's the bridge between streams and learners.

schema = stream.get_schema()

schema.get_num_attributes()  # Number of features (excluding target)
schema.get_num_classes()     # Number of classes (classification)
schema.is_classification()   # Is this a classification task?
schema.is_regression()       # Is this a regression task?
schema.get_label_values()    # List of possible class labels

Schema is passed to learners during initialization to ensure model structure matches the data.

3. Instance — Data Point

Each data point read from a stream is an Instance object.

Classification Instance (LabeledInstance):

instance = stream.next_instance()
instance.x        # numpy.ndarray — feature vector
instance.y_index  # int — class index (0, 1, 2, ...)
instance.y_label  # str — class label ("positive", "negative", ...)

Regression Instance (RegressionInstance):

instance.x        # numpy.ndarray — feature vector
instance.y_value  # float — target continuous value

4. Learner — Online Learner

All learners follow a unified train + predict interface.

Classifier Base Class:

class Classifier(ABC):
    def __init__(self, schema: Schema, random_seed: int = 1): ...
    
    def train(self, instance: LabeledInstance) -> None:
        """Incrementally train on a single instance"""
    
    def predict(self, instance: Instance) -> int:
        """Return predicted class index"""
    
    def predict_proba(self, instance: Instance) -> np.ndarray:
        """Return probability distribution over classes"""

Regressor Base Class:

class Regressor(ABC):
    def train(self, instance: RegressionInstance) -> None: ...
    def predict(self, instance: RegressionInstance) -> float: ...

Learner Sources:

Type	Description	Examples
MOA Algorithms	Java implementation via JPype	`HoeffdingTree`, `NaiveBayes`
Native Python	Pure Python/NumPy implementation	`FOBOSClassifier`, `FTRLClassifier`
Scikit-learn Wrappers	Wrap sklearn models with `partial_fit`	`SKClassifier(SGDClassifier())`
Batch Learners	Support mini-batch training	Inherit `BatchClassifier`

5. Online Learning Protocol

OpenMOA uses the Prequential (Test-Then-Train) evaluation protocol. For each arriving instance:

for instance in stream:
    # ① Predict with current model
    prediction = learner.predict(instance)
    
    # ② Record result (correct/incorrect)
    evaluator.update(truth, prediction)
    
    # ③ Train on the instance
    learner.train(instance)

This ensures each instance is evaluated before being used for training — the standard evaluation paradigm for stream learning.

Using the Built-in Evaluation Function:

from openmoa.evaluation import prequential_evaluation

results = prequential_evaluation(
    stream=stream,
    learner=learner,
    max_instances=10000,
    window_size=1000     # Sliding window size
)

print(results.cumulative.accuracy())   # Cumulative accuracy
print(results.windowed.accuracy())     # Sliding window accuracy

6. Evaluator

ClassificationEvaluator tracks the following metrics:

accuracy() — Accuracy
kappa() — Cohen's Kappa (handles class imbalance)
precision() / recall() / f1_score() — Classification metrics
metrics_dict() — Get all metrics at once

Two Evaluation Modes:

Mode	Description
Cumulative	Overall metrics computed from the first instance
Windowed	Metrics based on only the last N instances — reflects recent performance

RegressionEvaluator metrics: MSE, RMSE, MAE, R²

7. Drift Detection

In real data streams, the statistical distribution may change over time (concept drift). OpenMOA provides drift detectors to monitor these changes.

Drift Detector Interface:

class BaseDriftDetector(ABC):
    def add_element(self, element: float) -> None:
        """Input an observation (usually prediction error: 0 or 1)"""
    
    def detected_change(self) -> bool:
        """Has drift been detected?"""
    
    def detected_warning(self) -> bool:
        """Is the detector in warning zone?"""

Built-in Detectors:

ADWIN Adaptive Windowing — detects sub-window mean differences

DDM Drift Detection Method — monitors error rate statistics

EDDM Early DDM — improved sensitivity to gradual drift

HDDM_A / HDDM_W Hoeffding DDM — based on Hoeffding bounds

PageHinkley Page-Hinkley Test — sequential analysis

CUSUM Cumulative Sum — control chart method

Usage Example:

from openmoa.drift.detectors import ADWIN

detector = ADWIN(delta=0.002)

for instance in stream:
    prediction = learner.predict(instance)
    is_correct = (prediction == instance.y_index)
    
    detector.add_element(1 - is_correct)  # Input error
    
    if detector.detected_change():
        print("Drift detected! Consider resetting the learner.")
        learner = create_new_learner()  # Reset model
    
    learner.train(instance)

8. Stream Wrappers — Feature Space Evolution

A unique feature of OpenMOA is simulating feature space drift (changing feature dimensions) through Stream Wrappers.

Two Approaches:

Approach Output Dimension Missing Representation Suitable Algorithms

OpenFeatureStream Variable length Not in vector Sparse-aware (FOBOS, ORF3V)

TrapezoidalStream / CapriciousStream / EvolvableStream Fixed length NaN placeholder Missing value handling (OVFM)

Approach	Output Dimension	Missing Representation	Suitable Algorithms
`OpenFeatureStream`	Variable length	Not in vector	Sparse-aware (FOBOS, ORF3V)
`TrapezoidalStream` / `CapriciousStream` / `EvolvableStream`	Fixed length	NaN placeholder	Missing value handling (OVFM)

Three Evolution Patterns:

Pattern	Full Name	Behavior
TDS	Trapezoidal Data Stream	Features "born" gradually, dimension monotonically increases
CDS	Capricious Data Stream	Features randomly appear/disappear at each timestep
EDS	Evolvable Data Stream	Feature space replaces in segments with transition periods

Combined Usage:

from openmoa.datasets import Magic04
from openmoa.stream import ShuffledStream, OpenFeatureStream

# Layer 1: Shuffle to remove label ordering bias
base = ShuffledStream(Magic04(), random_seed=42)

# Layer 2: Inject TDS feature evolution
stream = OpenFeatureStream(
    base, evolution_pattern="tds", tds_mode="ordered",
    total_instances=19020, random_seed=42
)

9. Architecture Overview

                    ┌─────────────────────┐
                    │     Datasets        │
                    │  (Auto-download)    │
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐
                    │   Stream Layer      │
                    │ ARFF/CSV/LibSVM/    │
                    │ Numpy/MOA Generator │
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐
                    │  Stream Wrappers    │
                    │ Shuffle / TDS /     │
                    │ CDS / EDS           │
                    └────────┬────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼──────┐ ┌────▼─────┐ ┌──────▼───────┐
     │  MOA Learners │ │  Python  │ │   Sklearn    │
     │ (Java/JPype)  │ │ Learners │ │   Wrappers   │
     └────────┬──────┘ └────┬─────┘ └──────┬───────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
                    ┌────────▼────────────┐
                    │    Evaluation       │
                    │ Prequential / Drift │
                    │ Classification /    │
                    │ Regression          │
                    └─────────────────────┘

Previous: Installation Next: Working with Streams