API Reference Complete Reference

openmoa.stream

Complete API reference for all stream classes, instance types, schema, and stream wrappers in OpenMOA. Covers every public class, method, parameter, and attribute.

§0   Module Exports

openmoa/stream/__init__.py
from openmoa.stream import (
    Stream, MOAStream,
    ARFFStream, NumpyStream, CSVStream, ConcatStream,
    LibsvmStream, BagOfWordsStream,
    OpenFeatureStream, EvolvingFeatureStream,   # EvolvingFeatureStream is an alias
    TrapezoidalStream, CapriciousStream, EvolvableStream,
    ShuffledStream,
    stream_from_file,
)
openmoa/stream/drift/__init__.py
from openmoa.stream.drift import (
    DriftStream, Drift, AbruptDrift, GradualDrift,
    RecurrentConceptDriftStream,
)
openmoa/stream/generator/__init__.py
from openmoa.stream.generator import (
    SEA, RandomTreeGenerator,
    RandomRBFGenerator, RandomRBFGeneratorDrift,
    LEDGenerator, LEDGeneratorDrift,
    WaveformGenerator, WaveformGeneratorDrift,
    AgrawalGenerator, HyperplaneGenerator,
    STAGGERGenerator, MixedGenerator,
)
openmoa/datasets/__init__.py (selection)
from openmoa.datasets import (
    Electricity, ElectricityTiny,
    Covtype, CovtypeNorm, CovtypeTiny, CovtFD,
    RBFm_100k, RTG_2abrupt, Hyper100k, Sensor,
    Fried, FriedTiny, Bike,
    # ... full list in §7
)

§1   Core Data Structures

class Schema · src/openmoa/stream/_stream.py

Describes the structure of a stream: attribute names, data types, number of classes, and label values. Required by all learners and evaluators.

Schema(moa_header)
signature
Schema(moa_header: InstancesHeader) -> None
Parameter | Type | Description
moa_header | InstancesHeader | Java MOA header object. Typically not constructed directly; use Schema.from_custom() or stream.get_schema() instead.
Schema.from_custom (class method)
signature
Schema.from_custom(
    feature_names: Sequence[str],
    values_for_nominal_features: Dict[str, Sequence[str]] = {},
    values_for_class_label: Sequence[str] = None,
    dataset_name: str = "No_Name",
    target_attribute_name: Optional[str] = None,
    target_type: Optional[str] = None,
) -> Schema
Parameter | Type | Default | Description
feature_names | Sequence[str] | required | List of feature attribute names
values_for_nominal_features | Dict[str, Sequence[str]] | {} | Maps feature name → list of possible values for nominal features
values_for_class_label | Sequence[str] | None | Possible class label strings; if None → regression schema
dataset_name | str | "No_Name" | Name of the dataset
target_attribute_name | Optional[str] | None | Name of the target/class attribute
target_type | Optional[str] | None | 'categorical', 'numeric', or None (auto-detect)
Task type methods
Method | Returns | Description
is_classification() | bool | True if classification task
is_regression() | bool | True if regression task
Attribute info methods
Method | Returns | Description
get_num_attributes() | int | Number of input features (excluding target)
get_num_numeric_attributes() | int | Count of numeric attributes
get_num_nominal_attributes() | int | Count of nominal (categorical) attributes
get_numeric_attributes() | Optional[list] | List of numeric attribute names
get_nominal_attributes() | Optional[dict] | Dict of {name: [values]} for nominal attributes
Class / label info methods (classification only)
Method | Returns | Description
get_num_classes() | int | Number of possible classes (1 for regression)
get_label_values() | Sequence[str] | List of possible class label strings
get_label_indexes() | Sequence[int] | List of class indices [0, 1, ..., n-1]
get_value_for_index(y_index) | Optional[str] | Class label string for index; None if y_index is None
get_index_for_label(y) | int | Class index for label string; raises KeyError if not found
is_y_index_in_range(y_index) | bool | Whether y_index is valid for this schema
MOA access & special methods
Method / Property | Returns | Description
get_moa_header() | InstancesHeader | Underlying Java MOA header (advanced use)
dataset_name | str | Property: name of the dataset
__str__() / __repr__() | str | Returns ARFF header representation
__eq__(other) | bool | Compares number of attributes and classes
class Instance · src/openmoa/instance.py

Base class representing a single data point with a feature vector and schema reference.

signature
Instance(schema: Schema, instance: Union[InstanceExample, FeatureVector]) -> None
Instance.from_array (class method)
Instance.from_array(schema: Schema, instance: FeatureVector) -> Instance

Creates an Instance from a NumPy feature array (no label).

Properties
Property | Type | Description
x | NDArray[float64] | Feature vector as 1D NumPy array
schema | Schema | The stream schema
java_instance | InstanceExample | Java representation
class LabeledInstance · Inheritance: Instance  ·  Classification tasks

Instance with a class label for classification tasks.

LabeledInstance.from_array (class method)
LabeledInstance.from_array(schema: Schema, x: FeatureVector, y_index: int) -> LabeledInstance
Parameter | Type | Description
schema | Schema | Classification schema
x | NDArray[float64] | Feature vector
y_index | int | Class index (0-based)
Properties
Property | Type | Description
x | NDArray[float64] | Feature vector
y_index | int | Class index (0-based integer)
y_label | str | Class label string (via schema.get_value_for_index)
schema | Schema | Stream schema
class RegressionInstance · Inheritance: Instance  ·  Regression tasks

Instance with a continuous target value for regression tasks.

RegressionInstance.from_array (class method)
RegressionInstance.from_array(schema: Schema, x: FeatureVector, y_value: float) -> RegressionInstance
Properties
Property | Type | Description
x | NDArray[float64] | Feature vector
y_value | float | Continuous target value
schema | Schema | Stream schema
aliases Type Aliases · src/openmoa/type_alias.py
Alias | Underlying Type | Description
FeatureVector | NDArray[float64] | 1D NumPy float64 array of feature values
LabelIndex | int | Non-negative class index integer
Label | str | Class label string
LabelProbabilities | NDArray[float64] | 1D array of prediction probabilities
TargetValue | float | Continuous target value for regression

§2   Stream Base Classes

abstract Stream · ABC, Generic[_AnyInstance], Iterator[_AnyInstance]  ·  src/openmoa/stream/_stream.py

Abstract base class for all streams. Implements the Python iterator protocol. All subclasses must implement the four abstract methods below.

Abstract methods
Method | Signature | Description
has_more_instances() | () → bool | True if stream has more instances
next_instance() | () → _AnyInstance | Returns the next instance
get_schema() | () → Schema | Returns the stream schema
restart() | () → None | Resets the stream to the beginning
Concrete methods
Method | Returns | Description
__iter__() | Iterator | Returns self; does NOT restart the stream
__next__() | _AnyInstance | Returns next instance; raises StopIteration if exhausted
get_moa_stream() | Optional[InstanceStream] | Returns underlying Java MOA stream, or None
CLI_help() | str | Returns MOA option documentation (if MOA stream available)
__str__() | str | Returns dataset name
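The iterator contract above can be illustrated without OpenMOA installed. The following is a minimal sketch using a hypothetical list-backed class, ListStream, which is not part of the library; it only mirrors the four abstract methods and the documented behaviour that __iter__() returns self without restarting.

```python
from typing import Any, Iterator, List


class ListStream:
    """Hypothetical stand-in (not an OpenMOA class) mirroring the Stream protocol."""

    def __init__(self, items: List[Any], schema: Any = None) -> None:
        self._items = list(items)
        self._schema = schema
        self._pos = 0

    def has_more_instances(self) -> bool:
        return self._pos < len(self._items)

    def next_instance(self) -> Any:
        item = self._items[self._pos]
        self._pos += 1
        return item

    def get_schema(self) -> Any:
        return self._schema

    def restart(self) -> None:
        self._pos = 0

    def __iter__(self) -> Iterator[Any]:
        # Returns self and keeps the current position (no implicit restart).
        return self

    def __next__(self) -> Any:
        if not self.has_more_instances():
            raise StopIteration
        return self.next_instance()


stream = ListStream([10, 20, 30])
first = next(stream)   # consumes one instance
rest = list(stream)    # iteration continues from the current position
stream.restart()       # an explicit restart is needed to see everything again
again = list(stream)
```

Because iteration does not rewind, a loop over a partially consumed stream only sees the remaining instances; call restart() first when a full pass is intended.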
class MOAStream · Inheritance: Stream[_AnyInstance]

Wraps any MOA Java stream. Used internally by all built-in generators and dataset streams.

signature
MOAStream(
    moa_stream: Optional[InstanceStream] = None,
    schema: Optional[Schema] = None,
    CLI: Optional[str] = None,
) -> None
Parameter | Type | Default | Description
moa_stream | Optional[InstanceStream] | None | MOA stream Java object
schema | Optional[Schema] | None | Schema; inferred from moa_stream if None
CLI | Optional[str] | None | Additional MOA CLI arguments
Raises: ValueError if no schema and no moa_stream; ValueError if CLI provided without moa_stream.

§3   File-Based Streams

class ARFFStream · Inheritance: MOAStream[_AnyInstance]

Reads a stream from an ARFF file (Attribute-Relation File Format).

ARFFStream(
    path: Union[str, Path],
    CLI: Optional[str] = None,
    class_index: int = -1,
) -> None
Parameter | Type | Default | Description
path | Union[str, Path] | required | Path to .arff file
CLI | Optional[str] | None | Additional MOA CLI arguments
class_index | int | -1 | Index of class column (-1 = last column)
example
from openmoa.stream import ARFFStream

stream = ARFFStream("data/covtype.arff")
instance = stream.next_instance()
print(instance.x)        # feature vector
print(instance.y_index)  # class index
class NumpyStream · Inheritance: Stream[_AnyInstance]

Creates a stream directly from NumPy arrays. Useful for integrating existing datasets.

NumpyStream(
    X: np.ndarray,
    y: np.ndarray,
    dataset_name: str = "No_Name",
    feature_names: Optional[Sequence[str]] = None,
    target_name: Optional[str] = None,
    target_type: Optional[str] = None,
) -> None
Parameter | Type | Default | Description
X | np.ndarray | required | Feature matrix, shape (n_samples, n_features)
y | np.ndarray | required | Target vector, shape (n_samples,)
dataset_name | str | "No_Name" | Name of the dataset
feature_names | Optional[Sequence[str]] | None | Feature names; auto-generated as attrib_0, attrib_1, … if None
target_name | Optional[str] | None | Name of the target attribute
target_type | Optional[str] | None | 'categorical', 'numeric', or None (auto-detect)
Attributes & Methods
Name | Type / Returns | Description
current_instance_index | int | Current position in the array
has_more_instances() | bool | True if current_instance_index < len(X)
next_instance() | Union[LabeledInstance, RegressionInstance] | Next instance from arrays
restart() | None | Resets current_instance_index to 0
__len__() | int | Total number of instances
class CSVStream · Inheritance: Stream[_AnyInstance]

Reads a stream from a CSV file line by line.

CSVStream(
    csv_file_path: str,
    dtypes: Optional[list] = None,
    values_for_nominal_features: Dict = {},
    class_index: int = -1,
    values_for_class_label: Optional[list] = None,
    target_attribute_name: Optional[str] = None,
    target_type: Optional[str] = None,
    skip_header: bool = False,
    delimiter: str = ",",
    dataset_name: Optional[str] = None,
) -> None
Parameter | Type | Default | Description
csv_file_path | str | required | Path to CSV file
dtypes | Optional[list] | None | List of (column_name, dtype) tuples; auto-inferred if None
values_for_nominal_features | Dict | {} | Maps column index → list of possible nominal values
class_index | int | -1 | Index of class/target column (-1 = last)
values_for_class_label | Optional[list] | None | Possible class values; auto-detected if None
target_attribute_name | Optional[str] | None | Name of target attribute
target_type | Optional[str] | None | 'categorical', 'numeric', or None
skip_header | bool | False | Skip the first line
delimiter | str | "," | Field delimiter character
dataset_name | Optional[str] | None | Defaults to "CSVStream({path})"
Attributes
Attribute | Type | Description
csv_file_path | str | Path to file
total_number_of_lines | int | Total lines in file (set at init)
class LibsvmStream · Inheritance: Stream

Reads sparse data in LIBSVM format (label feat_id:value feat_id:value …).

LibsvmStream(
    path: Union[str, Path],
    dataset_name: str = "LibsvmDataset",
    target_type: str = "categorical",
) -> None
Parameter | Type | Default | Description
path | Union[str, Path] | required | Path to LIBSVM file
dataset_name | str | "LibsvmDataset" | Dataset name
target_type | str | "categorical" | 'categorical' or 'numeric'
Note: Instances have a _sparse_x attribute (dict {feature_id: value}) alongside the standard x array. Raises FileNotFoundError if file does not exist.
Method | Returns | Description
has_more_instances() | bool | True if more lines available
next_instance() | Union[LabeledInstance, RegressionInstance] | Next sparse instance
restart() | None | Resets position and clears cache
__len__() | int | Total number of instances
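To make the LIBSVM line format (label feat_id:value feat_id:value …) concrete, here is a small standalone parser that turns one line into a label and a {feature_id: value} dict, the same shape described for _sparse_x. It is a sketch only: parse_libsvm_line is a hypothetical helper, not an OpenMOA function, and it makes no claim about how LibsvmStream parses, caches, or densifies its input.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[str, Dict[int, float]]:
    """Split one LIBSVM-formatted line into (label, sparse feature dict)."""
    tokens = line.split()
    label = tokens[0]
    sparse_x: Dict[int, float] = {}
    for token in tokens[1:]:
        # Each remaining token has the form "feat_id:value".
        feat_id, value = token.split(":", 1)
        sparse_x[int(feat_id)] = float(value)
    return label, sparse_x


label, sparse_x = parse_libsvm_line("+1 3:0.5 10:1 42:-2.25")
```

Only the features listed on the line appear in the dict; every other feature is implicitly zero in this format.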
class BagOfWordsStream · Inheritance: Stream

Reads text data from bag-of-words .review files for binary classification (positive vs. negative).

BagOfWordsStream(
    positive_file: Path,
    negative_file: Path,
    dataset_name: str = "BagOfWords",
    normalize: bool = True,
    shuffle_seed: Optional[int] = None,
) -> None
Parameter | Type | Default | Description
positive_file | Path | required | File containing positive examples
negative_file | Path | required | File containing negative examples
dataset_name | str | "BagOfWords" | Dataset name
normalize | bool | True | Normalize feature vectors to unit length
shuffle_seed | Optional[int] | None | Shuffle seed; None = no shuffle
Note: Instances have a _sparse_x attribute (dict {word: count}). Class 0 = negative, class 1 = positive.
class ConcatStream · Inheritance: Stream[_AnyInstance]

Concatenates multiple streams into one, switching to the next stream when the current one is exhausted.

ConcatStream(streams: Sequence[Stream]) -> None
Raises: ValueError if schemas are not equal across streams.
Method | Returns | Description
has_more_instances() | bool | True if any remaining stream has instances
next_instance() | _AnyInstance | Next instance; advances to next sub-stream when exhausted
get_schema() | Schema | Schema of the current stream
restart() | None | Restarts all sub-streams and resets index
__len__() | int | Total length (only if all sub-streams support len())
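The switching behaviour described for ConcatStream can be sketched with plain Python objects. ToyStream and ConcatSketch below are hypothetical names used only for illustration; the schema-equality check that the real class performs is deliberately left out.

```python
from typing import Any, List, Sequence


class ToyStream:
    """Hypothetical list-backed stream used only for this illustration."""

    def __init__(self, items: Sequence[Any]) -> None:
        self._items = list(items)
        self._pos = 0

    def has_more_instances(self) -> bool:
        return self._pos < len(self._items)

    def next_instance(self) -> Any:
        item = self._items[self._pos]
        self._pos += 1
        return item

    def restart(self) -> None:
        self._pos = 0


class ConcatSketch:
    """Serves sub-streams back to back, moving on when one is exhausted."""

    def __init__(self, streams: Sequence[ToyStream]) -> None:
        self._streams = list(streams)
        self._index = 0

    def has_more_instances(self) -> bool:
        return any(s.has_more_instances() for s in self._streams[self._index:])

    def next_instance(self) -> Any:
        # Skip over exhausted (or empty) sub-streams.
        # Callers are expected to check has_more_instances() first.
        while not self._streams[self._index].has_more_instances():
            self._index += 1
        return self._streams[self._index].next_instance()

    def restart(self) -> None:
        for s in self._streams:
            s.restart()
        self._index = 0


def drain(stream: ConcatSketch) -> List[Any]:
    out = []
    while stream.has_more_instances():
        out.append(stream.next_instance())
    return out


concat = ConcatSketch([ToyStream([1, 2]), ToyStream([]), ToyStream([3, 4])])
first_pass = drain(concat)
concat.restart()
second_pass = drain(concat)
```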
function stream_from_file

Auto-detects file type and returns the appropriate stream object (ARFFStream for .arff, CSVStream for .csv).

stream_from_file(
    path_to_csv_or_arff: Union[str, Path],
    dataset_name: str = "NoName",
    class_index: int = -1,
    target_type: Optional[str] = None,
) -> Stream
Parameter | Type | Default | Description
path_to_csv_or_arff | Union[str, Path] | required | Path to .arff or .csv file
dataset_name | str | "NoName" | Dataset name
class_index | int | -1 | Class column index
target_type | Optional[str] | None | 'categorical', 'numeric', or None (CSV only)
Raises: FileNotFoundError · IsADirectoryError · ValueError (unsupported extension).

§4   Feature Evolution Wrappers

These wrappers simulate dynamic feature spaces where the set of active features changes over time.

class OpenFeatureStream · alias: EvolvingFeatureStream  ·  Inheritance: Stream

Wraps a stream to shrink/grow the active feature set over time. Each returned instance carries a feature_indices NumPy attribute indicating which original features are active.

OpenFeatureStream(
    base_stream: Stream,
    d_min: int = 2,
    d_max: Optional[int] = None,
    evolution_pattern: Literal["pyramid","incremental","decremental","tds","cds","eds"] = "pyramid",
    total_instances: int = 10000,
    feature_selection: Literal["prefix","suffix","random"] = "prefix",
    missing_ratio: float = 0.0,
    random_seed: int = 42,
    tds_mode: Literal["random","ordered"] = "random",
    n_segments: int = 2,
    overlap_ratio: float = 1.0,
) -> None
Parameter | Type | Default | Description
base_stream | Stream | required | Stream to wrap
d_min | int | 2 | Minimum number of active features
d_max | Optional[int] | None | Maximum features; defaults to original feature count
evolution_pattern | str | "pyramid" | Pattern of feature evolution (see Evolution patterns)
total_instances | int | 10000 | Total stream length
feature_selection | str | "prefix" | Which features to keep when dimension is reduced
missing_ratio | float | 0.0 | Per-feature absence probability (only used by "cds" pattern)
random_seed | int | 42 | RNG seed for reproducibility
tds_mode | str | "random" | "random" or "ordered" birth assignment (only used by "tds")
n_segments | int | 2 | Number of sequential partitions (only used by "eds")
overlap_ratio | float | 1.0 | Overlap length relative to stable period (only used by "eds")
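As a rough illustration of how evolution_pattern="pyramid" and feature_selection could interact, the sketch below computes an active feature count that rises linearly from d_min to d_max and falls back, then picks feature indices by prefix or suffix. The exact schedule (peak at the midpoint, rounding) is an assumption made for this example, and pyramid_dim and select_features are hypothetical helpers, not OpenMOA functions.

```python
from typing import List


def pyramid_dim(t: int, total_instances: int, d_min: int, d_max: int) -> int:
    """Active feature count at step t (assumed linear ramp, peak at midpoint).

    Requires total_instances > 1.
    """
    half = (total_instances - 1) / 2
    frac = 1 - abs(t - half) / half  # 0.0 at both ends, 1.0 at the midpoint
    return d_min + round(frac * (d_max - d_min))


def select_features(indices: List[int], d: int, mode: str) -> List[int]:
    """Choose d feature indices from the front or the back of the list."""
    if mode == "prefix":
        return indices[:d]
    if mode == "suffix":
        return indices[len(indices) - d:]
    raise ValueError(f"unsupported mode: {mode}")


all_features = list(range(10))
dims = [pyramid_dim(t, 101, 2, 10) for t in (0, 25, 50, 75, 100)]
active_prefix = select_features(all_features, 3, "prefix")
active_suffix = select_features(all_features, 3, "suffix")
```

The selected indices play the role of the feature_indices attribute carried by each returned instance.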
Evolution patterns
Value | Description
"pyramid" | Feature count grows linearly from d_min to d_max, then shrinks back to d_min
"incremental" | Monotonic growth from d_min to d_max
"decremental" | Monotonic shrinkage from d_max to d_min
"tds" | Trapezoidal: each feature has an independent birth time assigned across 10 stages
"cds" | Capricious: each feature is independently present at each step with probability 1 − missing_ratio
"eds" | Evolvable: n_segments sequential partitions with configurable overlap windows
Feature selection modes
Value | Description
"prefix" | Keep the first d features from the active set
"suffix" | Keep the last d features
"random" | Randomly select d features (reproducible per time step via seed)
class TrapezoidalStream · Inheritance: Stream

Similar to OpenFeatureStream but keeps the vector fixed-size: inactive features are filled with np.nan rather than omitted.

TrapezoidalStream(
    base_stream: Stream,
    d_min: int = 2,
    d_max: Optional[int] = None,
    evolution_mode: Literal["random","ordered","pyramid"] = "random",
    total_instances: int = 10000,
    random_seed: int = 42,
) -> None
Parameter | Type | Default | Description
base_stream | Stream | required | Stream to wrap
d_min | int | 2 | Minimum active features
d_max | Optional[int] | None | Maximum features (defaults to original count)
evolution_mode | str | "random" | "random": random order; "ordered": index order; "pyramid": grow then shrink
total_instances | int | 10000 | Stream length
random_seed | int | 42 | RNG seed
class CapriciousStream · Inheritance: Stream

Each feature is independently and randomly absent at each time step with probability missing_ratio. Inactive features are filled with np.nan.

CapriciousStream(
    base_stream: Stream,
    d_max: Optional[int] = None,
    missing_ratio: float = 0.5,
    total_instances: int = 10000,
    min_features: int = 1,
    random_seed: int = 42,
) -> None
Parameter | Type | Default | Description
base_stream | Stream | required | Stream to wrap
d_max | Optional[int] | None | Feature dimension (defaults to original)
missing_ratio | float | 0.5 | Probability that each feature is missing per time step
total_instances | int | 10000 | Stream length
min_features | int | 1 | Guaranteed minimum features per instance
random_seed | int | 42 | RNG seed
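The masking idea behind missing_ratio and min_features can be shown on a plain list of floats. This is a sketch under assumptions: capricious_mask is a hypothetical helper, and the way it restores features to honour min_features (re-enabling randomly chosen missing ones) is one plausible strategy, not necessarily what CapriciousStream does.

```python
import math
import random
from typing import List


def capricious_mask(
    x: List[float], missing_ratio: float, min_features: int, rng: random.Random
) -> List[float]:
    """Replace each value with NaN with probability missing_ratio."""
    keep = [rng.random() >= missing_ratio for _ in x]
    # Assumed repair step: re-enable random features until min_features remain.
    missing = [i for i, kept in enumerate(keep) if not kept]
    rng.shuffle(missing)
    while sum(keep) < min_features and missing:
        keep[missing.pop()] = True
    return [v if kept else math.nan for v, kept in zip(x, keep)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
# missing_ratio=1.0 drops everything, so only the guaranteed minimum survives.
mostly_missing = capricious_mask(x, 1.0, 2, random.Random(42))
observed_count = sum(not math.isnan(v) for v in mostly_missing)
# missing_ratio=0.0 keeps every feature.
untouched = capricious_mask(x, 0.0, 1, random.Random(42))
```

Learners consuming such vectors need to treat NaN as "feature absent" rather than as a numeric value.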
class EvolvableStream · Inheritance: Stream

Divides features into n_segments sequential partitions. Features transition from one segment to the next with configurable overlap windows. Inactive features are filled with np.nan.

EvolvableStream(
    base_stream: Stream,
    d_max: Optional[int] = None,
    n_segments: int = 2,
    overlap_ratio: float = 1.0,
    total_instances: int = 10000,
    random_seed: int = 42,
) -> None
Parameter | Type | Default | Description
base_stream | Stream | required | Stream to wrap
d_max | Optional[int] | None | Feature dimension
n_segments | int | 2 | Number of sequential feature partitions (≥ 2)
overlap_ratio | float | 1.0 | Overlap window length relative to stable period
total_instances | int | 10000 | Stream length
random_seed | int | 42 | RNG seed
class ShuffledStream · Inheritance: Stream

Buffers the entire base stream into memory and serves instances in a randomly shuffled order.

Warning: Loads the full dataset into memory. Suitable for MB-scale datasets; use caution with GB-scale data.
ShuffledStream(base_stream: Stream, random_seed: int = 42) -> None
Method / Attribute | Returns | Description
n_instances | int | Total buffered instances
get_num_instances() | int | Total buffered instances
has_more_instances() | bool | True if pointer not exhausted
next_instance() | _AnyInstance | Next shuffled instance
restart() | None | Re-shuffles and resets pointer
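The buffer-then-shuffle approach, and the reason for the memory warning, can be seen in a few lines. ShuffleSketch is a hypothetical illustration rather than the library class; it assumes the seed initialises one RNG that is reused on every restart.

```python
import random
from typing import Any, List, Sequence


class ShuffleSketch:
    """Holds every item in memory and serves them in a seeded random order."""

    def __init__(self, items: Sequence[Any], random_seed: int = 42) -> None:
        self._buffer = list(items)          # the whole dataset lives in memory
        self._rng = random.Random(random_seed)
        self.n_instances = len(self._buffer)
        self._order: List[int] = []
        self._ptr = 0
        self.restart()

    def has_more_instances(self) -> bool:
        return self._ptr < self.n_instances

    def next_instance(self) -> Any:
        item = self._buffer[self._order[self._ptr]]
        self._ptr += 1
        return item

    def restart(self) -> None:
        # Draw a fresh permutation and rewind the pointer.
        self._order = list(range(self.n_instances))
        self._rng.shuffle(self._order)
        self._ptr = 0


def drain(stream: ShuffleSketch) -> List[Any]:
    out = []
    while stream.has_more_instances():
        out.append(stream.next_instance())
    return out


run_a = drain(ShuffleSketch(range(10), random_seed=7))
run_b = drain(ShuffleSketch(range(10), random_seed=7))
```

Two instances built with the same seed serve the same order, which is what makes shuffled experiments repeatable.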

§5   Synthetic Stream Generators

Module: openmoa.stream.generator. All generators inherit from MOAStream.

class SEA · Binary classification · 3 numeric features

Classic SEA (Stream Ensemble Algorithm) binary classification with three numeric features.

SEA(function: int = 1, instance_random_seed: int = 1, noise_percentage: int = 10)
Parameter | Type | Default | Description
function | int | 1 | Concept function (1, 2, 3, or 4)
instance_random_seed | int | 1 | RNG seed for instance generation
noise_percentage | int | 10 | Percentage of noisy labels
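For readers unfamiliar with the SEA concepts, the sketch below reproduces the commonly cited definition: three features drawn uniformly from [0, 10], with the label decided by whether the sum of the first two is at most a threshold (8, 9, 7, and 9.5 for functions 1 to 4). These thresholds come from the usual description of the SEA benchmark, not from this page, and which side of the threshold maps to which class index is an assumption here. The generator itself is implemented in MOA; this pure-Python version is only illustrative.

```python
import random
from typing import List, Tuple

# Thresholds as usually given for SEA concept functions 1-4 (assumed here).
SEA_THRESHOLDS = {1: 8.0, 2: 9.0, 3: 7.0, 4: 9.5}


def sea_label(x: List[float], function: int = 1) -> int:
    """Class index from the first two features; the third is irrelevant."""
    return 0 if x[0] + x[1] <= SEA_THRESHOLDS[function] else 1


def sea_instance(
    rng: random.Random, function: int = 1, noise_percentage: int = 10
) -> Tuple[List[float], int]:
    x = [rng.uniform(0.0, 10.0) for _ in range(3)]
    y = sea_label(x, function)
    # Flip the label for roughly noise_percentage percent of instances.
    if rng.random() * 100 < noise_percentage:
        y = 1 - y
    return x, y


rng = random.Random(1)
sample_x, sample_y = sea_instance(rng, function=1, noise_percentage=0)
```

Switching function between instances changes only the threshold, which is why SEA is a convenient building block for the drift streams in §6.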
class RandomTreeGenerator

Generates instances from a random decision tree concept.

RandomTreeGenerator(
    instance_random_seed: int = 1, tree_random_seed: int = 1,
    num_classes: int = 2, num_nominals: int = 5, num_numerics: int = 5,
    num_vals_per_nominal: int = 5, max_tree_depth: int = 5,
    first_leaf_level: int = 3, leaf_fraction: float = 0.15,
)
Parameter | Default | Description
instance_random_seed | 1 | RNG seed for instances
tree_random_seed | 1 | RNG seed for tree structure
num_classes | 2 | Number of classes
num_nominals | 5 | Number of nominal attributes
num_numerics | 5 | Number of numeric attributes
num_vals_per_nominal | 5 | Possible values per nominal attribute
max_tree_depth | 5 | Maximum tree depth
first_leaf_level | 3 | Level at which leaves start appearing
leaf_fraction | 0.15 | Fraction of internal nodes converted to leaves
class RandomRBFGenerator

Generates instances from randomly placed RBF (Radial Basis Function) centroids.

RandomRBFGenerator(
    model_random_seed: int = 1, instance_random_seed: int = 1,
    number_of_classes: int = 2, number_of_attributes: int = 10,
    number_of_centroids: int = 50,
)
class RandomRBFGeneratorDrift · Extends RandomRBFGenerator with drifting centroids

RBF generator where centroids drift over time — simulates continuous concept drift.

RandomRBFGeneratorDrift(
    model_random_seed: int = 1, instance_random_seed: int = 1,
    number_of_classes: int = 2, number_of_attributes: int = 10,
    number_of_centroids: int = 50,
    number_of_drifting_centroids: int = 2,
    magnitude_of_change: float = 0.0,
)
Parameter | Default | Description
number_of_drifting_centroids | 2 | Number of centroids that drift
magnitude_of_change | 0.0 | Speed of centroid movement
class LEDGenerator · 7-segment display digit recognition
LEDGenerator(instance_random_seed: int = 1, noise_percentage: int = 10, reduce_data: bool = False)
class LEDGeneratorDrift

LED generator with a configurable number of drifting attributes.

LEDGeneratorDrift(
    instance_random_seed: int = 1, noise_percentage: int = 10,
    reduce_data: bool = False, number_of_attributes_with_drift: int = 7,
)
class WaveformGenerator · 3-class · 21 attributes
WaveformGenerator(instance_random_seed: int = 1, noise: bool = False)
class WaveformGeneratorDrift
WaveformGeneratorDrift(
    instance_random_seed: int = 1, noise: bool = False,
    number_of_attributes_with_drift: int = 10,
)
class AgrawalGenerator · Loan classification
AgrawalGenerator(
    instance_random_seed: int = 1, function: int = 1,
    balance_classes: bool = False, peturbation: float = 0.05,
)
class HyperplaneGenerator · Rotating hyperplane · continuous gradual drift
HyperplaneGenerator(
    instance_random_seed: int = 1,
    number_of_attributes: int = 10,
    number_of_drifting_attributes: int = 2,
    magnitude_of_change: float = 0.0,
    noise_percentage: int = 5,
    sigma_percentage: int = 10,
)
class STAGGERGenerator · Binary classification
STAGGERGenerator(instance_random_seed: int = 1, function: int = 1, balance_classes: bool = False)
class MixedGenerator · Numeric + nominal features
MixedGenerator(instance_random_seed: int = 1, function: int = 1, balance_classes: bool = False)

§6   Drift Streams

Module: openmoa.stream.drift

class Drift · Base class for drift transitions
Drift(position: int, width: int = 0, alpha: float = 0.0, random_seed: int = 1)
Parameter | Type | Default | Description
position | int | required | Instance index at which drift occurs
width | int | 0 | Transition window size (0 or 1 = abrupt)
alpha | float | 0.0 | Grade of change
random_seed | int | 1 | RNG seed for the transition
class AbruptDrift · Inheritance: Drift

Instantaneous concept switch at a specific instance.

AbruptDrift(position: int, random_seed: int = 1)
class GradualDrift · Inheritance: Drift

Gradual transition where instances are probabilistically drawn from either the old or new concept.

# Specify by center + width
GradualDrift(position=10000, width=2000, random_seed=1)

# Or specify by start + end
GradualDrift(start=9000, end=11000, random_seed=1)
Parameter | Type | Default | Description
position | Optional[int] | None | Center of the drift window
width | Optional[int] | None | Length of transition window
start | Optional[int] | None | Start of transition window
end | Optional[int] | None | End of transition window
alpha | float | 0.0 | Grade of change
random_seed | int | 1 | RNG seed
Note: Either (position + width) or (start + end) must be provided. Internally: width = end − start, position = (start + end) / 2.
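The conversion in the note is simple arithmetic, and the probabilistic transition can be pictured with a sigmoid centred on position. The first helper follows the formulas stated above; the second uses the sigmoid from MOA's ConceptDriftStream, and whether OpenMOA applies exactly this curve (or how alpha modifies it) is an assumption. Both function names are hypothetical.

```python
import math
from typing import Tuple


def window_from_bounds(start: int, end: int) -> Tuple[float, int]:
    """Convert (start, end) into (position, width) as described in the note."""
    width = end - start
    position = (start + end) / 2
    return position, width


def new_concept_probability(t: int, position: float, width: int) -> float:
    """Chance that instance t is drawn from the new concept (assumed sigmoid)."""
    if width <= 1:
        # Degenerate window: behaves like an abrupt switch at position.
        return 1.0 if t >= position else 0.0
    return 1.0 / (1.0 + math.exp(-4.0 * (t - position) / width))


position, width = window_from_bounds(9000, 11000)
p_start = new_concept_probability(9000, position, width)
p_center = new_concept_probability(10000, position, width)
p_end = new_concept_probability(11000, position, width)
```

Under this curve the old concept still contributes a noticeable share of instances at the edges of the window, so the transition is soft rather than clipped at start and end.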
class DriftStream · Inheritance: MOAStream

Composes a list of sub-streams connected by Drift objects into a single stream with concept drift.

DriftStream(
    schema: Optional[Schema] = None,
    CLI: Optional[str] = None,
    moa_stream: Optional[InstanceStream] = None,
    stream: Optional[list] = None,
)

The stream parameter takes an alternating list: [Stream, Drift, Stream, Drift, Stream, …]

Methods
Method | Returns | Description
get_num_drifts() | int | Number of drift transitions
get_drifts() | list[Drift] | List of Drift objects with their positions and widths
example
from openmoa.stream.drift import DriftStream, AbruptDrift, GradualDrift
from openmoa.stream.generator import SEA

stream = DriftStream(stream=[
    SEA(function=1),
    AbruptDrift(position=5000),
    SEA(function=2),
    GradualDrift(position=10000, width=2000),
    SEA(function=3),
])
print(stream.get_num_drifts())   # 2
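To reason about which concept is active at a given instance index in a composition like the one above, a lookup over the drift positions is enough when every transition is treated as abrupt. The helper below is a hypothetical sketch: it ignores gradual windows, and it assumes the instance at index position already belongs to the new concept.

```python
from bisect import bisect_right
from typing import List


def concept_at(t: int, drift_positions: List[int]) -> int:
    """Index of the active sub-stream at instance t (abrupt approximation)."""
    # Counts how many drift positions are <= t.
    return bisect_right(drift_positions, t)


positions = [5000, 10000]  # drift positions taken from the example above
before_first = concept_at(4999, positions)
at_first = concept_at(5000, positions)
after_second = concept_at(12000, positions)
```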
class RecurrentConceptDriftStream

Generates recurrent / periodic concepts from a list of concepts that cycle through multiple times.

RecurrentConceptDriftStream(
    concept_list: Sequence[Stream],
    max_recurrences_per_concept: int = 2,
    transition_type_template: Drift = AbruptDrift(position=2000),
    concept_name_list: Optional[Sequence[str]] = None,
)
Parameter | Type | Default | Description
concept_list | Sequence[Stream] | required | List of concepts to cycle through
max_recurrences_per_concept | int | 2 | How many times each concept reappears
transition_type_template | Drift | AbruptDrift(position=2000) | Template for transitions between concepts
concept_name_list | Optional[Sequence[str]] | None | Names for each concept

§7   Built-in Datasets

Module: openmoa.datasets. All datasets auto-download on first use and are stored locally. All inherit from DownloadARFFGzip and expose the standard Stream interface.

Classification Datasets
Class | Instances | Attributes | Classes | Description
Electricity | 45,312 | 8 | 2 | Electricity demand (UP/DOWN)
ElectricityTiny | 2,000 | 8 | 2 | Tiny version for testing
Covtype | 581,012 | 54 | 7 | Forest cover type
CovtypeNorm | 581,012 | 54 | 7 | Covtype with normalized features
CovtypeTiny | 1,001 | 54 | 7 | Tiny version for testing
CovtFD | 581,011 | 104 | 7 | Covtype with 2 synthetic feature drifts at instances 193,669 and 387,338
RBFm_100k | 100,000 | 10 | 5 | Synthetic RBF
RTG_2abrupt | 100,000 | 30 | 5 | Random Tree with 2 abrupt drifts
Hyper100k | 100,000 | 10 | 2 | Hyperplane
Sensor | 2,219,803 | 5 | 54 | Indoor sensor readings
RCV1 | 20,242 | ~47,236 (sparse) | 2 | Text classification
W8a | 49,749 | 300 | 2 | Web page classification
Adult | 32,561 | 123 | 2 | Census income
Magic04 | 19,020 | 10 | 2 | MAGIC gamma telescope
Spambase | 4,601 | 57 | 2 | Email spam
Musk | 6,598 | 166 | 2 | Musk molecules
SVMGuide3 | 1,243 | 21 | 2 | SVM benchmark
German | 1,000 | 24 | 2 | Credit risk
Australian | 690 | 14 | 2 | Credit approval
Ionosphere | 351 | 34 | 2 | Radar returns
InternetAds | 2,359 | 1,558 | 2 | Internet advertisements
DryBean | 13,611 | 16 | 7 | Dry bean classification
Optdigits | 5,620 | 64 | 10 | Optical digit recognition
Frogs | 7,195 | 22 | 4 | Frog species
Wine | 178 | 13 | 3 | Wine cultivars
Splice | 3,190 | 60 | 3 | DNA splice junctions
SeagateBinary | 49,999 | 94 | 2 | Seagate binary
SeagateMulti | 11,800 | 94 | 11 | Seagate multi-class
Regression Datasets
Class | Instances | Attributes | Description
Fried | 40,768 | 10 | Friedman regression
FriedTiny | 1,000 | 10 | Tiny version for testing
Bike | 17,379 | 12 | Bike sharing demand
usage example
from openmoa.datasets import Electricity, Fried

stream = Electricity()
print(stream.get_schema().get_num_classes())     # 2
print(stream.get_schema().get_num_attributes())  # 8

reg_stream = Fried()
print(reg_stream.get_schema().is_regression())   # True