API Reference
openmoa.stream
Complete API reference for all stream classes, instance types, schemas, and stream wrappers in OpenMOA. Covers every public class, method, parameter, and attribute.
§0 Module Exports
from openmoa.stream import (
Stream, MOAStream,
ARFFStream, NumpyStream, CSVStream, ConcatStream,
LibsvmStream, BagOfWordsStream,
OpenFeatureStream, EvolvingFeatureStream, # EvolvingFeatureStream is an alias
TrapezoidalStream, CapriciousStream, EvolvableStream,
ShuffledStream,
stream_from_file,
)
from openmoa.stream.drift import (
DriftStream, Drift, AbruptDrift, GradualDrift,
RecurrentConceptDriftStream,
)
from openmoa.stream.generator import (
SEA, RandomTreeGenerator,
RandomRBFGenerator, RandomRBFGeneratorDrift,
LEDGenerator, LEDGeneratorDrift,
WaveformGenerator, WaveformGeneratorDrift,
AgrawalGenerator, HyperplaneGenerator,
STAGGERGenerator, MixedGenerator,
)
from openmoa.datasets import (
Electricity, ElectricityTiny,
Covtype, CovtypeNorm, CovtypeTiny, CovtFD,
RBFm_100k, RTG_2abrupt, Hyper100k, Sensor,
Fried, FriedTiny, Bike,
# ... full list in §7
)
§1 Core Data Structures
Schema
Describes the structure of a stream: attribute names, data types, number of classes, and label values. Required by all learners and evaluators.
Schema(moa_header)
signature
Schema(moa_header: InstancesHeader) -> None
| Parameter | Type | Description |
| moa_header | InstancesHeader | Java MOA header object. Typically not called directly — use Schema.from_custom() or stream.get_schema(). |
Schema.from_custom (class method)
signature
Schema.from_custom(
feature_names: Sequence[str],
values_for_nominal_features: Dict[str, Sequence[str]] = {},
values_for_class_label: Sequence[str] = None,
dataset_name: str = "No_Name",
target_attribute_name: Optional[str] = None,
target_type: Optional[str] = None,
) -> Schema
| Parameter | Type | Default | Description |
| feature_names | Sequence[str] | required | List of feature attribute names |
| values_for_nominal_features | Dict[str, Sequence[str]] | {} | Maps feature name → list of possible values for nominal features |
| values_for_class_label | Sequence[str] | None | Possible class label strings; if None → regression schema |
| dataset_name | str | "No_Name" | Name of the dataset |
| target_attribute_name | Optional[str] | None | Name of the target/class attribute |
| target_type | Optional[str] | None | 'categorical', 'numeric', or None (auto-detect) |
Task type methods
| Method | Returns | Description |
| is_classification() | bool | True if classification task |
| is_regression() | bool | True if regression task |
Attribute info methods
| Method | Returns | Description |
| get_num_attributes() | int | Number of input features (excluding target) |
| get_num_numeric_attributes() | int | Count of numeric attributes |
| get_num_nominal_attributes() | int | Count of nominal (categorical) attributes |
| get_numeric_attributes() | Optional[list] | List of numeric attribute names |
| get_nominal_attributes() | Optional[dict] | Dict of {name: [values]} for nominal attributes |
Class / label info methods (classification only)
| Method | Returns | Description |
| get_num_classes() | int | Number of possible classes (1 for regression) |
| get_label_values() | Sequence[str] | List of possible class label strings |
| get_label_indexes() | Sequence[int] | List of class indices [0, 1, ..., n-1] |
| get_value_for_index(y_index) | Optional[str] | Class label string for index; None if y_index is None |
| get_index_for_label(y) | int | Class index for label string; raises KeyError if not found |
| is_y_index_in_range(y_index) | bool | Whether y_index is valid for this schema |
MOA access & special methods
| Method / Property | Returns | Description |
| get_moa_header() | InstancesHeader | Underlying Java MOA header (advanced use) |
| dataset_name | str | Property — name of the dataset |
| __str__() / __repr__() | str | Returns ARFF header representation |
| __eq__(other) | bool | Compares number of attributes and classes |
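The label/index methods above behave like a simple bidirectional mapping between class-label strings and 0-based indices. A pure-Python sketch of those semantics (illustrative only — the real Schema delegates to the MOA header; LabelMap is a hypothetical helper, not part of the API):

```python
# Sketch of Schema's label <-> index mapping semantics (illustrative only).
class LabelMap:
    def __init__(self, label_values):
        self._labels = list(label_values)
        self._index = {label: i for i, label in enumerate(self._labels)}

    def get_num_classes(self):
        return len(self._labels)

    def get_label_indexes(self):
        return list(range(len(self._labels)))

    def get_value_for_index(self, y_index):
        # None passes through, mirroring Schema.get_value_for_index
        return None if y_index is None else self._labels[y_index]

    def get_index_for_label(self, y):
        # Unknown labels raise KeyError, matching the documented behavior
        return self._index[y]

    def is_y_index_in_range(self, y_index):
        return 0 <= y_index < len(self._labels)

labels = LabelMap(["UP", "DOWN"])
print(labels.get_index_for_label("DOWN"))  # 1
print(labels.get_value_for_index(None))    # None
```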
Instance
Base class representing a single data point with a feature vector and schema reference.
signature
Instance(schema: Schema, instance: Union[InstanceExample, FeatureVector]) -> None
Instance.from_array (class method)
Instance.from_array(schema: Schema, instance: FeatureVector) -> Instance
Creates an Instance from a NumPy feature array (no label).
Properties
| Property | Type | Description |
| x | NDArray[float64] | Feature vector as 1D NumPy array |
| schema | Schema | The stream schema |
| java_instance | InstanceExample | Java representation |
LabeledInstance
Instance with a class label for classification tasks.
LabeledInstance.from_array (class method)
LabeledInstance.from_array(schema: Schema, x: FeatureVector, y_index: int) -> LabeledInstance
| Parameter | Type | Description |
| schema | Schema | Classification schema |
| x | NDArray[float64] | Feature vector |
| y_index | int | Class index (0-based) |
Properties
| Property | Type | Description |
| x | NDArray[float64] | Feature vector |
| y_index | int | Class index (0-based integer) |
| y_label | str | Class label string (via schema.get_value_for_index) |
| schema | Schema | Stream schema |
RegressionInstance
Instance with a continuous target value for regression tasks.
RegressionInstance.from_array (class method)
RegressionInstance.from_array(schema: Schema, x: FeatureVector, y_value: float) -> RegressionInstance
Properties
| Property | Type | Description |
| x | NDArray[float64] | Feature vector |
| y_value | float | Continuous target value |
| schema | Schema | Stream schema |
Type Aliases
| Alias | Underlying Type | Description |
| FeatureVector | NDArray[float64] | 1D NumPy float64 array of feature values |
| LabelIndex | int | Non-negative class index integer |
| Label | str | Class label string |
| LabelProbabilities | NDArray[float64] | 1D array of prediction probabilities |
| TargetValue | float | Continuous target value for regression |
§2 Stream Base Classes
Stream
Abstract base class for all streams. Implements the Python iterator protocol. All subclasses must implement the four abstract methods below.
Abstract methods
| Method | Signature | Description |
| has_more_instances() | () → bool | True if stream has more instances |
| next_instance() | () → _AnyInstance | Returns the next instance |
| get_schema() | () → Schema | Returns the stream schema |
| restart() | () → None | Resets the stream to the beginning |
Concrete methods
| Method | Returns | Description |
| __iter__() | Iterator | Returns self; does NOT restart the stream |
| __next__() | _AnyInstance | Returns next instance; raises StopIteration if exhausted |
| get_moa_stream() | Optional[InstanceStream] | Returns underlying Java MOA stream, or None |
| CLI_help() | str | Returns MOA option documentation (if MOA stream available) |
| __str__() | str | Returns dataset name |
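Because __iter__() returns self without restarting, a partially consumed stream resumes where it left off when iterated again. A minimal sketch of the documented protocol (ListStream is a hypothetical illustration, not part of the API):

```python
# Minimal stream obeying the documented iterator protocol (illustrative).
class ListStream:
    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def has_more_instances(self):
        return self._pos < len(self._items)

    def next_instance(self):
        item = self._items[self._pos]
        self._pos += 1
        return item

    def restart(self):
        self._pos = 0

    def __iter__(self):
        return self  # NOTE: does not restart the stream

    def __next__(self):
        if not self.has_more_instances():
            raise StopIteration
        return self.next_instance()

s = ListStream([10, 20, 30])
first = next(iter(s))   # 10
rest = list(s)          # [20, 30] -- iteration resumed, not restarted
s.restart()
everything = list(s)    # [10, 20, 30]
```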
MOAStream
Wraps any MOA Java stream. Used internally by all built-in generators and dataset streams.
signature
MOAStream(
moa_stream: Optional[InstanceStream] = None,
schema: Optional[Schema] = None,
CLI: Optional[str] = None,
) -> None
| Parameter | Type | Default | Description |
| moa_stream | Optional[InstanceStream] | None | MOA stream Java object |
| schema | Optional[Schema] | None | Schema; inferred from moa_stream if None |
| CLI | Optional[str] | None | Additional MOA CLI arguments |
Raises: ValueError if no schema and no moa_stream; ValueError if CLI provided without moa_stream.
§3 File-Based Streams
ARFFStream
Reads a stream from an ARFF file (Attribute-Relation File Format).
ARFFStream(
path: Union[str, Path],
CLI: Optional[str] = None,
class_index: int = -1,
) -> None
| Parameter | Type | Default | Description |
| path | Union[str, Path] | required | Path to .arff file |
| CLI | Optional[str] | None | Additional MOA CLI arguments |
| class_index | int | -1 | Index of class column (-1 = last column) |
from openmoa.stream import ARFFStream
stream = ARFFStream("data/covtype.arff")
instance = stream.next_instance()
print(instance.x) # feature vector
print(instance.y_index) # class index
NumpyStream
Creates a stream directly from NumPy arrays. Useful for integrating existing datasets.
NumpyStream(
X: np.ndarray,
y: np.ndarray,
dataset_name: str = "No_Name",
feature_names: Optional[Sequence[str]] = None,
target_name: Optional[str] = None,
target_type: Optional[str] = None,
) -> None
| Parameter | Type | Default | Description |
| X | np.ndarray | required | Feature matrix, shape (n_samples, n_features) |
| y | np.ndarray | required | Target vector, shape (n_samples,) |
| dataset_name | str | "No_Name" | Name of the dataset |
| feature_names | Optional[Sequence[str]] | None | Feature names; auto-generated as attrib_0, attrib_1, … if None |
| target_name | Optional[str] | None | Name of the target attribute |
| target_type | Optional[str] | None | 'categorical', 'numeric', or None (auto-detect) |
Attributes & Methods
| Name | Type / Returns | Description |
| current_instance_index | int | Current position in the array |
| has_more_instances() | bool | True if current_instance_index < len(X) |
| next_instance() | LabeledInstance or RegressionInstance | Next instance from arrays |
| restart() | None | Resets current_instance_index to 0 |
| __len__() | int | Total number of instances |
CSVStream
Reads a stream from a CSV file line by line.
CSVStream(
csv_file_path: str,
dtypes: Optional[list] = None,
values_for_nominal_features: Dict = {},
class_index: int = -1,
values_for_class_label: Optional[list] = None,
target_attribute_name: Optional[str] = None,
target_type: Optional[str] = None,
skip_header: bool = False,
delimiter: str = ",",
dataset_name: Optional[str] = None,
) -> None
| Parameter | Type | Default | Description |
| csv_file_path | str | required | Path to CSV file |
| dtypes | Optional[list] | None | List of (column_name, dtype) tuples; auto-inferred if None |
| values_for_nominal_features | Dict | {} | Maps column index → list of possible nominal values |
| class_index | int | -1 | Index of class/target column (-1 = last) |
| values_for_class_label | Optional[list] | None | Possible class values; auto-detected if None |
| target_attribute_name | Optional[str] | None | Name of target attribute |
| target_type | Optional[str] | None | 'categorical', 'numeric', or None |
| skip_header | bool | False | Skip the first line |
| delimiter | str | "," | Field delimiter character |
| dataset_name | Optional[str] | None | Defaults to "CSVStream({path})" |
Attributes
| Attribute | Type | Description |
| csv_file_path | str | Path to file |
| total_number_of_lines | int | Total lines in file (set at init) |
LibsvmStream
Reads sparse data in LIBSVM format (label feat_id:value feat_id:value …).
LibsvmStream(
path: Union[str, Path],
dataset_name: str = "LibsvmDataset",
target_type: str = "categorical",
) -> None
| Parameter | Type | Default | Description |
| path | Union[str, Path] | required | Path to LIBSVM file |
| dataset_name | str | "LibsvmDataset" | Dataset name |
| target_type | str | "categorical" | 'categorical' or 'numeric' |
Note: Instances have a _sparse_x attribute (dict {feature_id: value}) alongside the standard x array. Raises FileNotFoundError if file does not exist.
| Method | Returns | Description |
| has_more_instances() | bool | True if more lines available |
| next_instance() | LabeledInstance or RegressionInstance | Next sparse instance |
| restart() | None | Resets position and clears cache |
| __len__() | int | Total number of instances |
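Each LIBSVM line stores only non-zero features, which is why instances carry both the sparse _sparse_x dict and a densified x array. A sketch of how one line might be parsed (assumed format with conventional 1-based feature ids; this is not the library's actual parser):

```python
def parse_libsvm_line(line, num_features):
    """Parse 'label id:value id:value ...' into (label, sparse dict, dense list)."""
    parts = line.split()
    label = parts[0]
    sparse = {}
    for token in parts[1:]:
        feat_id, value = token.split(":")
        sparse[int(feat_id)] = float(value)
    # LIBSVM feature ids are conventionally 1-based; absent features are 0.0
    dense = [sparse.get(i + 1, 0.0) for i in range(num_features)]
    return label, sparse, dense

label, sparse, dense = parse_libsvm_line("+1 1:0.5 3:2.0", num_features=4)
print(label, sparse, dense)  # +1 {1: 0.5, 3: 2.0} [0.5, 0.0, 2.0, 0.0]
```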
BagOfWordsStream
Reads text data from bag-of-words .review files for binary classification (positive vs. negative).
BagOfWordsStream(
positive_file: Path,
negative_file: Path,
dataset_name: str = "BagOfWords",
normalize: bool = True,
shuffle_seed: Optional[int] = None,
) -> None
| Parameter | Type | Default | Description |
| positive_file | Path | required | File containing positive examples |
| negative_file | Path | required | File containing negative examples |
| dataset_name | str | "BagOfWords" | Dataset name |
| normalize | bool | True | Normalize feature vectors to unit length |
| shuffle_seed | Optional[int] | None | Shuffle seed; None = no shuffle |
Note: Instances have a _sparse_x attribute (dict {word: count}). Class 0 = negative, class 1 = positive.
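With normalize=True, each bag-of-words count vector is presumably scaled to unit Euclidean length so that documents of different lengths become comparable. The normalization itself is standard (unit_normalize is a hypothetical helper shown for illustration):

```python
import math

def unit_normalize(counts):
    """Scale a {word: count} dict so the vector has Euclidean norm 1."""
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return dict(counts)
    return {word: c / norm for word, c in counts.items()}

vec = unit_normalize({"good": 3, "movie": 4})
print(vec)  # {'good': 0.6, 'movie': 0.8}
```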
ConcatStream
Concatenates multiple streams into one, switching to the next stream when the current one is exhausted.
ConcatStream(streams: Sequence[Stream]) -> None
Raises: ValueError if schemas are not equal across streams.
| Method | Returns | Description |
| has_more_instances() | bool | True if any remaining stream has instances |
| next_instance() | _AnyInstance | Next instance; advances to next sub-stream when exhausted |
| get_schema() | Schema | Schema of the current stream |
| restart() | None | Restarts all sub-streams and resets index |
| __len__() | int | Total length (only if all sub-streams support len()) |
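The switching behaviour is equivalent to reading the sub-streams in order and advancing past any that are exhausted. A simplified sketch of that logic (schema checks omitted; ListSource and SimpleConcat are illustrative stand-ins, not the actual classes):

```python
class ListSource:
    """Tiny stand-in for a sub-stream."""
    def __init__(self, items):
        self._items, self._pos = list(items), 0
    def has_more_instances(self):
        return self._pos < len(self._items)
    def next_instance(self):
        self._pos += 1
        return self._items[self._pos - 1]

class SimpleConcat:
    def __init__(self, streams):
        self._streams = list(streams)
        self._idx = 0

    def has_more_instances(self):
        return any(s.has_more_instances() for s in self._streams[self._idx:])

    def next_instance(self):
        # Advance past exhausted sub-streams before reading
        while not self._streams[self._idx].has_more_instances():
            self._idx += 1
        return self._streams[self._idx].next_instance()

c = SimpleConcat([ListSource([1, 2]), ListSource([]), ListSource([3])])
out = []
while c.has_more_instances():
    out.append(c.next_instance())
print(out)  # [1, 2, 3]
```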
stream_from_file
Auto-detects the file type and returns the appropriate stream object (ARFFStream for .arff, CSVStream for .csv).
stream_from_file(
path_to_csv_or_arff: Union[str, Path],
dataset_name: str = "NoName",
class_index: int = -1,
target_type: Optional[str] = None,
) -> Stream
| Parameter | Type | Default | Description |
| path_to_csv_or_arff | Union[str, Path] | required | Path to .arff or .csv file |
| dataset_name | str | "NoName" | Dataset name |
| class_index | int | -1 | Class column index |
| target_type | Optional[str] | None | 'categorical', 'numeric', or None (CSV only) |
Raises: FileNotFoundError · IsADirectoryError · ValueError (unsupported extension).
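The dispatch is by file extension alone. A sketch of the selection logic (the FileNotFoundError/IsADirectoryError checks and the actual class construction are elided; pick_stream_class is a hypothetical helper):

```python
from pathlib import Path

def pick_stream_class(path):
    """Return the stream class name for a path, mirroring the documented dispatch."""
    suffix = Path(path).suffix.lower()
    if suffix == ".arff":
        return "ARFFStream"
    if suffix == ".csv":
        return "CSVStream"
    # Mirrors the documented ValueError for unsupported extensions
    raise ValueError(f"Unsupported file extension: {suffix!r}")

print(pick_stream_class("data/covtype.arff"))  # ARFFStream
```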
§4 Feature Evolution Wrappers
These wrappers simulate dynamic feature spaces where the set of active features changes over time.
OpenFeatureStream
Wraps a stream to shrink/grow the active feature set over time. Each returned instance carries a feature_indices attribute (a NumPy array) indicating which of the original features are active.
OpenFeatureStream(
base_stream: Stream,
d_min: int = 2,
d_max: Optional[int] = None,
evolution_pattern: Literal["pyramid","incremental","decremental","tds","cds","eds"] = "pyramid",
total_instances: int = 10000,
feature_selection: Literal["prefix","suffix","random"] = "prefix",
missing_ratio: float = 0.0,
random_seed: int = 42,
tds_mode: Literal["random","ordered"] = "random",
n_segments: int = 2,
overlap_ratio: float = 1.0,
) -> None
| Parameter | Type | Default | Description |
| base_stream | Stream | required | Stream to wrap |
| d_min | int | 2 | Minimum number of active features |
| d_max | Optional[int] | None | Maximum features; defaults to original feature count |
| evolution_pattern | str | "pyramid" | Pattern of feature evolution (see table below) |
| total_instances | int | 10000 | Total stream length |
| feature_selection | str | "prefix" | Which features to keep when dimension is reduced |
| missing_ratio | float | 0.0 | Per-feature absence probability (only used by "cds" pattern) |
| random_seed | int | 42 | RNG seed for reproducibility |
| tds_mode | str | "random" | "random" or "ordered" birth assignment (only used by "tds") |
| n_segments | int | 2 | Number of sequential partitions (only used by "eds") |
| overlap_ratio | float | 1.0 | Overlap length relative to stable period (only used by "eds") |
Evolution patterns
| Value | Description |
| "pyramid" | Feature count grows linearly from d_min to d_max, then shrinks back to d_min |
| "incremental" | Monotonic growth from d_min to d_max |
| "decremental" | Monotonic shrinkage from d_max to d_min |
| "tds" | Trapezoidal: each feature has an independent birth time assigned across 10 stages |
| "cds" | Capricious: each feature is independently present at each step with probability 1 − missing_ratio |
| "eds" | Evolvable: n_segments sequential partitions with configurable overlap windows |
Feature selection modes
| Value | Description |
| "prefix" | Keep the first d features from the active set |
| "suffix" | Keep the last d features |
| "random" | Randomly select d features (reproducible per time step via seed) |
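The "pyramid" pattern combined with "prefix" selection can be sketched as a schedule mapping instance index t to an active-feature count that rises linearly to d_max at the midpoint and falls back to d_min. This is an illustration of the described behaviour, not the wrapper's exact schedule:

```python
def pyramid_active_count(t, total, d_min, d_max):
    """Active-feature count at step t: d_min -> d_max at the midpoint -> d_min."""
    half = total / 2
    frac = t / half if t <= half else (total - t) / half
    return round(d_min + frac * (d_max - d_min))

def prefix_features(d):
    """'prefix' selection: keep the first d original feature indices."""
    return list(range(d))

counts = [pyramid_active_count(t, 100, 2, 10) for t in (0, 50, 100)]
print(counts)              # [2, 10, 2]
print(prefix_features(3))  # [0, 1, 2]
```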
TrapezoidalStream
Similar to OpenFeatureStream, but keeps the feature vector fixed-size: inactive features are filled with np.nan rather than omitted.
TrapezoidalStream(
base_stream: Stream,
d_min: int = 2,
d_max: Optional[int] = None,
evolution_mode: Literal["random","ordered","pyramid"] = "random",
total_instances: int = 10000,
random_seed: int = 42,
) -> None
| Parameter | Type | Default | Description |
| base_stream | Stream | required | Stream to wrap |
| d_min | int | 2 | Minimum active features |
| d_max | Optional[int] | None | Maximum features (defaults to original count) |
| evolution_mode | str | "random" | "random": random order; "ordered": index order; "pyramid": grow then shrink |
| total_instances | int | 10000 | Stream length |
| random_seed | int | 42 | RNG seed |
CapriciousStream
Each feature is independently and randomly absent at each time step with probability missing_ratio. Inactive features are filled with np.nan.
CapriciousStream(
base_stream: Stream,
d_max: Optional[int] = None,
missing_ratio: float = 0.5,
total_instances: int = 10000,
min_features: int = 1,
random_seed: int = 42,
) -> None
| Parameter | Type | Default | Description |
| base_stream | Stream | required | Stream to wrap |
| d_max | Optional[int] | None | Feature dimension (defaults to original) |
| missing_ratio | float | 0.5 | Probability that each feature is missing per time step |
| total_instances | int | 10000 | Stream length |
| min_features | int | 1 | Guaranteed minimum features per instance |
| random_seed | int | 42 | RNG seed |
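A sketch of the per-step masking CapriciousStream describes: each feature is independently missing with probability missing_ratio and filled with NaN, with a guaranteed minimum number of surviving features (capricious_mask is a hypothetical helper; the wrapper's seed handling is simplified here):

```python
import math
import random

def capricious_mask(x, missing_ratio, min_features, rng):
    """Return a copy of x with features independently replaced by NaN."""
    keep = [i for i in range(len(x)) if rng.random() >= missing_ratio]
    if len(keep) < min_features:
        # Guarantee at least min_features active features per instance
        keep = rng.sample(range(len(x)), min_features)
    keep_set = set(keep)
    return [x[i] if i in keep_set else math.nan for i in range(len(x))]

rng = random.Random(42)
masked = capricious_mask([1.0, 2.0, 3.0, 4.0], missing_ratio=0.5, min_features=1, rng=rng)
active = sum(not math.isnan(v) for v in masked)
print(len(masked), active >= 1)  # 4 True
```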
EvolvableStream
Divides features into n_segments sequential partitions. Features transition from one segment to the next with configurable overlap windows. Inactive features are filled with np.nan.
EvolvableStream(
base_stream: Stream,
d_max: Optional[int] = None,
n_segments: int = 2,
overlap_ratio: float = 1.0,
total_instances: int = 10000,
random_seed: int = 42,
) -> None
| Parameter | Type | Default | Description |
| base_stream | Stream | required | Stream to wrap |
| d_max | Optional[int] | None | Feature dimension |
| n_segments | int | 2 | Number of sequential feature partitions (≥ 2) |
| overlap_ratio | float | 1.0 | Overlap window length relative to stable period |
| total_instances | int | 10000 | Stream length |
| random_seed | int | 42 | RNG seed |
ShuffledStream
Buffers the entire base stream into memory and serves instances in a randomly shuffled order.
Warning: Loads the full dataset into memory. Suitable for MB-scale datasets; use caution with GB-scale data.
ShuffledStream(base_stream: Stream, random_seed: int = 42) -> None
| Method / Attribute | Returns | Description |
| n_instances | int | Total buffered instances |
| get_num_instances() | int | Total buffered instances |
| has_more_instances() | bool | True if pointer not exhausted |
| next_instance() | _AnyInstance | Next shuffled instance |
| restart() | None | Re-shuffles and resets pointer |
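Since restart() re-shuffles, two passes over the same buffer contain the same instances but generally in different orders. A sketch of that buffer-and-shuffle behaviour (ShuffleBuffer is an illustrative stand-in, not the actual class):

```python
import random

class ShuffleBuffer:
    def __init__(self, items, random_seed=42):
        self._items = list(items)
        self._rng = random.Random(random_seed)
        self._pos = 0
        self.restart()

    def restart(self):
        # Re-shuffle and reset the read pointer
        self._rng.shuffle(self._items)
        self._pos = 0

    def has_more_instances(self):
        return self._pos < len(self._items)

    def next_instance(self):
        self._pos += 1
        return self._items[self._pos - 1]

buf = ShuffleBuffer(range(5), random_seed=42)
first_pass = [buf.next_instance() for _ in range(5)]
buf.restart()
second_pass = [buf.next_instance() for _ in range(5)]
print(sorted(first_pass) == sorted(second_pass))  # True: same items, possibly new order
```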
§5 Synthetic Stream Generators
Module: openmoa.stream.generator. All generators inherit from MOAStream.
SEA
Classic SEA (Streaming Ensemble Algorithm) binary classification with three numeric features.
SEA(function: int = 1, instance_random_seed: int = 1, noise_percentage: int = 10)
| Parameter | Type | Default | Description |
| function | int | 1 | Concept function (1, 2, 3, or 4) |
| instance_random_seed | int | 1 | RNG seed for instance generation |
| noise_percentage | int | 10 | Percentage of noisy labels |
RandomTreeGenerator
Generates instances from a random decision tree concept.
RandomTreeGenerator(
instance_random_seed: int = 1, tree_random_seed: int = 1,
num_classes: int = 2, num_nominals: int = 5, num_numerics: int = 5,
num_vals_per_nominal: int = 5, max_tree_depth: int = 5,
first_leaf_level: int = 3, leaf_fraction: float = 0.15,
)
| Parameter | Default | Description |
| instance_random_seed | 1 | RNG seed for instances |
| tree_random_seed | 1 | RNG seed for tree structure |
| num_classes | 2 | Number of classes |
| num_nominals | 5 | Number of nominal attributes |
| num_numerics | 5 | Number of numeric attributes |
| num_vals_per_nominal | 5 | Possible values per nominal attribute |
| max_tree_depth | 5 | Maximum tree depth |
| first_leaf_level | 3 | Level at which leaves start appearing |
| leaf_fraction | 0.15 | Fraction of internal nodes converted to leaves |
RandomRBFGenerator
Generates instances from randomly placed RBF (Radial Basis Function) centroids.
RandomRBFGenerator(
model_random_seed: int = 1, instance_random_seed: int = 1,
number_of_classes: int = 2, number_of_attributes: int = 10,
number_of_centroids: int = 50,
)
RandomRBFGeneratorDrift
RBF generator whose centroids drift over time, simulating continuous concept drift.
RandomRBFGeneratorDrift(
model_random_seed: int = 1, instance_random_seed: int = 1,
number_of_classes: int = 2, number_of_attributes: int = 10,
number_of_centroids: int = 50,
number_of_drifting_centroids: int = 2,
magnitude_of_change: float = 0.0,
)
| Parameter | Default | Description |
| number_of_drifting_centroids | 2 | Number of centroids that drift |
| magnitude_of_change | 0.0 | Speed of centroid movement |
LEDGenerator
Generates the digit shown on a seven-segment LED display, with a configurable percentage of noisy attributes.
LEDGenerator(instance_random_seed: int = 1, noise_percentage: int = 10, reduce_data: bool = False)
LEDGeneratorDrift
LED generator with a configurable number of drifting attributes.
LEDGeneratorDrift(
instance_random_seed: int = 1, noise_percentage: int = 10,
reduce_data: bool = False, number_of_attributes_with_drift: int = 7,
)
AgrawalGenerator
Generates loan-eligibility data using one of ten predefined classification functions (Agrawal et al.).
AgrawalGenerator(
instance_random_seed: int = 1, function: int = 1,
balance_classes: bool = False, peturbation: float = 0.05,
)
HyperplaneGenerator
Binary classification defined by which side of a slowly rotating hyperplane an instance falls on.
HyperplaneGenerator(
instance_random_seed: int = 1,
number_of_attributes: int = 10,
number_of_drifting_attributes: int = 2,
magnitude_of_change: float = 0.0,
noise_percentage: int = 5,
sigma_percentage: int = 10,
)
STAGGERGenerator
Generates the classic STAGGER concepts: boolean functions over three nominal attributes (size, shape, color).
STAGGERGenerator(instance_random_seed: int = 1, function: int = 1, balance_classes: bool = False)
MixedGenerator
Generates a binary classification problem from a mix of boolean and numeric attributes.
MixedGenerator(instance_random_seed: int = 1, function: int = 1, balance_classes: bool = False)
§6 Drift Streams
Module: openmoa.stream.drift
Drift
Base class describing a single concept-drift event.
Drift(position: int, width: int = 0, alpha: float = 0.0, random_seed: int = 1)
| Parameter | Type | Default | Description |
| position | int | required | Instance index at which drift occurs |
| width | int | 0 | Transition window size (0 or 1 = abrupt) |
| alpha | float | 0.0 | Grade of change |
| random_seed | int | 1 | RNG seed for the transition |
AbruptDrift
Instantaneous concept switch at a specific instance.
AbruptDrift(position: int, random_seed: int = 1)
GradualDrift
Gradual transition where instances are probabilistically drawn from either the old or the new concept.
# Specify by center + width
GradualDrift(position=10000, width=2000, random_seed=1)
# Or specify by start + end
GradualDrift(start=9000, end=11000, random_seed=1)
| Parameter | Type | Default | Description |
| position | Optional[int] | None | Center of the drift window |
| width | Optional[int] | None | Length of transition window |
| start | Optional[int] | None | Start of transition window |
| end | Optional[int] | None | End of transition window |
| alpha | float | 0.0 | Grade of change |
| random_seed | int | 1 | RNG seed |
Note: Either (position + width) or (start + end) must be provided. Internally: width = end − start, position = (start + end) / 2.
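The two parameterizations are interchangeable via the conversion stated in the note. A quick sketch (integer division assumed; the helper names are illustrative, not part of the API):

```python
def window_from_start_end(start, end):
    """Convert (start, end) to the internal (position, width)."""
    width = end - start
    position = (start + end) // 2
    return position, width

def window_from_position_width(position, width):
    """Inverse conversion, e.g. for reading back get_drifts() output."""
    start = position - width // 2
    end = position + width // 2
    return start, end

print(window_from_start_end(9000, 11000))       # (10000, 2000)
print(window_from_position_width(10000, 2000))  # (9000, 11000)
```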
DriftStream
Composes a list of sub-streams connected by Drift objects into a single stream with concept drift.
DriftStream(
schema: Optional[Schema] = None,
CLI: Optional[str] = None,
moa_stream: Optional[InstanceStream] = None,
stream: Optional[list] = None,
)
The stream parameter takes an alternating list: [Stream, Drift, Stream, Drift, Stream, …]
Methods
| Method | Returns | Description |
| get_num_drifts() | int | Number of drift transitions |
| get_drifts() | list[Drift] | List of Drift objects with their positions and widths |
from openmoa.stream.drift import DriftStream, AbruptDrift, GradualDrift
from openmoa.stream.generator import SEA
stream = DriftStream(stream=[
SEA(function=1),
AbruptDrift(position=5000),
SEA(function=2),
GradualDrift(position=10000, width=2000),
SEA(function=3),
])
print(stream.get_num_drifts()) # 2
RecurrentConceptDriftStream
Generates recurrent (periodic) concepts by cycling through a list of concepts multiple times.
RecurrentConceptDriftStream(
concept_list: Sequence[Stream],
max_recurrences_per_concept: int = 2,
transition_type_template: Drift = AbruptDrift(position=2000),
concept_name_list: Optional[Sequence[str]] = None,
)
| Parameter | Type | Default | Description |
| concept_list | Sequence[Stream] | required | List of concepts to cycle through |
| max_recurrences_per_concept | int | 2 | How many times each concept reappears |
| transition_type_template | Drift | AbruptDrift(2000) | Template for transitions between concepts |
| concept_name_list | Optional[Sequence[str]] | None | Names for each concept |
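One plausible resulting order is each concept repeated cyclically up to max_recurrences_per_concept times. The sketch below shows only that scheduling idea; the actual class also inserts a Drift transition (built from transition_type_template) between consecutive concepts, and its ordering may differ:

```python
def recurrent_schedule(concepts, max_recurrences_per_concept=2):
    """Cycle through the concept list max_recurrences_per_concept times."""
    return [c for _ in range(max_recurrences_per_concept) for c in concepts]

print(recurrent_schedule(["A", "B", "C"], 2))  # ['A', 'B', 'C', 'A', 'B', 'C']
```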
§7 Built-in Datasets
Module: openmoa.datasets. All datasets auto-download on first use and are stored locally. All inherit from DownloadARFFGzip and expose the standard Stream interface.
Classification Datasets
| Class | Instances | Attributes | Classes | Description |
| Electricity | 45,312 | 8 | 2 | Electricity demand (UP/DOWN) |
| ElectricityTiny | 2,000 | 8 | 2 | Tiny version for testing |
| Covtype | 581,012 | 54 | 7 | Forest cover type |
| CovtypeNorm | 581,012 | 54 | 7 | Covtype with normalized features |
| CovtypeTiny | 1,001 | 54 | 7 | Tiny version for testing |
| CovtFD | 581,011 | 104 | 7 | Covtype with 2 synthetic feature drifts at instances 193,669 and 387,338 |
| RBFm_100k | 100,000 | 10 | 5 | Synthetic RBF |
| RTG_2abrupt | 100,000 | 30 | 5 | Random Tree with 2 abrupt drifts |
| Hyper100k | 100,000 | 10 | 2 | Hyperplane |
| Sensor | 2,219,803 | 5 | 54 | Indoor sensor readings |
| RCV1 | 20,242 | ~47,236 (sparse) | 2 | Text classification |
| W8a | 49,749 | 300 | 2 | Web page classification |
| Adult | 32,561 | 123 | 2 | Census income |
| Magic04 | 19,020 | 10 | 2 | MAGIC gamma telescope |
| Spambase | 4,601 | 57 | 2 | Email spam |
| Musk | 6,598 | 166 | 2 | Musk molecules |
| SVMGuide3 | 1,243 | 21 | 2 | SVM benchmark |
| German | 1,000 | 24 | 2 | Credit risk |
| Australian | 690 | 14 | 2 | Credit approval |
| Ionosphere | 351 | 34 | 2 | Radar returns |
| InternetAds | 2,359 | 1,558 | 2 | Internet advertisements |
| DryBean | 13,611 | 16 | 7 | Dry bean classification |
| Optdigits | 5,620 | 64 | 10 | Optical digit recognition |
| Frogs | 7,195 | 22 | 4 | Frog species |
| Wine | 178 | 13 | 3 | Wine cultivars |
| Splice | 3,190 | 60 | 3 | DNA splice junctions |
| SeagateBinary | 49,999 | 94 | 2 | Seagate binary |
| SeagateMulti | 11,800 | 94 | 11 | Seagate multi-class |
Regression Datasets
| Class | Instances | Attributes | Description |
| Fried | 40,768 | 10 | Friedman regression |
| FriedTiny | 1,000 | 10 | Tiny version for testing |
| Bike | 17,379 | 12 | Bike sharing demand |
from openmoa.datasets import Electricity, Fried
stream = Electricity()
print(stream.get_schema().get_num_classes()) # 2
print(stream.get_schema().get_num_attributes()) # 8
reg_stream = Fried()
print(reg_stream.get_schema().is_regression()) # True