ts_datasets

This library implements Python classes that load numerous time series datasets into standardized pandas DataFrames. The sub-modules are ts_datasets.anomaly for time series anomaly detection and ts_datasets.forecast for time series forecasting. To install the package, run pip install -e . from the command line. Then you can load a dataset (e.g. the "realAWSCloudwatch" split of the Numenta Anomaly Benchmark) by calling

from ts_datasets.anomaly import NAB
dataset = NAB(subset="realAWSCloudwatch", rootdir=path_to_NAB)

Note that if you have installed this package in editable mode (i.e. by specifying -e), the root directory need not be specified.

Each dataset supports the following features:

__getitem__: Calling ts, metadata = dataset[i] returns the i-th time series and its metadata. ts is a time-indexed pandas DataFrame in which each column represents a different variable (for multivariate time series). metadata is a dict or pd.DataFrame indexed by the same timestamps as ts; its keys hold dataset-specific metadata (train/test split, anomaly labels, etc.) for each timestamp.
__len__: Calling len(dataset) returns the number of time series in the dataset.
__iter__: You may iterate over the pandas representations of the time series in the dataset with for ts, metadata in dataset: ...
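To make the interface above concrete, here is a minimal sketch of a toy class that follows the same protocol. ToyDataset and its column names are hypothetical and not part of ts_datasets; the sketch only assumes what the list above states: indexing returns a (ts, metadata) pair sharing one time index, len() gives the number of series, and iteration yields those pairs.

```python
import numpy as np
import pandas as pd


class ToyDataset:
    """Hypothetical stand-in mimicking the ts_datasets interface (not the real library)."""

    def __init__(self, n_series: int = 3, length: int = 10):
        self.n_series = n_series
        self.length = length

    def __len__(self) -> int:
        # Number of time series in the dataset
        return self.n_series

    def __getitem__(self, i: int):
        if not 0 <= i < self.n_series:
            raise IndexError(i)
        index = pd.date_range("2020-01-01", periods=self.length, freq="D")
        # Time-indexed DataFrame with one column per variable (two variables here)
        ts = pd.DataFrame(
            {
                "var_0": np.arange(self.length, dtype=float) + i,
                "var_1": np.arange(self.length, dtype=float) * 2,
            },
            index=index,
        )
        # Metadata shares the index of ts; here the first 70% of timestamps are train/val
        n_train = (7 * self.length) // 10
        metadata = pd.DataFrame(
            {"trainval": np.arange(self.length) < n_train}, index=index
        )
        return ts, metadata

    def __iter__(self):
        # Yield (ts, metadata) pairs in order, as in `for ts, metadata in dataset`
        for i in range(len(self)):
            yield self[i]


dataset = ToyDataset(n_series=3, length=10)
for ts, metadata in dataset:
    assert ts.index.equals(metadata.index)
```

Code written against this protocol (indexing, len(), iteration) should work unchanged with any dataset class that follows it.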

For each time series in the dataset, metadata is a dict or pd.DataFrame that always has the following key:

trainval: (bool) a pd.Series indicating whether each timestamp of the time series should be used for training/validation (if True) or testing (if False)
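A trainval mask like this is typically applied with boolean indexing. The sketch below assumes metadata["trainval"] is a boolean pd.Series aligned with ts, as described above; split_trainval is a hypothetical helper rather than a library function, and the data is synthetic.

```python
import pandas as pd


def split_trainval(ts: pd.DataFrame, metadata):
    """Hypothetical helper: split ts into (train/val, test) using the trainval mask."""
    # Align the mask to ts's index; works whether metadata is a dict or a DataFrame
    mask = pd.Series(metadata["trainval"], index=ts.index).astype(bool)
    return ts[mask], ts[~mask]


# Synthetic example: six daily points, the last two reserved for testing
index = pd.date_range("2021-01-01", periods=6, freq="D")
ts = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)
metadata = pd.DataFrame(
    {"trainval": [True, True, True, True, False, False]}, index=index
)
train, test = split_trainval(ts, metadata)
```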

For anomaly detection datasets, metadata will also have the key:

anomaly: (bool) a pd.Series indicating whether each timestamp is anomalous
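As a sketch of how the anomaly key can be combined with trainval, the snippet below lists the anomalous timestamps in the test split and computes the fraction of test timestamps that are anomalous. anomaly_summary is a hypothetical helper, and it assumes both keys are boolean series on the same index, as described above.

```python
import pandas as pd


def anomaly_summary(metadata):
    """Hypothetical helper: anomalous timestamps and anomaly rate within the test split."""
    trainval = pd.Series(metadata["trainval"]).astype(bool)
    anomaly = pd.Series(metadata["anomaly"]).astype(bool)
    # Keep only test timestamps (trainval == False)
    test_anomaly = anomaly[~trainval]
    anomalous_times = test_anomaly[test_anomaly].index
    rate = float(test_anomaly.mean()) if len(test_anomaly) else 0.0
    return anomalous_times, rate


# Synthetic example: first 4 days are train/val, one anomaly in train and two in test
index = pd.date_range("2021-01-01", periods=8, freq="D")
metadata = pd.DataFrame(
    {
        "trainval": [True] * 4 + [False] * 4,
        "anomaly": [False, True, False, False, False, True, True, False],
    },
    index=index,
)
times, rate = anomaly_summary(metadata)
```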

We currently support the following datasets for time series anomaly detection (ts_datasets.anomaly):

IOps Competition
Numenta Anomaly Benchmark
Synthetic (synthetic data generated using this script)
SMAP & MSL (multivariate time series anomaly detection datasets from NASA)
SMD (server machine dataset)

We currently support the following datasets for time series forecasting (ts_datasets.forecast):

M4 Competition
- There are 100,000 univariate time series at different granularities: Yearly (23,000 sequences), Quarterly (24,000 sequences), Monthly (48,000 sequences), Weekly (359 sequences), Daily (4,227 sequences), and Hourly (414 sequences).
Energy Power Grid
- There is one 10-variable time series.
- Each univariate series records the energy power usage in a particular region.
Seattle Trail for Bike and Pedestrian
- There is one 5-variable time series.
- Each univariate series records the bicycle/pedestrian flow along a different direction on the trail.
Solar Energy Plant
- There is one 405-variable time series.
- Each univariate series records the solar energy power at a different detector in the plant.
- By default, the data loader returns only the first 100 of the 405 univariate series.
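Several of these forecasting datasets are single multivariate DataFrames whose columns are individual univariate series. The sketch below splits such a DataFrame into per-column series and optionally keeps only the first few columns, mirroring the Solar Energy Plant default of 100 out of 405. to_univariates and the detector_* column names are hypothetical, and the data is random rather than the real dataset.

```python
import numpy as np
import pandas as pd


def to_univariates(ts: pd.DataFrame, max_vars=None):
    """Hypothetical helper: map each column name to its univariate pd.Series.

    If max_vars is given, only the first max_vars columns are kept.
    """
    cols = ts.columns if max_vars is None else ts.columns[:max_vars]
    return {col: ts[col] for col in cols}


# Synthetic stand-in for a 405-variable series (random values, not real data)
index = pd.date_range("2020-01-01", periods=24, freq="D")
rng = np.random.default_rng(0)
solar_like = pd.DataFrame(
    rng.random((24, 405)),
    index=index,
    columns=[f"detector_{j}" for j in range(405)],
)
univariates = to_univariates(solar_like, max_vars=100)
```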

More details on each dataset can be found in its class-level docstring or in the API documentation.
