Skip to content

Latest commit

 

History

History
147 lines (112 loc) · 5.82 KB

usage.md

File metadata and controls

147 lines (112 loc) · 5.82 KB

Usage

From CLI

To generate time series from CLI, the user must create a config YAML file that defines all base-oscillations and their contained anomalies. The YAML file must have the following schema (optional keys are shown in brackets):

timeseries:
  - [name: Str]
    length: Int
    [semi-supervised: Bool]
    [supervised: Bool]
    base-oscillations:
      - kind: Enum[cylinder-bell-funnel,ecg,random-walk,sine,polynomial,random-mode-jump,formula]
        [trend: object]
        # parameters from [Introduction -> Base Oscillations](introduction#base-oscillations)
    anomalies:
      - length: Int
        [channel: Int]
        [position: Enum[beginning,middle,end]]
        [exact-position: Int]
        [creeping-length: Int]
        kinds:
          - kind: Enum[extremum,frequency,mean,pattern,pattern-shift,platform,variance,amplitude,trend,mode-correlation]
            # corresponding parameters from [Introduction -> Anomaly Types](introduction#anomaly-types)

As seen in the YAML schema, in one config file multiple time series with multiple anomalies can be defined. Each anomaly can furthermore be a combination of multiple anomaly types (e.g., platform and variance).

We also provide a JSON schema definition that also works with YAML files.

Commands

Once GutenTAG is installed and a desired config file is written, the user can call the following command to generate time series:

python -m gutenTAG [-h] [--version] --config-yaml CONFIG_YAML \
                  [--output-dir OUTPUT_DIR] \
                  [--plot] \
                  [--no-save] \
                  [--seed SEED] \
                  [--addons [ADDONS [ADDONS ...]]] \
                  [--n_jobs N_JOBS] \
                  [--only ONLY]

See python -m gutenTAG --help for the CLI usage and the explanation of the CLI arguments (further details also below).

Parameters

Name Type Description Default
config-yaml String Path to config.yaml -
output-dir String Path to output director generated-timeseries
plot Bool Whether a plot should be displayed False
no-save Bool Whether the saving should be skipped False
seed Int Random seed number for reproducibility None
addons String Python import paths (explained in Advanced Features) []
n_jobs Integer Number of parallelism to generate multiple time series in parallel 1
only String Name of a time series defined in the config.yaml that is considered while all others are excluded. None

Outputs

The generator will then create a directory with the desired timeseries and an overview.yaml file that has the same structure as the config.yaml with the extension of the generation-id parameter for time series without a name. This ID or the configured name tells the user which of the subfolders contains which time series. Inside of the subfolders is a test.csv file that represents the defined time series; if the time series is configured semi-supervised, there will be an additional file train_no_anomaly.csv containing a similar time series without anomalies; if the time series is configured supervised, there will be an additional file train_anomaly.csv containing a similar time series with anomalies. Each file has the following structure

timestamp,value-0,value-1,is_anomaly
0,0.1,0.3,0
1,0.3,0.3,0
2,0.1,0.3,1
3,0.2,0.3,0

The last column is the label, 0 if no anomaly else 1. The preceding columns represent the channels. The file has a header and an index column called timestamp.

From Python

To generate GutenTAG time series from Python, you have multiple options. Either you write a dict() with the same schema as in From CLI or you call the generation functions directly.

You can use the GutenTAG generator with a configuration dict in the following way:

from gutenTAG import GutenTAG, TrainingType, LABEL_COLUMN_NAME


config = {
    "timeseries": [
        {
            "name": "test",
            "length": 100,
            "base-oscillations": [
                {"kind": "sine"}
            ],
            "anomalies": [
                {"length": 5, "channel": 0, "kinds": [{"kind": "mean", "offset": .5}]}
            ]
        }
    ]
}
gutentag = GutenTAG(seed=1)
gutentag.load_config_dict(config)

# call generate() to create the datasets (in-memory)
datasets = gutentag.generate(return_timeseries=True)

# we only defined a single test time series
assert len(datasets) == 1
d = datasets[0]
assert d.name == "test"
assert d.training_type == TrainingType.TEST

# the data points are stored at
df = d.timeseries
df.iloc[:, 1:-1]
# the labels are stored at
df[LABEL_COLUMN_NAME]

Alternatively, you can generate time series data by calling the generation function of a base oscillation directly:

Attention

The Python API for injecting anomalies is still work in progress (WIP)!

import pandas as pd
import gutenTAG.api as gt
import matplotlib.pyplot as plt

N = 500
ts_sine = gt.sine(length=N, frequency=2, amplitude=1.8)
ts_dirichlet = gt.dirichlet(length=N)

# anomaly injection TBD

df = pd.DataFrame({"ch-0": ts_sine, "ch-1": ts_dirichlet})

df.plot()
plt.show()

The above script creates the following time series:

example-ts-api