
ATSC - Advanced Time Series Compressor

ATSC is an approach to lossy time-series compression, exchanging some precision for high compression ratios.

NOTE: This is still under development. Current status is unsupported!

Table of Contents

  1. TL;DR
  2. What is ATSC?
  3. When to use ATSC?
  4. Documentation
  5. Building ATSC
  6. ATSC Usage
  7. Releases
  8. Roadmap
  9. Support Status

TL;DR

The fastest way to test ATSC is with a CSV file!

  1. Download the latest release

  2. Pick a CSV file from the tests folder (those have the expected internal format).

  3. Execute the following command:

    cargo run --release -- --csv <input-file>
  4. You have a compressed timeseries!
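To get your data back, the same binary can also decompress the result using the -u flag described below (this example assumes the compressed output is the BRO file produced by the previous step):

    cargo run --release -- -u <compressed-file>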

What is ATSC?

Advanced Time Series Compressor (ATSC for short) is a configurable, lossy compressor that uses the characteristics of a time series to create a function approximation of that time series.

This way, ATSC only needs to store the parametrization of the function and not the data.

ATSC draws inspiration from established compression and signal analysis techniques, achieving significant compression ratios.

In internal testing, ATSC compressed the monitoring time series of our databases by 46x to 880x, with a fitting error within 1% of the original time series.

In some cases, ATSC produces highly compressed data without any data loss (perfectly fitting functions). ATSC is meant for long-term storage of time series, as it benefits from having more points to achieve a better fit.

Decompression is much faster than compression (up to 40x), as the data is expected to be compressed once and decompressed several times.

Internally ATSC uses the following methods for time series fitting:

  • FFT (Fast Fourier Transforms)
  • Constant
  • Interpolation - Catmull-Rom
  • Interpolation - Inverse Distance Weight

For a more detailed insight into ATSC read the paper here: ATSC - A novel approach to time-series compression

ATSC's input can be an internal format developed to process time series (WBRO) or a CSV. It outputs a compressed format (BRO). A CSV to WBRO converter is available here: CSV Compressor

When to use ATSC?

ATSC fits any place that needs to trade precision for space reduction. ATSC is to time series what JPG/MP3 is to images/audio. If the output does not need to match the original input exactly, you can probably use ATSC.

Example of use cases:

  • In places where time series are rolled over, ATSC is a perfect fit. It would probably offer more space savings without any meaningful loss in precision.
  • Time series that are under-sampled (e.g. once every 20 seconds). With ATSC you can greatly increase the sample rate (e.g. once per second) without increasing storage requirements.
  • Long, slow-moving data series (e.g. weather data). Those will most probably follow an easy-to-fit pattern.
  • Data that is meant to be visualized by humans and not machine-processed (e.g. by operations teams). With such a small error (under 1%), it shouldn't impact analysis.

Documentation

For full documentation, please go to the Docs

Building ATSC

  1. Clone the repository:

    git clone https://github.com/instaclustr/atsc
    cd atsc
  2. Build the project:

    cargo build --release
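The optimized binary is then placed in Cargo's standard release output directory; assuming the resulting binary is named atsc (as the usage below shows), you can verify the build with:

    ./target/release/atsc --help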

ATSC Usage

Prerequisites

  • Ensure you have Rust installed on your system.

Usage

ATSC relies on files with a WBRO extension to operate; learn more about that here: WBRO - A time series format. You can also compress from CSV with the provided CSV tool. These files work as input for the compressor.

Compressor usage:

Usage: atsc [OPTIONS] <INPUT>

Arguments:
  <INPUT>  input file

Options:
      --compressor <COMPRESSOR>
          Select a compressor, default is auto [default: auto] [possible values: auto, noop, fft, constant, polynomial, idw]
  -e, --error <ERROR>
          Sets the maximum allowed error for the compressed data; must be between 0 and 50. Default is 5 (5%).
          0 is lossless compression.
          50 will do a median filter on the data.
          Values in between optimize for the given error bound [default: 5]
  -u
          Uncompresses the input file/directory
  -c, --compression-selection-sample-level <COMPRESSION_SELECTION_SAMPLE_LEVEL>
          Samples the input data instead of using all the data for selecting the optimal compressor.
          Only impacts speed; it may or may not increase the compression ratio. For best results use 0 (default).
          Only works when compression = Auto.
          0 will use all the data (slowest)
          6 will sample 128 data points (fastest) [default: 0]
      --verbose
          Verbose output; dumps every sample in the input file (for compression) and in the output file (for decompression)
      --csv
          Defines user input as a CSV file
      --no-header
          Defines if the CSV has no header
      --fields <FIELDS>
          Defines names of fields in CSV file. It should follow this format:
            --fields=TIME_FIELD_NAME,VALUE_FIELD_NAME
          It assumes that the one before comma is a name of time field and the one
          after comma is value field. [default: time,value]
  -h, --help
          Print help
  -V, --version
          Print version

Compress a File

To compress a file using ATSC, run:

atsc <input-file>
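By default this uses --compressor auto and a 5% maximum error. As an illustrative example of the documented options above, the following would force the FFT compressor and tighten the error bound to 1%:

atsc --compressor fft --error 1 <input-file>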

Decompress a File

To decompress a file, use:

atsc -u <input-file>
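Compress a CSV File

The documented CSV options combine with the same command. This example is illustrative and assumes a CSV whose header names the columns timestamp and cpu; adjust --fields to match your file, or omit it to use the default time,value:

atsc --csv --fields=timestamp,cpu <input-file>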

Releases

v0.7 - 20/11/2024

  • Added CSV Support
  • Greatly improved documentation
  • Improved Benchmark and testing
  • Improved FFT compression
  • Improved Polynomial compression
  • Demo files and generation scripts
  • Several fixes and cleanups

v0.6 - 09/11/2024

  • Internal release

v0.5 - 30/11/2023

  • Added Polynomial Compressor (with 2 variants)
  • Created and Integrated a proper file type (wbro)
  • Benchmarks of the different compressors
  • Integration testing
  • Several fixes and cleanups

Roadmap

  • Frame expansion (Allowing new data to be appended to existing frames)
  • Dynamic function loading (e.g. providing more functions without touching the whole code base)
  • Global/Per frame error storage
  • Efficient error

Support Status

Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for the Instaclustr support status of this project.
