NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining
Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu
This repository contains the official implementation of the paper: NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining, accepted to TMLR 2024. In this work, we propose the NuTime model for large-scale time-series pretraining. The model is based on the Transformer architecture and takes as input a set of tokens from non-overlapping windows. Each window is represented by its normalized shape, its mean, and its standard deviation. We develop a numerically multi-scaled embedding (NME) method for representing the scalar values of the mean and std. The model can take raw time-series values at any numerical scale as input, without any data normalization or transformation.
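To make the window tokenization concrete, here is a minimal NumPy sketch of splitting a series into non-overlapping windows, each represented by its normalized shape plus mean and std scalars. This is an illustration of the idea described above, not the repository's implementation; the function name `window_decompose` and the `eps` value are assumptions.

```python
import numpy as np

def window_decompose(x, window_size, eps=1e-5):
    """Split a 1-D series into non-overlapping windows and represent each
    window by its normalized shape, mean, and standard deviation.

    Illustrative sketch only; names and eps are assumptions, not the
    repo's exact code.
    """
    n = len(x) // window_size * window_size           # drop the ragged tail
    windows = np.asarray(x[:n], dtype=np.float64).reshape(-1, window_size)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    shape = (windows - mean) / (std + eps)            # scale-invariant shape token
    return shape, mean.squeeze(1), std.squeeze(1)

# A series at an arbitrary numerical scale: the shape tokens are
# scale-invariant, while the mean/std scalars carry the magnitude
# information that NME embeds separately.
x = np.sin(np.linspace(0, 6.28, 64)) * 1e4 + 5e3
shape, mean, std = window_decompose(x, window_size=16)
print(shape.shape, mean.shape)  # (4, 16) (4,)
```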
Feel free to contact me ([email protected]) or open an issue if you have any questions or suggestions.
- 2024-12-23: Check the latest repository under the Microsoft account: microsoft/NuTime.
- 2024-11-12: Checkpoint of the self-supervised pretrained NuTime is released.
- 2024-11-12: Code for data preprocessing, training, and evaluation is released.
- 2024-07-15: It may take some time to clean the entire codebase for release, so we first provide the code for the window, mean, and std embeddings, the essential part of the proposed NuTime, here.
- 2024-07-10: NuTime is accepted to TMLR 2024.
- Release the training and evaluation code
- Release the self-supervised pretrained NuTime
Please install PyTorch according to your CUDA version first. There are no restrictions on the torch version; feel free to use your preferred one.
```shell
git clone https://github.com/chenguolin/NuTime.git
cd NuTime
bash settings/setup.sh
```
Please refer to src/data/preprocess.py. We provide scripts to preprocess several datasets, including UCR, UEA, SleepEDF, Epilepsy, etc. The processed and split Epilepsy dataset is provided in datasets/Epilepsy as an example.
- The core part of our work is WindowNormEncoder in src/models/encoders/WindowNormEncoder.py and WinT in src/models/networks.py. You can directly view the code for implementation details. The rest of the code handles data preprocessing, training, evaluation, and ablation studies, and can essentially be ignored.
- A checkpoint of the self-supervised (i.e., BYOL-style) pretrained NuTime (with 9 multi-scaled embeddings) is provided in ckpt/checkpoint_bias9.pth.
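As a rough illustration of what a multi-scaled embedding of a scalar (such as a window mean or std) could look like, here is a minimal NumPy sketch: the raw scalar is evaluated at several fixed numerical scales and squashed with tanh, so that at least one scale channel stays informative for any input magnitude, then projected to the model dimension. The scale grid (10^-4 to 10^4 for 9 scales), the tanh squashing, and all names here are simplifying assumptions for illustration, not the repository's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_scale_embed(v, weight, num_scales=9):
    """Hedged sketch of a numerically multi-scaled embedding for scalars.

    v: array of raw scalar values at arbitrary numerical scales.
    weight: (num_scales, dim) stand-in for a learned projection.
    """
    exps = np.arange(num_scales) - (num_scales - 1) // 2
    scales = 10.0 ** exps                      # e.g. 10^-4 ... 10^4
    multi = np.tanh(v[..., None] * scales)     # (..., num_scales), bounded
    return multi @ weight                      # (..., dim)

W = rng.standard_normal((9, 32)) * 0.1         # hypothetical learned weights
out = multi_scale_embed(np.array([1e-3, 1.0, 1e3]), W)
print(out.shape)  # (3, 32)
```

The tanh channels saturate for scales that are too large for a given input and vanish for scales that are too small, so inputs spanning many orders of magnitude each land in a well-conditioned range on some channel without any dataset-level normalization.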
```shell
python3 src/pipeline.py --config_file configs/demo_ft_epilepsy.json
```
If you find our work helpful, please consider citing:
```bibtex
@article{lin2024nutime,
  title={NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining},
  author={Chenguo Lin and Xumeng Wen and Wei Cao and Congrui Huang and Jiang Bian and Stephen Lin and Zhirong Wu},
  journal={Transactions on Machine Learning Research (TMLR)},
  year={2024}
}
```