This is an official PyTorch implementation of TeST.
- We propose TeST, which employs three transformer-based architecture variants to conduct temporal action localization.
- The three transformer-based architectures effectively improve both localization performance and space-time efficiency.
- We propose to integrate the results from multiple feature maps to obtain more comprehensive predictions.
- Extensive experiments on two real-world benchmarks validate the effectiveness and superiority of our proposed TEST.
This repository is based on AFSD and uses a similar code structure and environment.
- NVIDIA GPU supporting CUDA 9.2
- CUDA 9.2
- Python 3.7
- PyTorch == 1.4.0 (please make sure the PyTorch version is exactly 1.4.0, since the custom module from AFSD depends on it)
conda install pytorch==1.4.0 cudatoolkit=9.2 -c pytorch
cd AFSD
python setup.py develop
pip install einops
pip install pyyaml
pip install pandas
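After installing the dependencies, you can quickly verify the environment with a short check like the one below (a minimal sketch; the version assertion only guards the AFSD requirement mentioned above):

```python
# Environment sanity check (a minimal sketch).
import einops  # noqa: F401 -- imported only to confirm the installation
import torch

# The custom module from AFSD requires PyTorch 1.4.0.
assert torch.__version__.startswith("1.4"), (
    f"Expected PyTorch 1.4.x, found {torch.__version__}"
)
print("CUDA available:", torch.cuda.is_available())
```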
You can follow AFSD to download the required data.
- THUMOS14 Data
- Download the RGB and flow npy files
- Create a 'data' folder
- Put the npy files into the corresponding folders
- Backbone Parameters
- Download the RGB and flow backbone parameters
- Create a 'backbones' folder
- Put 'rgb_imagenet.pt' and 'flow_imagenet.pt' into the 'backbones' folder
Make sure the file structure is correct.
├── AFSD # the AFSD module to obtain boundary features
│ ├── boundary_max_pooling_cuda.cpp
│ ├── boundary_max_pooling_kernel.cu
│ └── setup.py # install the AFSD module
├── backbones # the backbone parameters
│ ├── flow_imagenet.pt # parameters for flow backbone
│ └── rgb_imagenet.pt # parameters for rgb backbone
├── configs # the training configs
│ ├── thumos_flow.yaml # config for flow model
│ └── thumos_rgb.yaml # config for rgb model
├── data # THUMOS dataset
│ ├── test_flow_npy
│ ├── test_npy
│ ├── validation_flow_npy
│ └── validation_npy
├── dataset # dataset files
│ ├── __init__.py
│ ├── dataset.py
│ └── video_transforms.py
├── imgs # images for README
│ └── overview.png
├── model # the main model files
│ ├── __init__.py
│ ├── boundary_pooling_op.py
│ ├── i3d_backbone.py
│ ├── layers.py
│ └── main.py
├── config.py # process config
├── evaluate.py # evaluate TeST
├── LICENSE
├── losses.py # the loss function
├── README.md
├── test_ensemble.py # test TeST using multiple feature maps
├── test_single.py # test TeST using a single feature map
└── train.py # train TeST
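To confirm your layout matches the tree above, a quick check like the following can help (a minimal sketch; all paths are relative to the repository root):

```python
# Verify the expected folders and files from the tree above.
from pathlib import Path

expected = [
    "backbones/rgb_imagenet.pt",
    "backbones/flow_imagenet.pt",
    "data/test_npy",
    "data/test_flow_npy",
    "data/validation_npy",
    "data/validation_flow_npy",
]
for path in expected:
    status = "ok" if Path(path).exists() else "MISSING"
    print(f"{status:>7}  {path}")
```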
Train the rgb model using:
python train.py --config_file configs/thumos_rgb.yaml
or train the flow model using:
python train.py --config_file configs/thumos_flow.yaml
You can change the hyper-parameters in the corresponding yaml files.
You can test TeST using a single feature map with:
python test_single.py --config_file configs/thumos_rgb.yaml
python test_single.py --config_file configs/thumos_flow.yaml
or test using multiple feature maps:
python test_ensemble.py --config_file configs/thumos_rgb.yaml
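Conceptually, the ensemble test integrates the candidate segments predicted from multiple feature maps and suppresses duplicates. The sketch below illustrates this idea with greedy temporal NMS; it is only an illustration under an assumed (start, end, score) tuple format, not the actual logic of test_ensemble.py:

```python
# Illustrative fusion of detections from multiple feature maps via greedy
# temporal NMS. The (start, end, score) tuple format is an assumption of
# this sketch, not the format used by test_ensemble.py.
def temporal_iou(a, b):
    """Temporal IoU between two (start, end, ...) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(per_map_detections, iou_thresh=0.5):
    """Pool detections from all feature maps, keeping the highest-scoring
    segment among heavily overlapping ones."""
    pooled = sorted(
        (d for dets in per_map_detections for d in dets),
        key=lambda d: d[2],
        reverse=True,
    )
    kept = []
    for det in pooled:
        if all(temporal_iou(det, k) < iou_thresh for k in kept):
            kept.append(det)
    return kept

# Two feature maps proposing overlapping segments: the weaker duplicate
# (1.2, 4.1, 0.7) is suppressed.
print(fuse_detections([[(1.0, 4.0, 0.9)], [(1.2, 4.1, 0.7), (8.0, 10.0, 0.6)]]))
```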
You can evaluate the output by running:
python evaluate.py --output_json output/detection_results_ensemble.json
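If you want to inspect the raw detections directly, something like the snippet below works, assuming an ActivityNet-style JSON layout with a top-level "results" field (adjust the keys if the actual schema differs):

```python
# Peek at the first video's detections in the output JSON.
# The {"results": {video_id: [...]}} layout is an assumption of this sketch.
import json

with open("output/detection_results_ensemble.json") as f:
    results = json.load(f).get("results", {})

video_id, detections = next(iter(results.items()))
print(video_id, detections[:3])
```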
If you find our work interesting/helpful, please consider citing TeST:
@article{wan2024test,
  title={TeST: Temporal-spatial separated transformer for temporal action localization},
  author={Wan, Herun and Luo, Minnan and Li, Zhihui and Wang, Yang},
  journal={Neurocomputing},
  pages={128688},
  year={2024},
  publisher={Elsevier}
}
Feel free to open issues in this repository! GitHub issues are much better than emails at facilitating a conversation between you and our team. You can also contact Herun Wan at [email protected].
- 20241030: We uploaded the complete code for the THUMOS14 dataset and finished the README.
- 20241006: We uploaded the initial code without details. We plan to upload the complete code and details by November.