English | 简体中文

What is AutoX_video

AutoX_video provides an automatic machine learning framework for video understanding built on the mmaction2 codebase; it lets you train video understanding tasks with a few simple commands.

(Framework diagram)

Table of contents

Installation

Quick start

Pretrained weights

Results

Follow-up

Installation

Dependencies

  1. Python 3.6+
  2. PyTorch 1.3+
  3. CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
  4. GCC 5+
  5. mmcv 1.1.1+
  6. Numpy
  7. ffmpeg (4.2 is preferred)
  8. decord (optional, 0.4.1+): install the CPU version with pip install decord==0.4.1, or build the GPU version from source
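If you want to verify these requirements before continuing, a quick check like the following may help (a minimal sketch; it only prints versions, and assumes ffmpeg is on your PATH):

```python
# Minimal check for the non-PyTorch dependencies listed above.
import platform
import subprocess

print("Python:", platform.python_version())  # expect 3.6+

try:
    import numpy
    print("NumPy:", numpy.__version__)
except ImportError:
    print("NumPy is not installed")

# ffmpeg is a system binary, not a Python package (4.2 is preferred).
try:
    proc = subprocess.run(["ffmpeg", "-version"],
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(proc.stdout.decode().splitlines()[0])
except FileNotFoundError:
    print("ffmpeg not found on PATH")
```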

PyTorch

Install PyTorch and torchvision according to the official documentation, for example:

conda install pytorch torchvision -c pytorch

Make sure the CUDA compile version matches the CUDA runtime version. You can check which CUDA versions the precompiled packages support on the PyTorch official website.
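One quick way to check the match on your machine is to compare the CUDA version your PyTorch binary was built against with what the local driver can actually run (a minimal sketch):

```python
import torch

# CUDA version the installed PyTorch binary was compiled against.
print("PyTorch:", torch.__version__)
print("Built with CUDA:", torch.version.cuda)

# Whether the local driver/runtime can actually execute CUDA code.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```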

MMCV

To install mmcv-full, we recommend that you install the following prebuilt packages:

# pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html

Alternatively, users can compile from source by using the following command:

git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full, which contains cuda ops, will be installed after this step
# OR pip install -e .  # package mmcv, which contains no cuda ops, will be installed after this step
cd ..

Or install it directly with pip (this compiles mmcv-full from source and can take a while):

pip install mmcv-full
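Whichever route you take, you can verify that mmcv-full was installed with its CUDA ops using the version helpers in mmcv.ops (the import fails if only the lite mmcv package is present):

```python
# Check that mmcv-full (with compiled CUDA ops) is importable.
import mmcv
from mmcv.ops import get_compiling_cuda_version, get_compiler_version

print("mmcv:", mmcv.__version__)
print("Compiled with CUDA:", get_compiling_cuda_version())
print("Compiler:", get_compiler_version())
```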

Other dependencies

pip install -r build.txt
python mmaction2/setup.py develop
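After this step, the mmaction package should be importable; a one-line sanity check:

```python
# Confirm the mmaction2 codebase was installed in develop mode.
import mmaction
print("mmaction2:", mmaction.__version__)
```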

Quick start

We provide a demo dataset for testing; you can run your own dataset in the same way.

Train

python AutoTrain.py

This starts model training. Training results are saved automatically, and at the end of every two epochs the model is evaluated on the validation set and the best weights are stored.

If training is interrupted unexpectedly, re-running this command resumes from the previous training state instead of starting over (unless you have changed the location of the working directory).

You can train and test the model on your own dataset by modifying the dataset settings in config.yaml.
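The exact keys in config.yaml are defined by this repository, so before editing it you may want to inspect what is there (a minimal sketch; it assumes PyYAML is installed and makes no assumptions about key names):

```python
# List the top-level settings in config.yaml before editing them.
import yaml  # PyYAML

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    print(key, "->", type(value).__name__)
```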

Test

python AutoTest.py

This automatically loads the best weights stored in the working directory, uses them to evaluate the model on the test set, and writes the inference results to the location specified in config.yaml (results.json by default).
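The schema of results.json is defined by the project; assuming it is ordinary JSON, a generic way to peek at the output is:

```python
# Print a small sample of the inference output without assuming its schema.
import json

with open("results.json") as f:
    results = json.load(f)

sample = results[:3] if isinstance(results, list) else results
print(sample)
```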

Pretrained weights

The pre-trained weights used by the model can be downloaded from the link below. After downloading, place the weight files in the checkpoints directory; they will be loaded automatically when training starts. (The pre-trained weight files are provided by Video-Swin-Transformer.)

| Backbone | Pretrain | Lr schd | Spatial crops | acc@1 | acc@5 | #params | FLOPs | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-B | ImageNet22k & Kinetics600 | 30ep | 224 | 84.0 | 96.5 | 88M | 281.6G | github / baidu |

Results

We took first place in the video classification track of the ACM MM 22 PRE-TRAINING FOR VIDEO UNDERSTANDING CHALLENGE (leaderboard).

Results on public datasets:

| Dataset | Top-1 accuracy |
| --- | --- |
| HMDB51 | 0.5902 |
| UCF101 | 0.9407 |

Follow-up

  1. At present, the code supports only a single backbone, Video Swin Transformer, which was the best-performing and most general model in our experiments. More video understanding models will be added in the future for users to choose from.
  2. Currently, only video classification is supported. The framework itself generalizes to tasks such as video object detection and video semantic segmentation; interfaces for other video tasks will be developed in the future.