We provide five evaluation scripts for common benchmarks. If you want to use DEVA on your own data, we suggest you go to DEMO.md instead.
The scripts are:
1. Video Object Segmentation (VOS) evaluation
2. Open-World/Large-Vocabulary/Unsupervised Video Object Segmentation on VIPSeg/BURST/DAVIS 2017
3. Unsupervised Video Object Segmentation (or rather, saliency) on DAVIS 2016
4. Referring Video Object Segmentation (Ref-VOS) evaluation for the Ref-DAVIS dataset
5. Referring Video Object Segmentation (Ref-VOS) evaluation for the Ref-YouTubeVOS dataset
Only (1) is standalone; (2)-(5) require detections from an image model.
We provide:
- Pretrained DEVA model.
- Pre-computed detections from image models. [All can be found here].
- Pre-computed outputs from DEVA. [All can be found here].
- Links to the repositories of the image models.
Here are some useful arguments shared by all the evaluation scripts:
- Mixed precision can be enabled for faster processing with a lower memory footprint.
- Specify --size [xxx] to change the internal processing resolution. The default is 480.
- Specify --chunk_size [xxx] to change the number of objects processed at once. The default is -1, which means all objects are processed in a single pass as a batch.
- Specify --model [xxx] to change the path to the pretrained DEVA model.
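To make the meaning of --chunk_size concrete, here is a minimal sketch of how a list of objects could be grouped into batches. This is only an illustration of the behavior described above, not DEVA's actual implementation; the function name `split_into_chunks` is hypothetical.

```python
from typing import List, Sequence


def split_into_chunks(object_ids: Sequence[int], chunk_size: int = -1) -> List[List[int]]:
    """Group object IDs into batches of at most `chunk_size` objects.

    Hypothetical helper for illustration only. A chunk_size of -1
    (or any value below 1) puts every object into a single batch.
    """
    ids = list(object_ids)
    if not ids:
        return []
    if chunk_size < 1 or chunk_size >= len(ids):
        return [ids]
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


if __name__ == "__main__":
    # Seven objects, default setting: one pass over all objects.
    print(split_into_chunks(range(1, 8), chunk_size=-1))
    # Seven objects, four at a time: two passes.
    print(split_into_chunks(range(1, 8), chunk_size=4))
```

A smaller chunk size trades speed for memory: fewer objects are held at once, at the cost of more passes.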
python evaluation/eval_vos.py --dataset [dataset] --output [output directory]
- Possible options for [dataset]: (DAVIS 2016), D17 (DAVIS 2017), Y18 (YouTubeVOS-2019), and G (generic dataset, see below).
- Specify --split test to test on the DAVIS 2017 test-dev set.
- For the generic dataset, additionally specify the dataset path. It should point to a directory that contains JPEGImages and a matching annotation folder. In each of those folders, there should be one directory per video, named after the video. Each of those directories should contain the images or annotations for that video.
- By default, only the first-frame annotation is used in generic mode. A separate option incorporates new objects (as in the YouTubeVOS dataset).
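Before running on a generic dataset, it can help to check the directory layout described above. The following is a rough sketch, not part of DEVA. The helper name `find_layout_problems` is hypothetical, and the annotation folder name `Annotations` is an assumption; pass a different name if your data uses one.

```python
from pathlib import Path
from typing import List


def find_layout_problems(root: str, annotation_dir: str = "Annotations") -> List[str]:
    """Return human-readable problems found in a generic dataset root.

    Hypothetical helper. `annotation_dir` defaults to "Annotations",
    which is an assumed folder name, not one confirmed by this guide.
    """
    root_path = Path(root)
    image_root = root_path / "JPEGImages"
    mask_root = root_path / annotation_dir

    problems = []
    for folder in (image_root, mask_root):
        if not folder.is_dir():
            problems.append(f"missing folder: {folder.name}")
    if problems:
        return problems

    # Each video should appear under both folders with the same name.
    image_videos = {p.name for p in image_root.iterdir() if p.is_dir()}
    mask_videos = {p.name for p in mask_root.iterdir() if p.is_dir()}
    for name in sorted(image_videos - mask_videos):
        problems.append(f"video '{name}' has images but no annotations")
    for name in sorted(mask_videos - image_videos):
        problems.append(f"video '{name}' has annotations but no images")

    # At least one annotation (e.g. the first frame) is needed per video.
    for name in sorted(image_videos & mask_videos):
        if not any((mask_root / name).iterdir()):
            problems.append(f"video '{name}' has an empty annotation folder")
    return problems
```

An empty return value means the layout matches the description; otherwise each string names one mismatch to fix.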
To get quantitative results:
- DAVIS 2017 validation: davis2017-evaluation or vos-benchmark.
- DAVIS 2016 validation: vos-benchmark.
- DAVIS 2017 test-dev: CodaLab
- YouTubeVOS 2018 validation: CodaLab
- YouTubeVOS 2019 validation: CodaLab
Known issue: After further testing, we note that DEVA does not perform as well as XMem on very long videos in video object segmentation. The failure mode is a much higher false-positive rate when the target object is out of view. This might be a consequence of "stable data augmentation", under which the target object is in view most of the time during training.
Download VIPSeg from https://github.com/VIPSeg-Dataset/VIPSeg-Dataset and convert the data into 720p using their scripts.
Note: some VIPSeg detections are missing due to a pre-processing error. See #88.
python evaluation/eval_with_detections.py \
--mask_path [path to detections] --img_path [path to 720p VIPSeg images] \
--dataset vipseg --temporal_setting [online/semionline] \
--output [output directory] --chunk_size 4
Quantitative results should be computed automatically.
Detection models:
- PanoFCN: https://github.com/dvlab-research/PanopticFCN
- Video-K-Net: https://github.com/lxtGH/Video-K-Net
- Mask2Former: https://github.com/facebookresearch/Mask2Former
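If you want to run both temporal settings back to back, the command above can be assembled programmatically. This is a small sketch that only builds the argument list from the flags shown above; the function name `build_eval_command` and the example paths are hypothetical placeholders.

```python
import shlex
from typing import List, Optional, Sequence


def build_eval_command(
    mask_path: str,
    img_path: str,
    dataset: str,
    temporal_setting: str,
    output: str,
    chunk_size: int = 4,
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the argv list for evaluation/eval_with_detections.py.

    Hypothetical helper; it only mirrors the flags shown in this guide.
    """
    if temporal_setting not in ("online", "semionline"):
        raise ValueError(f"unknown temporal setting: {temporal_setting}")
    command = [
        "python", "evaluation/eval_with_detections.py",
        "--mask_path", mask_path,
        "--img_path", img_path,
        "--dataset", dataset,
        "--temporal_setting", temporal_setting,
        "--output", output,
        "--chunk_size", str(chunk_size),
    ]
    command.extend(extra_args or [])
    return command


if __name__ == "__main__":
    for setting in ("online", "semionline"):
        cmd = build_eval_command(
            mask_path="path/to/detections",    # placeholder path
            img_path="path/to/vipseg_720p",    # placeholder path
            dataset="vipseg",
            temporal_setting=setting,
            output=f"output/vipseg_{setting}",  # placeholder path
        )
        # Print instead of executing; pass `cmd` to subprocess.run to launch it.
        print(shlex.join(cmd))
```

Writing each setting to its own output directory keeps the two sets of results from overwriting each other.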
Download BURST from https://github.com/Ali2500/BURST-benchmark and subsample every three frames as mentioned in the paper.
python evaluation/eval_with_detections.py \
--mask_path [path to detections] --img_path [path to BURST images] \
--dataset burst --save_all --temporal_setting [online/semionline] \
--output [output directory] --chunk_size 4
Quantitative results can be obtained using https://github.com/Ali2500/BURST-benchmark.
Detection models:
- Mask2Former: https://github.com/facebookresearch/Mask2Former
- EntitySeg: https://github.com/qqlu/Entity
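The frame subsampling mentioned for BURST can be sketched as below. This is one plausible reading only: it assumes that "every three frames" means keeping every third frame starting from the first one, and that frame file names sort in temporal order. Check the paper and the BURST tooling for the exact convention. The function name `subsample_frames` is hypothetical.

```python
from typing import Iterable, List


def subsample_frames(frame_names: Iterable[str], step: int = 3, offset: int = 0) -> List[str]:
    """Keep every `step`-th frame, starting at index `offset`.

    Hypothetical helper. Assumes lexicographic order of the file names
    matches temporal order (e.g. zero-padded frame indices).
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return sorted(frame_names)[offset::step]


if __name__ == "__main__":
    frames = [f"{i:05d}.jpg" for i in range(7)]  # placeholder file names
    print(subsample_frames(frames))
```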
DAVIS 2017:
Download DAVIS 2017 from https://davischallenge.org/.
python evaluation/eval_with_detections.py \
--mask_path [path to detections] --img_path [path to 480p DAVIS images] \
--dataset unsup_davis17 --temporal_setting [online/semionline] \
--output [output directory] --chunk_size 4
Quantitative results can be obtained using https://github.com/davisvideochallenge/davis2017-evaluation.
Detection models:
- EntitySeg: https://github.com/qqlu/Entity
We provide a demo script that runs DEVA on a single video.
python evaluation/eval_with_detections.py \
--mask_path ./example/vipseg/source --img_path ./example/vipseg/images \
--dataset demo --temporal_setting semionline \
--output ./example/output --chunk_size 1
Download DAVIS 2016 from https://davischallenge.org/.
python evaluation/eval_saliency.py \
--mask_path [path to detections] --img_path [path to 480p DAVIS images] \
--output [output directory] --imset_path [path to an imset file]
The imset file should contain the names of the videos to be evaluated. If you followed our directory structure (in TRAINING.md), it should be at ../DAVIS/2017/trainval/ImageSets/2016/val.txt.
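If you need an imset file for a custom subset of videos, a sketch like the following may help. It assumes the usual DAVIS convention of one video name per line; the helper names `read_imset` and `write_imset` are hypothetical and not part of DEVA.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def read_imset(imset_path: PathLike) -> List[str]:
    """Read video names from an imset file, one name per line (assumed format)."""
    lines = Path(imset_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_imset(video_root: PathLike, imset_path: PathLike) -> List[str]:
    """Write the name of every video directory under `video_root` to an imset file."""
    names = sorted(p.name for p in Path(video_root).iterdir() if p.is_dir())
    Path(imset_path).write_text("\n".join(names) + "\n")
    return names
```

Only sub-directories are treated as videos, so stray files next to the video folders are ignored.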
Quantitative results can be obtained using https://github.com/davisvideochallenge/davis2017-evaluation.
Detection models: DIS: https://github.com/xuebinqin/DIS
Referring-DAVIS 2017:
Download Referring-DAVIS from https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/video-segmentation/video-object-segmentation-with-language-referring-expressions.
python evaluation/eval_ref_davis.py \
--mask_path [path to detections] --img_path [path to 480p DAVIS images] \
--output [output directory]
Note that there are four different expressions for each video. We evaluate each expression separately and report the average.
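The averaging over expressions can be sketched as follows. The scores are assumed to come from running the evaluation toolkit once per expression; the function name `average_over_expressions` is hypothetical, and the numbers in the usage example are placeholders, not real results.

```python
from statistics import fmean
from typing import Mapping


def average_over_expressions(scores: Mapping[str, float]) -> float:
    """Average one score per expression set into a single reported number.

    Hypothetical helper: `scores` maps an expression identifier to the
    score obtained when evaluating that expression on its own.
    """
    if not scores:
        raise ValueError("no per-expression scores given")
    return fmean(scores.values())


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not measured results.
    placeholder_scores = {"expr_0": 0.5, "expr_1": 0.5, "expr_2": 1.0, "expr_3": 1.0}
    print(average_over_expressions(placeholder_scores))
```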
Quantitative results can be obtained using https://github.com/davisvideochallenge/davis2017-evaluation.
Detection models: ReferFormer: https://github.com/wjn922/ReferFormer
Download Referring-YouTubeVOS from https://youtube-vos.org/dataset/rvos/.
python evaluation/eval_ref_youtubevos.py \
--mask_path [path to detections] --img_path [path to YouTubeVOS images] \
--output [output directory]
Quantitative results can be obtained from https://competitions.codalab.org/competitions/29139.
Detection models: ReferFormer: https://github.com/wjn922/ReferFormer
For an explanation of the color palette used in the output PNG masks, see https://github.com/hkchengrex/XMem/blob/main/docs/PALETTE.md.