Model Zoo

Current results of self-supervised learning benchmarks are based on MMSelfSup and solo-learn. We will rerun the experiments and update more reliable results soon!

Supported sample mixing policies

Relative Location [ICCV'2015]
Rotation Prediction [ICLR'2018]
DeepCluster [ECCV'2018]
NPID [CVPR'2018]
ODC [CVPR'2020]
MoCov1 [CVPR'2020]
SimCLR [ICML'2020]
MoCov2 [ArXiv'2020]
BYOL [NIPS'2020]
SwAV [NIPS'2020]
DenseCL [CVPR'2021]
SimSiam [CVPR'2021]
Barlow Twins [ICML'2021]
MoCo v3 [ICCV'2021]
MAE [CVPR'2022]
SimMIM [CVPR'2022]
CAE [ArXiv'2022]
A2MIM [ArXiv'2022]

ImageNet-1K pre-trained models

The training details are provided in the config files. You can click the method's name to obtain more information.

Method	Config	Download
Relative Location	r50_8xb64_step_ep70	model
Rotation Prediction	r50_8xb64_step_ep70	model
DeepCluster	r50_sobel_8xb64_step_ep200	model
NPID	r50_4xb64_step_ep200	model
ODC	r50_8xb64_step_ep440	model
SimCLR	r50_8xb64_cos_lr0_6_fp16_ep200	model
	r50_16xb256_cos_lr4_8_fp16_ep200	model
MoCoV2	r50_4xb64_cos	model
BYOL	r50_8xb64_accu8_cos_lr4_8_fp16_ep200	model
	r50_8xb64_accu8_cos_lr4_8_fp16_ep300	model
SwAV	r50_8xb64_accu8_cos_lr9_6-mcrop-224_2-96_6_fp16_ep200	model
DenseCL	r50_4xb64_cos	model
SimSiam	r50_4xb64_cos_lr0_05_ep100	model
	r50_4xb64_cos_lr0_05_ep200	model
BarlowTwins	r50_8xb64_accu4_cos_lr1_6_ep300	model
MoCoV3	vit_small_8xb64_accu8_cos_fp16_ep300	model
MAE	vit_base_dec8_dim512_8xb128_accu4_cos_fp16_ep400	model
SimMIM	swin_base_sz192_8xb128_accu2_cos_ep100	model
	vit_base_rgb_m_sz224_8xb128_accu2_step_fp16_ep800	model
MaskFeat	vit_base_hog_108_sz224_8xb128_accu2_cos_fp16_ep300	model
CAE	vit_base_sz224_8xb64_accu4_cos_fp16_ep300	model
A2MIM	r50_l3_sz224_init_8xb256_cos_ep100	model
	r50_l3_sz224_init_8xb256_cos_ep300	model
	vit_base_l0_sz224_8xb128_accu2_step_ep800	model

ImageNet-1K Linear Evaluation

Note

If not specifically indicated, the testing GPUs are NVIDIA Tesla V100 on MMSelfSup and OpenMixup. The pre-training and fine-tuning testing image size are $224\times 224$.
The table records the implementors who implemented the methods (either by themselves or refactoring from other repos), and the experimenters who performed experiments and reproduced the results. The experimenters should be responsible for the evaluation results on all the benchmarks, and the implementors should be responsible for the implementation as well as the results; If the experimenter is not indicated, an implementator is the experimenter by default.
We use config r50_multihead for ImageNet multi-heads and r50_linear for the global average pooled feature evaluation.

Methods	Remarks	Batch size	Epochs	Procotol	Linear
PyTorch	torchvision	256	90	MoCo	76.17
Random	kaiming	-	-	MoCo	4.35
Relative-Loc	ResNet-50	512	70	MoCo	38.83
Rotation	ResNet-50	128	70	MoCo	47.01
DeepCluster	ResNet-50	512	200	MoCo	46.92
NPID	ResNet-50	256	200	MoCo	56.60
ODC	ResNet-50	512	440	MoCo	53.42
SimCLR	ResNet-50	256	200	SimSiam	62.56
	ResNet-50	4096	200	SimSiam	66.66
MoCov1	ResNet-50	256	200	MoCo	61.02
MoCoV2	ResNet-50	256	200	MoCo	67.69
BYOL	ResNet-50	4096	200	SimSiam	71.88
	ResNet-50	4096	300	SimSiam	72.93
SwAV	ResNet-50	512	200	SimSiam	70.47
DenseCL	ResNet-50	256	200	MoCo	63.62
SimSiam	ResNet-50	512	100	SimSiam	68.28
	ResNet-50	512	200	SimSiam	69.84
BarlowTwins	ResNet-50	2048	300	BarlowTwins	71.66
MoCoV3	ViT-Small	4096	400	MoCoV3	73.19

ImageNet-1K End-to-end Fine-tuning Evaluation

Note

All compared methods adopt ResNet-50 or ViT-B architectures and are pre-trained on ImageNet-1K. The pre-training and fine-tuning testing image size are $224\times 224$, except for SimMIM with Swin-Base using $192\times 192$. The fine-tuning protocols include: RSB A3 and RSB A2 for ResNet-50, BEiT for ViT-B.
You can find pre-training codes of compared methods in OpenMixup, VISSL, solo-learn, and the official repositories. You can download fine-tuned models from a2mim-in1k-weights or Baidu Cloud (3q5i).

Methods	Backbone	Source	Batch size	PT epoch	FT protocol	FT top-1
PyTorch	ResNet-50	PyTorch	256	90	RSB A3	78.8
Inpainting	ResNet-50	OpenMixup	512	70	RSB A3	78.4
Relative-Loc	ResNet-50	OpenMixup	512	70	RSB A3	77.8
Rotation	ResNet-50	OpenMixup	128	70	RSB A3	77.7
SimCLR	ResNet-50	VISSL	4096	100	RSB A3	78.5
MoCoV2	ResNet-50	OpenMixup	256	100	RSB A3	78.5
BYOL	ResNet-50	OpenMixup	4096	100	RSB A3	78.7
	ResNet-50	Official	4096	300	RSB A3	78.9
	ResNet-50	Official	4096	300	RSB A2	80.1
SwAV	ResNet-50	VISSL	4096	100	RSB A3	78.9
	ResNet-50	Official	4096	400	RSB A3	79.0
	ResNet-50	Official	4096	400	RSB A2	80.2
BarlowTwins	ResNet-50	solo learn	2048	100	RSB A3	78.5
	ResNet-50	Official	2048	300	RSB A3	78.8
MoCoV3	ResNet-50	Official	4096	100	RSB A3	78.7
	ResNet-50	Official	4096	300	RSB A3	79.0
	ResNet-50	Official	4096	300	RSB A2	80.1
A2MIM	ResNet-50	OpenMixup	2048	100	RSB A3	78.8
	ResNet-50	OpenMixup	2048	300	RSB A3	78.9
	ResNet-50	OpenMixup	2048	300	RSB A2	80.4
MAE	ViT-Base	OpenMixup	4096	400	BEiT (MAE)	83.1
SimMIM	Swin-Base	OpenMixup	2048	100	BEiT (SimMIM)	82.9
	ViT-Base	OpenMixup	2048	800	BEiT (SimMIM)	83.9
CAE	ViT-Base	OpenMixup	2048	300	BEiT (CAE)	83.2
MaskFeat	ViT-Base	OpenMixup	2048	300	BEiT (MaskFeat)	83.5
A2MIM	ViT-Base	OpenMixup	2048	800	BEiT (SimMIM)	84.3

Downstream Task Benchmarks

Places205 Linear Classification

Note

In this benchmark, we use the config files of r50_mhead and r50_mhead_sobel. For DeepCluster, use the corresponding one with _sobel.
Places205 evaluates features in around 9k dimensions from different layers. Top-1 result of the last epoch is reported.

ImageNet Semi-Supervised Classification

Note

In this benchmark, the necks or heads are removed and only the backbone CNN is evaluated by appending a linear classification head. All parameters are fine-tuned. We use config files under imagenet_per_1 for 1% data and imagenet_per_10 for 10% data.
When training with 1% ImageNet, we find hyper-parameters especially the learning rate greatly influence the performance. Hence, we prepare a list of settings with the base learning rate from {0.001, 0.01, 0.1} and the learning rate multiplier for the head from {1, 10, 100}. We choose the best performing setting for each method. Please use --deterministic in this benchmark.

PASCAL VOC07+12 Object Detection

Note

This benchmark follows the evluation protocols set up by MoCo. model_zoo in MMSelfSup for results.
Config: benchmarks/detection/configs/pascal_voc_R_50_C4_24k_moco.yaml.
Please follow here to run the evaluation.

COCO2017 Object Detection

Note

This benchmark follows the evluation protocols set up by MoCo. Refer to model_zoo in MMSelfSup for results.
Config: benchmarks/detection/configs/coco_R_50_C4_2x_moco.yaml.
Please follow here to run the evaluation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model_Zoo_selfsup.md

Model_Zoo_selfsup.md

Model Zoo

ImageNet-1K pre-trained models

ImageNet-1K Linear Evaluation

ImageNet-1K End-to-end Fine-tuning Evaluation

Downstream Task Benchmarks

Places205 Linear Classification

ImageNet Semi-Supervised Classification

PASCAL VOC07+12 Object Detection

COCO2017 Object Detection

Files

Model_Zoo_selfsup.md

Latest commit

History

Model_Zoo_selfsup.md

File metadata and controls

Model Zoo

ImageNet-1K pre-trained models

ImageNet-1K Linear Evaluation

ImageNet-1K End-to-end Fine-tuning Evaluation

Downstream Task Benchmarks

Places205 Linear Classification

ImageNet Semi-Supervised Classification

PASCAL VOC07+12 Object Detection

COCO2017 Object Detection