# Eval notes

Base command: `python ./eval_linear.py --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC/ --evaluate`
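With `--evaluate`, the script loads the reference pretrained DINO backbone and the published linear weights rather than a local checkpoint. As a quick sanity check (a minimal sketch, not the eval_linear.py code), the same reference backbones can be pulled from torch.hub:

```python
# Minimal sketch, assuming torch.hub access; eval_linear.py builds the model itself.
import torch

backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')  # dino_vits8 for --patch_size=8
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))  # CLS-token features
print(feats.shape)  # torch.Size([1, 384]) for vit_small
```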

## ViT-S/8

python ./eval_linear.py --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC/ --evaluate --patch_size=8
Model vit_small built.
We load the reference pretrained linear weights.
Downloading: "https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_linearweights.pth" to /home/vihang/.cache/torch/hub/checkpoints/dino_deitsmall16_linearweights.pth
100.0%
Test:  [  0/391]  eta: 0:17:39  loss: 0.380281 (0.380281)  acc1: 93.750000 (93.750000)  acc5: 98.437500 (98.437500)  time: 2.708638  data: 1.828419  max mem: 846
Test:  [ 20/391]  eta: 0:01:20  loss: 0.559775 (0.560542)  acc1: 84.375000 (85.156250)  acc5: 96.875000 (96.837798)  time: 0.091588  data: 0.000287  max mem: 993
Test:  [ 40/391]  eta: 0:00:53  loss: 0.504800 (0.601536)  acc1: 84.375000 (84.032012)  acc5: 97.656250 (96.455793)  time: 0.083965  data: 0.000281  max mem: 993
Test:  [ 60/391]  eta: 0:00:42  loss: 0.502291 (0.564686)  acc1: 87.500000 (85.245902)  acc5: 97.656250 (96.695697)  time: 0.081432  data: 0.000253  max mem: 993
Test:  [ 80/391]  eta: 0:00:36  loss: 0.716538 (0.618657)  acc1: 79.687500 (83.516590)  acc5: 96.093750 (96.508488)  time: 0.083536  data: 0.000242  max mem: 993
Test:  [100/391]  eta: 0:00:32  loss: 0.623209 (0.628215)  acc1: 85.156250 (83.408106)  acc5: 96.093750 (96.441832)  time: 0.083887  data: 0.000238  max mem: 993
Test:  [120/391]  eta: 0:00:28  loss: 0.531973 (0.632460)  acc1: 83.593750 (83.316116)  acc5: 96.875000 (96.526343)  time: 0.083836  data: 0.000224  max mem: 993
Test:  [140/391]  eta: 0:00:25  loss: 0.435032 (0.619030)  acc1: 89.843750 (83.626995)  acc5: 97.656250 (96.636746)  time: 0.083697  data: 0.000240  max mem: 993
Test:  [160/391]  eta: 0:00:23  loss: 0.679810 (0.630977)  acc1: 81.250000 (83.448175)  acc5: 95.312500 (96.462539)  time: 0.084011  data: 0.000206  max mem: 993
Test:  [180/391]  eta: 0:00:20  loss: 1.136989 (0.689275)  acc1: 69.531250 (81.979454)  acc5: 90.625000 (95.795925)  time: 0.083333  data: 0.000245  max mem: 993
Test:  [200/391]  eta: 0:00:18  loss: 1.277641 (0.747734)  acc1: 64.843750 (80.612562)  acc5: 90.625000 (95.114272)  time: 0.082513  data: 0.000226  max mem: 993
Test:  [220/391]  eta: 0:00:16  loss: 1.096730 (0.774484)  acc1: 73.437500 (80.062217)  acc5: 91.406250 (94.768100)  time: 0.081201  data: 0.000229  max mem: 993
Test:  [240/391]  eta: 0:00:14  loss: 0.753314 (0.788769)  acc1: 78.906250 (79.839860)  acc5: 92.968750 (94.508558)  time: 0.083774  data: 0.000209  max mem: 993
Test:  [260/391]  eta: 0:00:12  loss: 1.213251 (0.828748)  acc1: 64.843750 (78.747605)  acc5: 90.625000 (94.064296)  time: 0.083779  data: 0.000236  max mem: 993
Test:  [280/391]  eta: 0:00:10  loss: 1.027571 (0.848783)  acc1: 74.218750 (78.386343)  acc5: 92.187500 (93.808385)  time: 0.083863  data: 0.000228  max mem: 993
Test:  [300/391]  eta: 0:00:08  loss: 1.108118 (0.867847)  acc1: 72.656250 (77.974460)  acc5: 89.843750 (93.513808)  time: 0.083811  data: 0.000254  max mem: 993
Test:  [320/391]  eta: 0:00:06  loss: 1.179348 (0.883526)  acc1: 71.875000 (77.587130)  acc5: 88.281250 (93.251071)  time: 0.083823  data: 0.000246  max mem: 993
Test:  [340/391]  eta: 0:00:04  loss: 1.224718 (0.904226)  acc1: 67.187500 (77.057368)  acc5: 89.062500 (93.032900)  time: 0.084049  data: 0.000241  max mem: 993
Test:  [360/391]  eta: 0:00:02  loss: 0.979421 (0.914856)  acc1: 75.000000 (76.820031)  acc5: 91.406250 (92.949273)  time: 0.083515  data: 0.000244  max mem: 993
Test:  [380/391]  eta: 0:00:00  loss: 0.814474 (0.913746)  acc1: 76.562500 (76.851624)  acc5: 93.750000 (92.950295)  time: 0.083795  data: 0.000182  max mem: 993
Test:  [390/391]  eta: 0:00:00  loss: 0.723986 (0.911126)  acc1: 76.562500 (76.908000)  acc5: 93.750000 (92.980000)  time: 0.082328  data: 0.000111  max mem: 993
Test: Total time: 0:00:35 (0.090729 s / it)
* Acc@1 76.908 Acc@5 92.980 loss 0.911
Accuracy of the network on the 50000 test images: 76.9%
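The per-iteration acc1/acc5 numbers above are standard batch-level top-k accuracies. A minimal sketch of that computation (the function name is mine, not the repo's):

```python
# Hedged sketch of the top-1 / top-5 accuracy reported in the logs; not the repo's own helper.
import torch

def topk_accuracy(logits, targets, topk=(1, 5)):
    # logits: [batch, num_classes], targets: [batch]
    _, pred = logits.topk(max(topk), dim=1)        # indices of the top-k predicted classes
    hits = pred.eq(targets.view(-1, 1))            # [batch, max(topk)] booleans
    return [hits[:, :k].any(dim=1).float().mean().item() * 100.0 for k in topk]
```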
python eval_knn.py --arch=vit_small --patch_size=8 --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC --dump_features eval_knn_vits_8_features
Will run the code on one GPU.
| distributed init (rank 0): env://
git:
  sha: 66febe141b5fc6ba5cdcf33407b1049ec3c6c462, status: has uncommited changes, branch: gimlet

arch: vit_small
batch_size_per_gpu: 128
checkpoint_key: teacher
data_path: /datasets/imagenet/ILSVRC/Data/CLS-LOC
dist_url: env://
dump_features: eval_knn_vits_8_features
gpu: 0
half: False
load_features: None
local_rank: 0
nb_knn: [10, 20, 100, 200]
num_workers: 10
patch_size: 8
pretrained_weights:
rank: 0
temperature: 0.07
use_cuda: True
world_size: 1
Data loaded with 1281167 train and 50000 val imgs.
Model vit_small 8x8 built.
Please use the `--pretrained_weights` argument to indicate the path of the checkpoint to evaluate.
Since no pretrained weights have been provided, we load the reference pretrained DINO weights.
Extracting features for train set...
Storing features into tensor of shape torch.Size([1281167, 384])
  [    0/10010]  eta: 8:41:41    time: 3.126985  data: 1.496408  max mem: 4515
  <REMOVED>
  [10009/10010]  eta: 0:00:00    time: 0.636899  data: 0.000107  max mem: 6392
 Total time: 1:28:46 (0.532138 s / it)
Extracting features for val set...
Storing features into tensor of shape torch.Size([50000, 384])
  [  0/391]  eta: 0:10:42    time: 1.642119  data: 1.128212  max mem: 6392
  <REMOVED>
  [390/391]  eta: 0:00:00    time: 0.636005  data: 0.000107  max mem: 6465
 Total time: 0:03:28 (0.534096 s / it)
Features are ready!
Start the k-NN classification.
10-NN classifier result: Top1: 78.33, Top5: 90.896
20-NN classifier result: Top1: 78.33, Top5: 92.4
100-NN classifier result: Top1: 76.916, Top5: 93.662
200-NN classifier result: Top1: 76.096, Top5: 93.792
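The k-NN numbers come from a soft, temperature-weighted vote over the nearest training features (T = 0.07 in the args dump above). A hedged sketch of that vote on already L2-normalized features; `knn_predict` is my own name, see eval_knn.py for the real implementation:

```python
import torch

def knn_predict(train_feats, train_labels, test_feats, k=20, T=0.07, num_classes=1000):
    # Features are assumed L2-normalized, so the dot product is cosine similarity.
    sims = test_feats @ train_feats.t()                      # [n_test, n_train]
    topk_sims, topk_idx = sims.topk(k, dim=1)                # k nearest neighbours per query
    weights = (topk_sims / T).exp()                          # temperature-scaled similarity weights
    votes = torch.zeros(test_feats.size(0), num_classes)
    votes.scatter_add_(1, train_labels[topk_idx], weights)   # accumulate weighted class votes
    return votes.argmax(dim=1)
```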

## ViT-S/16

python ./eval_linear.py --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC/ --evaluate --patch_size=16
Model vit_small built.
We load the reference pretrained linear weights.
Test:  [  0/391]  eta: 0:22:05  loss: 0.312423 (0.312423)  acc1: 95.312500 (95.312500)  acc5: 98.437500 (98.437500)  time: 3.391267  data: 1.672790  max mem: 4970
Test:  [ 20/391]  eta: 0:03:44  loss: 0.452702 (0.475851)  acc1: 86.718750 (86.904762)  acc5: 97.656250 (97.693452)  time: 0.464485  data: 0.000323  max mem: 5559
Test:  [ 40/391]  eta: 0:03:07  loss: 0.463861 (0.515941)  acc1: 85.156250 (85.575457)  acc5: 98.437500 (97.351372)  time: 0.463942  data: 0.000553  max mem: 5559
Test:  [ 60/391]  eta: 0:02:49  loss: 0.441266 (0.489972)  acc1: 87.500000 (86.680328)  acc5: 96.875000 (97.297643)  time: 0.467964  data: 0.000543  max mem: 5559
Test:  [ 80/391]  eta: 0:02:33  loss: 0.650984 (0.539812)  acc1: 82.812500 (85.310571)  acc5: 96.875000 (97.077546)  time: 0.434674  data: 0.000379  max mem: 5559
Test:  [100/391]  eta: 0:02:22  loss: 0.532990 (0.551392)  acc1: 86.718750 (85.202661)  acc5: 96.875000 (96.975557)  time: 0.471931  data: 0.000890  max mem: 5559
Test:  [120/391]  eta: 0:02:10  loss: 0.470687 (0.562025)  acc1: 84.375000 (85.156250)  acc5: 97.656250 (97.068698)  time: 0.433611  data: 0.000233  max mem: 5559
Test:  [140/391]  eta: 0:01:58  loss: 0.380097 (0.549805)  acc1: 88.281250 (85.328014)  acc5: 98.437500 (97.212988)  time: 0.431170  data: 0.000219  max mem: 5559
Test:  [160/391]  eta: 0:01:48  loss: 0.662740 (0.561041)  acc1: 83.593750 (85.161102)  acc5: 96.875000 (97.069099)  time: 0.430035  data: 0.000231  max mem: 5559
Test:  [180/391]  eta: 0:01:37  loss: 0.863500 (0.605005)  acc1: 75.781250 (84.016747)  acc5: 93.750000 (96.620338)  time: 0.430917  data: 0.000213  max mem: 5559
Test:  [200/391]  eta: 0:01:28  loss: 1.101570 (0.655729)  acc1: 67.968750 (82.812500)  acc5: 92.187500 (96.120958)  time: 0.433986  data: 0.000225  max mem: 5559
Test:  [220/391]  eta: 0:01:18  loss: 0.844362 (0.674672)  acc1: 78.906250 (82.356476)  acc5: 94.531250 (95.927602)  time: 0.430764  data: 0.000229  max mem: 5559
Test:  [240/391]  eta: 0:01:08  loss: 0.683789 (0.683675)  acc1: 82.031250 (82.277619)  acc5: 94.531250 (95.753371)  time: 0.431250  data: 0.000218  max mem: 5559
Test:  [260/391]  eta: 0:00:59  loss: 0.949632 (0.718124)  acc1: 73.437500 (81.324832)  acc5: 92.187500 (95.402299)  time: 0.433487  data: 0.000230  max mem: 5559
Test:  [280/391]  eta: 0:00:50  loss: 0.819773 (0.732555)  acc1: 77.343750 (81.010899)  acc5: 93.750000 (95.231873)  time: 0.430006  data: 0.000199  max mem: 5559
Test:  [300/391]  eta: 0:00:41  loss: 0.940584 (0.748425)  acc1: 75.000000 (80.658223)  acc5: 92.968750 (95.021802)  time: 0.430271  data: 0.000224  max mem: 5559
Test:  [320/391]  eta: 0:00:31  loss: 0.984055 (0.761864)  acc1: 76.562500 (80.332457)  acc5: 90.625000 (94.818438)  time: 0.429488  data: 0.000225  max mem: 5559
Test:  [340/391]  eta: 0:00:22  loss: 1.017162 (0.777974)  acc1: 72.656250 (79.928061)  acc5: 92.968750 (94.680169)  time: 0.430123  data: 0.000207  max mem: 5559
Test:  [360/391]  eta: 0:00:13  loss: 0.895487 (0.788876)  acc1: 76.562500 (79.655038)  acc5: 93.750000 (94.587517)  time: 0.434021  data: 0.000202  max mem: 5559
Test:  [380/391]  eta: 0:00:04  loss: 0.642365 (0.789293)  acc1: 77.343750 (79.615732)  acc5: 95.312500 (94.582513)  time: 0.429795  data: 0.000159  max mem: 5559
Test:  [390/391]  eta: 0:00:00  loss: 0.642365 (0.787828)  acc1: 78.906250 (79.612000)  acc5: 95.312500 (94.622000)  time: 0.426404  data: 0.000113  max mem: 5559
Test: Total time: 0:02:54 (0.446400 s / it)
* Acc@1 79.612 Acc@5 94.622 loss 0.788
Accuracy of the network on the 50000 test images: 79.6%
python eval_knn.py --arch=vit_small --patch_size=16 --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC --dump_features eval_knn_vits_16_features
Will run the code on one GPU.
| distributed init (rank 0): env://
git:
  sha: 314dd801c98fc4098afc70abbeafcb506b2c907e, status: has uncommited changes, branch: gimlet

arch: vit_small
batch_size_per_gpu: 128
checkpoint_key: teacher
data_path: /datasets/imagenet/ILSVRC/Data/CLS-LOC
dist_url: env://
dump_features: eval_knn_vits_16_features
gpu: 0
load_features: None
local_rank: 0
nb_knn: [10, 20, 100, 200]
num_workers: 10
patch_size: 16
pretrained_weights:
rank: 0
temperature: 0.07
use_cuda: True
world_size: 1
Data loaded with 1281167 train and 50000 val imgs.
Model vit_small 16x16 built.
Please use the `--pretrained_weights` argument to indicate the path of the checkpoint to evaluate.
Since no pretrained weights have been provided, we load the reference pretrained DINO weights.
Extracting features for train set...
Storing features into tensor of shape torch.Size([1281167, 384])
  <REMOVED>
  [10009/10010]  eta: 0:00:00    time: 0.106392  data: 0.000068  max mem: 2600
 Total time: 0:14:44 (0.088400 s / it)
Extracting features for val set...
Storing features into tensor of shape torch.Size([50000, 384])
  <REMOVED>
  [390/391]  eta: 0:00:00    time: 0.106757  data: 0.000093  max mem: 2673
 Total time: 0:00:35 (0.091346 s / it)
Features are ready!
Start the k-NN classification.
10-NN classifier result: Top1: 74.402, Top5: 88.422
20-NN classifier result: Top1: 74.444, Top5: 90.02
100-NN classifier result: Top1: 72.902, Top5: 91.482
200-NN classifier result: Top1: 72.062, Top5: 91.546
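Because `--dump_features` caches the extracted features, later runs can skip the extraction pass via `--load_features eval_knn_vits_16_features` and go straight to the k-NN step. To inspect the cached tensors directly, something like the sketch below should work; the file names are assumptions, so check what eval_knn.py actually writes:

```python
import os
import torch

feat_dir = 'eval_knn_vits_16_features'
# File names are assumptions about what --dump_features writes; verify against eval_knn.py.
train_feats  = torch.load(os.path.join(feat_dir, 'trainfeat.pth'))
test_feats   = torch.load(os.path.join(feat_dir, 'testfeat.pth'))
train_labels = torch.load(os.path.join(feat_dir, 'trainlabels.pth'))
test_labels  = torch.load(os.path.join(feat_dir, 'testlabels.pth'))
print(train_feats.shape, test_feats.shape)  # expected: [1281167, 384], [50000, 384]
```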

## ViT-S/16 fp16

git checkout vihang/eval_half
python ./eval_linear.py --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC/ --evaluate --patch_size=16
Model vit_small built.
We load the reference pretrained linear weights.
Test:  [  0/391]  eta: 0:12:20  loss: 0.380615 (0.380615)  acc1: 92.968750 (92.968750)  acc5: 98.437500 (98.437500)  time: 1.892996  data: 1.365466  max mem: 428
Test:  [ 20/391]  eta: 0:00:56  loss: 0.560059 (0.560416)  acc1: 84.375000 (85.193452)  acc5: 96.875000 (96.837798)  time: 0.065381  data: 0.033924  max mem: 502
Test:  [ 40/391]  eta: 0:00:43  loss: 0.504395 (0.601469)  acc1: 84.375000 (84.032012)  acc5: 97.656250 (96.455793)  time: 0.092914  data: 0.063916  max mem: 502
Test:  [ 60/391]  eta: 0:00:37  loss: 0.502441 (0.564628)  acc1: 87.500000 (85.245902)  acc5: 97.656250 (96.695697)  time: 0.088490  data: 0.056156  max mem: 502
Test:  [ 80/391]  eta: 0:00:32  loss: 0.716309 (0.618588)  acc1: 79.687500 (83.497299)  acc5: 96.093750 (96.508488)  time: 0.087520  data: 0.056728  max mem: 502
Test:  [100/391]  eta: 0:00:32  loss: 0.624023 (0.628162)  acc1: 85.156250 (83.369431)  acc5: 96.093750 (96.449567)  time: 0.142970  data: 0.106644  max mem: 502
Test:  [120/391]  eta: 0:00:38  loss: 0.532715 (0.632407)  acc1: 83.593750 (83.283833)  acc5: 96.875000 (96.532800)  time: 0.290818  data: 0.251886  max mem: 502
Test:  [140/391]  eta: 0:00:33  loss: 0.434326 (0.618959)  acc1: 89.843750 (83.610372)  acc5: 97.656250 (96.642287)  time: 0.084812  data: 0.056365  max mem: 502
Test:  [160/391]  eta: 0:00:29  loss: 0.679688 (0.630884)  acc1: 81.250000 (83.438470)  acc5: 95.312500 (96.472244)  time: 0.090619  data: 0.059577  max mem: 502
Test:  [180/391]  eta: 0:00:26  loss: 1.135742 (0.689190)  acc1: 69.531250 (81.970822)  acc5: 90.625000 (95.804558)  time: 0.091500  data: 0.063739  max mem: 502
Test:  [200/391]  eta: 0:00:23  loss: 1.276367 (0.747667)  acc1: 64.062500 (80.600902)  acc5: 90.625000 (95.122046)  time: 0.100896  data: 0.070123  max mem: 502
Test:  [220/391]  eta: 0:00:20  loss: 1.095703 (0.774410)  acc1: 73.437500 (80.048077)  acc5: 91.406250 (94.771635)  time: 0.092003  data: 0.062661  max mem: 502
Test:  [240/391]  eta: 0:00:17  loss: 0.752930 (0.788651)  acc1: 79.687500 (79.830135)  acc5: 92.968750 (94.511800)  time: 0.094443  data: 0.063852  max mem: 502
Test:  [260/391]  eta: 0:00:15  loss: 1.212891 (0.828628)  acc1: 64.843750 (78.744612)  acc5: 90.625000 (94.064296)  time: 0.093591  data: 0.065633  max mem: 502
Test:  [280/391]  eta: 0:00:12  loss: 1.028320 (0.848688)  acc1: 74.218750 (78.380783)  acc5: 92.187500 (93.808385)  time: 0.107170  data: 0.071783  max mem: 502
Test:  [300/391]  eta: 0:00:10  loss: 1.108398 (0.867766)  acc1: 72.656250 (77.966674)  acc5: 89.843750 (93.513808)  time: 0.096569  data: 0.064147  max mem: 502
Test:  [320/391]  eta: 0:00:08  loss: 1.179688 (0.883463)  acc1: 71.875000 (77.579829)  acc5: 88.281250 (93.251071)  time: 0.105090  data: 0.072960  max mem: 502
Test:  [340/391]  eta: 0:00:05  loss: 1.224609 (0.904156)  acc1: 67.187500 (77.050495)  acc5: 89.062500 (93.032900)  time: 0.094860  data: 0.061611  max mem: 502
Test:  [360/391]  eta: 0:00:03  loss: 0.979492 (0.914785)  acc1: 75.000000 (76.813539)  acc5: 91.406250 (92.949273)  time: 0.096149  data: 0.064149  max mem: 502
Test:  [380/391]  eta: 0:00:01  loss: 0.813477 (0.913679)  acc1: 76.562500 (76.849573)  acc5: 93.750000 (92.950295)  time: 0.099188  data: 0.069708  max mem: 502
Test:  [390/391]  eta: 0:00:00  loss: 0.726074 (0.911055)  acc1: 76.562500 (76.906000)  acc5: 93.750000 (92.980000)  time: 0.096989  data: 0.064411  max mem: 502
Test: Total time: 0:00:43 (0.110753 s / it)
* Acc@1 76.906 Acc@5 92.980 loss 0.911
Accuracy of the network on the 50000 test images: 76.9%
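The fp16 run sits on the vihang/eval_half branch (note the `half: False` default in the ViT-S/8 args dump). A minimal sketch of what half-precision evaluation might look like, assuming the branch simply casts the model and inputs to fp16; the actual branch may use autocast or differ in detail:

```python
# Hedged fp16 inference sketch; the vihang/eval_half branch may be implemented differently.
import torch

model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16').cuda().eval()
model = model.half()                                        # cast weights to fp16

images = torch.randn(8, 3, 224, 224, device='cuda').half() # inputs must match the weight dtype
with torch.no_grad():
    feats = model(images)
print(feats.dtype)  # torch.float16
```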

## ViT-S/14 DINOv2

git checkout vihang/eval_v2
python ./eval_linear.py --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC/ --evaluate
Model vit_small built.
Test:  [  0/391]  eta: 0:18:37  loss: 0.287185 (0.287185)  acc1: 93.750000 (93.750000)  acc5: 98.437500 (98.437500)  time: 2.858557  data: 1.847843  max mem: 965
Test:  [ 20/391]  eta: 0:01:33  loss: 0.449175 (0.454168)  acc1: 87.500000 (87.723214)  acc5: 97.656250 (97.953869)  time: 0.121201  data: 0.000267  max mem: 966
Test:  [ 40/391]  eta: 0:01:03  loss: 0.399514 (0.491631)  acc1: 87.500000 (86.432927)  acc5: 97.656250 (97.618140)  time: 0.107072  data: 0.000223  max mem: 966
Test:  [ 60/391]  eta: 0:00:51  loss: 0.369379 (0.460332)  acc1: 89.843750 (87.435963)  acc5: 98.437500 (97.745902)  time: 0.107282  data: 0.000181  max mem: 966
Test:  [ 80/391]  eta: 0:00:44  loss: 0.691794 (0.530860)  acc1: 81.250000 (85.648148)  acc5: 96.093750 (97.289738)  time: 0.106931  data: 0.000173  max mem: 966
Test:  [100/391]  eta: 0:00:39  loss: 0.607871 (0.549595)  acc1: 85.937500 (85.295483)  acc5: 96.875000 (97.145730)  time: 0.107492  data: 0.000128  max mem: 966
Test:  [120/391]  eta: 0:00:35  loss: 0.616540 (0.561549)  acc1: 83.593750 (85.040031)  acc5: 97.656250 (97.113895)  time: 0.107259  data: 0.000171  max mem: 966
Test:  [140/391]  eta: 0:00:32  loss: 0.387987 (0.549117)  acc1: 87.500000 (85.250443)  acc5: 98.437500 (97.279477)  time: 0.107170  data: 0.000180  max mem: 966
Test:  [160/391]  eta: 0:00:29  loss: 0.590641 (0.559286)  acc1: 84.375000 (85.170807)  acc5: 96.875000 (97.190411)  time: 0.107158  data: 0.000197  max mem: 966
Test:  [180/391]  eta: 0:00:26  loss: 0.819142 (0.592688)  acc1: 77.343750 (84.189399)  acc5: 94.531250 (96.900898)  time: 0.111123  data: 0.000660  max mem: 966
Test:  [200/391]  eta: 0:00:24  loss: 0.910006 (0.629575)  acc1: 71.093750 (83.290578)  acc5: 92.968750 (96.536847)  time: 0.167249  data: 0.050212  max mem: 966
Test:  [220/391]  eta: 0:00:23  loss: 0.771277 (0.641363)  acc1: 78.906250 (83.070560)  acc5: 95.312500 (96.440187)  time: 0.251233  data: 0.136657  max mem: 966
Test:  [240/391]  eta: 0:00:22  loss: 0.604386 (0.646282)  acc1: 82.031250 (82.977827)  acc5: 96.875000 (96.297977)  time: 0.215593  data: 0.095335  max mem: 966
Test:  [260/391]  eta: 0:00:18  loss: 0.910095 (0.676159)  acc1: 74.218750 (82.100096)  acc5: 93.750000 (96.012931)  time: 0.113562  data: 0.000376  max mem: 966
Test:  [280/391]  eta: 0:00:15  loss: 0.777899 (0.686569)  acc1: 78.906250 (81.897798)  acc5: 94.531250 (95.915814)  time: 0.122285  data: 0.000526  max mem: 966
Test:  [300/391]  eta: 0:00:12  loss: 0.817981 (0.694610)  acc1: 77.343750 (81.701620)  acc5: 94.531250 (95.790075)  time: 0.117873  data: 0.000669  max mem: 966
Test:  [320/391]  eta: 0:00:10  loss: 0.937429 (0.705814)  acc1: 76.562500 (81.466608)  acc5: 92.968750 (95.614291)  time: 0.191644  data: 0.070490  max mem: 966
Test:  [340/391]  eta: 0:00:07  loss: 0.908184 (0.720872)  acc1: 76.562500 (81.052969)  acc5: 92.968750 (95.456837)  time: 0.126619  data: 0.001506  max mem: 966
Test:  [360/391]  eta: 0:00:04  loss: 0.798275 (0.728798)  acc1: 78.906250 (80.825831)  acc5: 94.531250 (95.409886)  time: 0.213799  data: 0.093423  max mem: 966
Test:  [380/391]  eta: 0:00:01  loss: 0.649764 (0.727318)  acc1: 82.812500 (80.868602)  acc5: 96.093750 (95.421178)  time: 0.134691  data: 0.009982  max mem: 966
Test:  [390/391]  eta: 0:00:00  loss: 0.589799 (0.725904)  acc1: 80.468750 (80.892000)  acc5: 96.875000 (95.444000)  time: 0.129553  data: 0.002552  max mem: 966
Test: Total time: 0:00:57 (0.146009 s / it)
* Acc@1 80.892 Acc@5 95.444 loss 0.726
Accuracy of the network on the 50000 test images: 80.9%
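These numbers come from the vihang/eval_v2 branch, which presumably swaps in the DINOv2 ViT-S/14 backbone. The reference backbone itself is available from torch.hub (a minimal sketch, not the branch's code):

```python
# Minimal sketch: load the reference DINOv2 ViT-S/14 backbone from torch.hub.
import torch

backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))  # 224 is a multiple of the 14-px patch size
print(feats.shape)  # torch.Size([1, 384])
```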

## ViT-Ti/16

The backbone was trained manually on TinyImageNet and, separately, on ImageNet.

To train the linear classifier head (see the simplified sketch after the log output below):

python ./eval_linear.py --pretrained_weights=weights/dino_vitti_tinyimageimagenet.pth --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC/ --arch=vit_tiny
Test:  [  0/391]  eta: 0:07:00  loss: 0.651548 (0.651548)  acc1: 85.937500 (85.937500)  acc5: 95.312500 (95.312500)  time: 1.076122  data: 1.051785  max mem: 532
Test:  [ 20/391]  eta: 0:00:44  loss: 1.109726 (1.040032)  acc1: 70.312500 (74.776786)  acc5: 91.406250 (91.592262)  time: 0.071502  data: 0.047663  max mem: 532
Test:  [ 40/391]  eta: 0:00:33  loss: 0.954309 (1.054288)  acc1: 75.000000 (73.875762)  acc5: 92.187500 (91.310976)  time: 0.070326  data: 0.046484  max mem: 532
Test:  [ 60/391]  eta: 0:00:28  loss: 0.960422 (1.033441)  acc1: 77.343750 (74.871926)  acc5: 91.406250 (91.214139)  time: 0.067605  data: 0.043781  max mem: 532
Test:  [ 80/391]  eta: 0:00:25  loss: 1.427128 (1.137124)  acc1: 64.843750 (71.595293)  acc5: 89.843750 (90.509259)  time: 0.063824  data: 0.039945  max mem: 532
Test:  [100/391]  eta: 0:00:22  loss: 1.254161 (1.151063)  acc1: 67.187500 (71.008663)  acc5: 91.406250 (90.756498)  time: 0.062968  data: 0.039105  max mem: 532
Test:  [120/391]  eta: 0:00:20  loss: 1.045764 (1.137343)  acc1: 72.656250 (71.222882)  acc5: 92.968750 (90.993027)  time: 0.075784  data: 0.051893  max mem: 532
Test:  [140/391]  eta: 0:00:18  loss: 0.839696 (1.111849)  acc1: 78.906250 (71.875000)  acc5: 93.750000 (91.240027)  time: 0.067497  data: 0.043587  max mem: 532
Test:  [160/391]  eta: 0:00:17  loss: 1.228135 (1.119752)  acc1: 69.531250 (71.841033)  acc5: 89.062500 (91.071429)  time: 0.083286  data: 0.059484  max mem: 532
Test:  [180/391]  eta: 0:00:15  loss: 1.924166 (1.208384)  acc1: 53.906250 (70.057838)  acc5: 80.468750 (89.796271)  time: 0.061169  data: 0.037340  max mem: 532
Test:  [200/391]  eta: 0:00:14  loss: 2.029323 (1.289078)  acc1: 50.000000 (68.547886)  acc5: 76.562500 (88.576648)  time: 0.059261  data: 0.035410  max mem: 532
Test:  [220/391]  eta: 0:00:12  loss: 1.574746 (1.330794)  acc1: 58.593750 (67.799067)  acc5: 82.031250 (87.878252)  time: 0.067436  data: 0.043592  max mem: 532
Test:  [240/391]  eta: 0:00:10  loss: 1.518941 (1.361375)  acc1: 65.625000 (67.307443)  acc5: 82.812500 (87.344398)  time: 0.068633  data: 0.044742  max mem: 532
Test:  [260/391]  eta: 0:00:09  loss: 1.940958 (1.416640)  acc1: 54.687500 (66.133860)  acc5: 79.687500 (86.599018)  time: 0.064766  data: 0.040891  max mem: 532
Test:  [280/391]  eta: 0:00:07  loss: 1.810320 (1.449457)  acc1: 54.687500 (65.474867)  acc5: 81.250000 (86.129337)  time: 0.065093  data: 0.041177  max mem: 532
Test:  [300/391]  eta: 0:00:06  loss: 1.754879 (1.483069)  acc1: 54.687500 (64.934593)  acc5: 79.687500 (85.623443)  time: 0.060532  data: 0.036658  max mem: 532
Test:  [320/391]  eta: 0:00:04  loss: 2.125113 (1.509535)  acc1: 53.906250 (64.449474)  acc5: 76.562500 (85.197625)  time: 0.060982  data: 0.037146  max mem: 532
Test:  [340/391]  eta: 0:00:03  loss: 1.971048 (1.541902)  acc1: 50.781250 (63.744043)  acc5: 73.437500 (84.702621)  time: 0.067219  data: 0.043330  max mem: 532
Test:  [360/391]  eta: 0:00:02  loss: 1.771336 (1.558294)  acc1: 54.687500 (63.408934)  acc5: 79.687500 (84.452909)  time: 0.064617  data: 0.040770  max mem: 532
Test:  [380/391]  eta: 0:00:00  loss: 1.410041 (1.554849)  acc1: 64.062500 (63.447343)  acc5: 88.281250 (84.524688)  time: 0.069729  data: 0.045909  max mem: 532
Test:  [390/391]  eta: 0:00:00  loss: 1.231684 (1.546354)  acc1: 69.531250 (63.666000)  acc5: 89.062500 (84.648000)  time: 0.060161  data: 0.036815  max mem: 532
Test: Total time: 0:00:27 (0.069449 s / it)
* Acc@1 63.666 Acc@5 84.648 loss 1.546
Accuracy at epoch 99 of the network on the 50000 test images: 63.7%
Max accuracy so far: 63.74%
Training of the supervised linear classifier on frozen features completed.
Top-1 test accuracy: 63.7
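A simplified sketch of that linear-probe training step: the backbone stays frozen and only an nn.Linear head is trained with cross-entropy. The real eval_linear.py concatenates CLS tokens from the last few blocks and handles distributed training; the hub ViT-S/16 below is just a stand-in for the local vit_tiny checkpoint:

```python
# Hedged linear-probe sketch; eval_linear.py differs (multi-block CLS concat, DDP, LR schedule).
import torch
import torch.nn as nn

backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits16').cuda().eval()
for p in backbone.parameters():
    p.requires_grad = False                      # features stay frozen

head = nn.Linear(384, 1000).cuda()               # 384-dim CLS features -> 1000 ImageNet classes
optimizer = torch.optim.SGD(head.parameters(), lr=0.001, momentum=0.9, weight_decay=0)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():
        feats = backbone(images)                 # no gradients through the backbone
    loss = criterion(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```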

## EfficientNet-B0

python ./eval_linear.py --arch=efficientnet_b0 --data_path=/datasets/imagenet/ILSVRC/Data/CLS-LOC/ --output_dir=out/eval_linear/efficientnetb0_16_imagenet1k --pretrained_weights=out/dino_training/efficientnetb0_16_imagenet1k/checkpoint.pth
Test:  [  0/131]  eta: 0:09:31  loss: 1.065596 (1.065596)  acc1: 75.000000 (75.000000)  acc5: 92.187500 (92.187500)  time: 4.362770  data: 4.268591  max mem: 4362
Test:  [ 20/131]  eta: 0:00:32  loss: 1.234613 (1.238226)  acc1: 70.312500 (69.072423)  acc5: 88.541672 (89.471728)  time: 0.093271  data: 0.000208  max mem: 4362
Test:  [ 40/131]  eta: 0:00:17  loss: 1.640278 (1.417445)  acc1: 57.552086 (63.744921)  acc5: 85.677086 (87.950968)  time: 0.093349  data: 0.000271  max mem: 4362
Test:  [ 60/131]  eta: 0:00:11  loss: 1.591256 (1.449136)  acc1: 62.239586 (63.618513)  acc5: 83.593750 (87.218240)  time: 0.093471  data: 0.000286  max mem: 4362
Test:  [ 80/131]  eta: 0:00:07  loss: 1.924977 (1.561640)  acc1: 54.427086 (61.905223)  acc5: 78.125000 (85.169113)  time: 0.093379  data: 0.000252  max mem: 4362
Test:  [100/131]  eta: 0:00:04  loss: 2.067794 (1.660396)  acc1: 52.083336 (60.205241)  acc5: 77.083336 (83.632428)  time: 0.093135  data: 0.000119  max mem: 4362
Test:  [120/131]  eta: 0:00:01  loss: 1.980305 (1.712500)  acc1: 53.906250 (59.276000)  acc5: 76.041672 (82.741480)  time: 0.093003  data: 0.000050  max mem: 4362
Test:  [130/131]  eta: 0:00:00  loss: 1.825498 (1.703395)  acc1: 58.072918 (59.678002)  acc5: 81.510422 (83.006003)  time: 0.089193  data: 0.000046  max mem: 4362
Test: Total time: 0:00:16 (0.126527 s / it)
* Acc@1 59.678 Acc@5 83.006 loss 1.703
Accuracy at epoch 99 of the network on the 50000 test images: 59.7%
Max accuracy so far: 59.69%
Training of the supervised linear classifier on frozen features completed.
Top-1 test accuracy: 59.7
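For reference, a hedged sketch of using an EfficientNet-B0 as a frozen feature extractor; how `--arch=efficientnet_b0` is actually wired in this fork, and how the DINO checkpoint is loaded into it, may differ:

```python
# Hedged sketch: torchvision EfficientNet-B0 as a frozen 1280-dim feature extractor.
# The fork's --arch=efficientnet_b0 wiring and checkpoint loading may differ.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.efficientnet_b0(weights=None)   # weights would come from the DINO checkpoint
backbone.classifier = nn.Identity()               # drop the classifier, keep pooled features
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 1280])
```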