ViT-large-patch16-384

Description

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The model was then fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384 pixels. For more information, see google/vit-large-patch16-384.
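As a usage sketch, the same checkpoint is published on the Hugging Face Hub as google/vit-large-patch16-384 and can be loaded with the transformers library. The 1.13 GB download listed under Model below may be packaged differently; this example only illustrates inference with the Hub checkpoint and assumes transformers, torch, Pillow, and requests are installed. The image URL is an arbitrary example.

```python
# Minimal inference sketch for google/vit-large-patch16-384 (Hugging Face Hub checkpoint).
from PIL import Image
import requests
from transformers import ViTImageProcessor, ViTForImageClassification

# Example input image (any RGB image works)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = ViTImageProcessor.from_pretrained("google/vit-large-patch16-384")
model = ViTForImageClassification.from_pretrained("google/vit-large-patch16-384")

# Resize/normalize to 384x384 and run the classification head (1,000 ImageNet classes)
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
predicted_class = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class])
```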

Model

| Model | Download |
| --- | --- |
| vit-large-patch16-384 | 1.13 GB |

Dataset

ImageNet-21k

ImageNet

References

License

Apache License 2.0