The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained in a supervised fashion on ImageNet-21k, a large collection of 14 million images spanning 21,843 classes, at a resolution of 224x224 pixels. The model was then fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. For more information, see google/vit-large-patch16-384.
| Model | Download size |
|---|---|
| vit-large-patch16-384 | 1.13 GB |
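
As a minimal usage sketch, the snippet below classifies an image with this checkpoint. It assumes the Hugging Face `transformers`, `Pillow`, and `requests` packages are installed; the COCO image URL is purely illustrative, and any RGB image works.

```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests

# Illustrative example image (a COCO validation image).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor resizes to 384x384 and normalizes the pixels,
# matching the fine-tuning resolution described above.
processor = ViTImageProcessor.from_pretrained("google/vit-large-patch16-384")
model = ViTForImageClassification.from_pretrained("google/vit-large-patch16-384")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# The classification head produces one logit per ImageNet class (1,000 total).
predicted_class_idx = outputs.logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```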
- The Vision Transformer (ViT) model was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
- The Vision Transformer (ViT) model was first released in this repository; however, the weights were converted from the timm repository.
- License: Apache License 2.0