Skip to content

Commit

Permalink
Add support for ConvNeXT (V1+V2) models (#428)
Browse files Browse the repository at this point in the history
* Add support for `convnext` and `convnextv2` models

* Fix typo
  • Loading branch information
xenova authored Dec 2, 2023
1 parent 3da3841 commit 6f05572
Show file tree
Hide file tree
Showing 5 changed files with 93 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,8 @@ You can refine your search by selecting the task you're interested in (e.g., [te
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
Expand Down
2 changes: 2 additions & 0 deletions docs/snippets/6_supported-models.snippet
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
Expand Down
39 changes: 39 additions & 0 deletions scripts/supported_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,45 @@
'Salesforce/codegen-350M-multi',
'Salesforce/codegen-350M-nl',
],
'convnext':[
# Image classification
'facebook/convnext-tiny-224',
'facebook/convnext-small-224',
'facebook/convnext-base-224',
'facebook/convnext-base-224-22k',
'facebook/convnext-base-224-22k-1k',
'facebook/convnext-base-384',
'facebook/convnext-base-384-22k-1k',
'facebook/convnext-large-224',
'facebook/convnext-large-224-22k',
'facebook/convnext-large-224-22k-1k',
'facebook/convnext-large-384',
'facebook/convnext-large-384-22k-1k',
'facebook/convnext-xlarge-224-22k',
'facebook/convnext-xlarge-224-22k-1k',
'facebook/convnext-xlarge-384-22k-1k',
],
'convnextv2':[
# Image classification
'facebook/convnextv2-atto-1k-224',
'facebook/convnextv2-femto-1k-224',
'facebook/convnextv2-pico-1k-224',
'facebook/convnextv2-tiny-1k-224',
'facebook/convnextv2-tiny-22k-384',
'facebook/convnextv2-tiny-22k-224',
'facebook/convnextv2-nano-1k-224',
'facebook/convnextv2-nano-22k-384',
'facebook/convnextv2-base-22k-224',
'facebook/convnextv2-base-1k-224',
'facebook/convnextv2-base-22k-384',
'facebook/convnextv2-large-22k-224',
'facebook/convnextv2-large-1k-224',
'facebook/convnextv2-large-22k-384',
# 'facebook/convnextv2-huge-22k-512',
# 'facebook/convnextv2-huge-1k-224',
# 'facebook/convnextv2-huge-22k-384',
# 'facebook/convnextv2-nano-22k-224',
],
'deberta': [
# Zero-shot classification
'cross-encoder/nli-deberta-base',
Expand Down
48 changes: 48 additions & 0 deletions src/models.js
Original file line number Diff line number Diff line change
Expand Up @@ -3545,6 +3545,50 @@ export class DonutSwinPreTrainedModel extends PreTrainedModel { }
export class DonutSwinModel extends DonutSwinPreTrainedModel { }
//////////////////////////////////////////////////


//////////////////////////////////////////////////
export class ConvNextPreTrainedModel extends PreTrainedModel { }

/**
* The bare ConvNext model outputting raw features without any specific head on top.
*/
export class ConvNextModel extends ConvNextPreTrainedModel { }

/**
* ConvNext Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for ImageNet.
*/
export class ConvNextForImageClassification extends ConvNextPreTrainedModel {
/**
* @param {any} model_inputs
*/
async _call(model_inputs) {
return new SequenceClassifierOutput(await super._call(model_inputs));
}
}
//////////////////////////////////////////////////


//////////////////////////////////////////////////
export class ConvNextV2PreTrainedModel extends PreTrainedModel { }

/**
* The bare ConvNextV2 model outputting raw features without any specific head on top.
*/
export class ConvNextV2Model extends ConvNextV2PreTrainedModel { }

/**
* ConvNextV2 Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for ImageNet.
*/
export class ConvNextV2ForImageClassification extends ConvNextV2PreTrainedModel {
/**
* @param {any} model_inputs
*/
async _call(model_inputs) {
return new SequenceClassifierOutput(await super._call(model_inputs));
}
}
//////////////////////////////////////////////////

//////////////////////////////////////////////////
export class YolosPreTrainedModel extends PreTrainedModel { }
export class YolosModel extends YolosPreTrainedModel { }
Expand Down Expand Up @@ -4114,6 +4158,8 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
['owlvit', ['OwlViTModel', OwlViTModel]],
['beit', ['BeitModel', BeitModel]],
['deit', ['DeiTModel', DeiTModel]],
['convnext', ['ConvNextModel', ConvNextModel]],
['convnextv2', ['ConvNextV2Model', ConvNextV2Model]],
['resnet', ['ResNetModel', ResNetModel]],
['swin', ['SwinModel', SwinModel]],
['swin2sr', ['Swin2SRModel', Swin2SRModel]],
Expand Down Expand Up @@ -4266,6 +4312,8 @@ const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
['mobilevit', ['MobileViTForImageClassification', MobileViTForImageClassification]],
['beit', ['BeitForImageClassification', BeitForImageClassification]],
['deit', ['DeiTForImageClassification', DeiTForImageClassification]],
['convnext', ['ConvNextForImageClassification', ConvNextForImageClassification]],
['convnextv2', ['ConvNextV2ForImageClassification', ConvNextV2ForImageClassification]],
['resnet', ['ResNetForImageClassification', ResNetForImageClassification]],
['swin', ['SwinForImageClassification', SwinForImageClassification]],
]);
Expand Down
2 changes: 2 additions & 0 deletions src/processors.js
Original file line number Diff line number Diff line change
Expand Up @@ -592,6 +592,7 @@ export class DPTFeatureExtractor extends ImageFeatureExtractor { }
export class GLPNFeatureExtractor extends ImageFeatureExtractor { }
export class CLIPFeatureExtractor extends ImageFeatureExtractor { }
export class ConvNextFeatureExtractor extends ImageFeatureExtractor { }
export class ConvNextImageProcessor extends ConvNextFeatureExtractor { } // NOTE extends ConvNextFeatureExtractor
export class ViTFeatureExtractor extends ImageFeatureExtractor { }
export class MobileViTFeatureExtractor extends ImageFeatureExtractor { }
export class OwlViTFeatureExtractor extends ImageFeatureExtractor {
Expand Down Expand Up @@ -1645,6 +1646,7 @@ export class AutoProcessor {
OwlViTFeatureExtractor,
CLIPFeatureExtractor,
ConvNextFeatureExtractor,
ConvNextImageProcessor,
DPTFeatureExtractor,
GLPNFeatureExtractor,
BeitFeatureExtractor,
Expand Down

0 comments on commit 6f05572

Please sign in to comment.