Image Classification #11
Labels
- downstream — Changes code wrapping the core model
- ML — Requires machine-learning knowledge (can be built up on the fly)
- research — Creative project that might fail but could give high returns
At the moment, we have a novel architecture that is very powerful in language modelling. However, we don't know whether it transfers to other domains as well as the transformer does. That's why it would be interesting to test its versatility by training it on ImageNet.
This issue is about implementing the input projection for image tokens (as in ViT), building the necessary data pipelines, and testing the model on this new modality.
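As a starting point, the ViT-style input projection could look like the sketch below: split each image into non-overlapping patches, flatten them, and apply a shared linear projection to get one token per patch. The function name, shapes, and dimensions here are illustrative assumptions, not part of the existing codebase.

```python
import numpy as np

def patch_embed(images, patch_size, weight, bias):
    """Project image patches to token embeddings (ViT-style sketch).

    images: (B, H, W, C) array; H and W must be divisible by patch_size.
    weight: (patch_size * patch_size * C, embed_dim) projection matrix.
    bias:   (embed_dim,) bias vector.
    Returns: (B, num_patches, embed_dim) token embeddings.
    """
    B, H, W, C = images.shape
    P = patch_size
    # Cut the image into a grid of P x P patches.
    patches = images.reshape(B, H // P, P, W // P, P, C)
    # Reorder so each patch's pixels are contiguous, then flatten each patch.
    patches = patches.transpose(0, 1, 3, 2, 4, 5)
    patches = patches.reshape(B, (H // P) * (W // P), P * P * C)
    # Shared linear projection: one embedding vector per patch.
    return patches @ weight + bias

# Example with assumed dimensions: 32x32 RGB images, 8x8 patches, width 64.
rng = np.random.default_rng(0)
images = rng.normal(size=(2, 32, 32, 3))
weight = rng.normal(size=(8 * 8 * 3, 64))
bias = np.zeros(64)
tokens = patch_embed(images, 8, weight, bias)
print(tokens.shape)  # (2, 16, 64): 16 patch tokens per image
```

In the actual model this projection would likely be a learned layer (e.g. a strided convolution, which is equivalent to this reshape-plus-matmul), followed by position embeddings before the tokens enter the architecture.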