Qwen2-VL is supported (images at least, not video just yet) on the dev branch. NVLM-D looks interesting, and I might consider it next, once Qwen2-VL support is complete.
Problem
Hello,
I'm very pleased to see exllama getting vision capabilities for the first time with Pixtral!
You hinted at supporting new models in the release notes. What models are you hoping to support?
Solution
If I may suggest a few ideas: Qwen-based vision models are the SOTA as of writing. Support for Qwen2-VL and/or NVLM-D could be a huge step forward.
Alternatives
No response
Explanation
Support for either of these beasts:
https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct
https://huggingface.co/nvidia/NVLM-D-72B
Examples
No response
Additional context
I forgot to mention that the Qwen2-VL model family comes in multiple sizes (2B, 7B, 72B), which could be convenient for the GPU-poor community.