Feature Request: Support for phi-3-vision-128k-instruct (#28)
Hi, I've been exploring this repo for the past couple of days, and I find your work here really amazing. I'm curious if there are any plans to add support for the Phi-3-vision-128k-instruct model to this library? I'd be happy to contribute in any way I can to help make this happen.
Comments
Hey @JosefAlbers, thank you! Awesome, that model is on the roadmap after Paligemma (#24). Please feel free to submit a PR to support it :)
Paligemma is done, thanks! Do you want to take on Phi-3-vision?
Yes, I'd love to! Just a heads-up, I'm new to MLX, so I might need a little guidance along the way.
No problem, I'm here to help :)
Is there a list of officially supported models?
@ChristianWeyer not yet. But at the moment we support a handful of architectures. There are still many more to add.
Which high-quality LLaVA model can we use? Any recommendations (from HF)?
Thanks. These are not good enough for our use cases ;-).
Could you please open a new issue and explain your use case?
@Blaizzy, I have a working demo of Phi-3-vision support for MLX: https://github.com/JosefAlbers/Phi-3-Vision-MLX It handles text and image inputs, generating the expected outputs. With the new Su-scaled RoPE, it seems to work reasonably well even with extremely long contexts. Just a heads-up for now; I'll circle back when it's more polished and ready for feedback.
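For readers unfamiliar with the Su-scaled RoPE mentioned above: Phi-3's 128k variants rescale the rotary inverse frequencies with per-dimension "short"/"long" factors taken from the model config, and apply a global magnitude correction derived from the context-extension ratio. Below is a minimal MLX sketch of the idea; the class name, defaults, and the static short/long selection are illustrative simplifications, not the actual Phi-3-Vision-MLX code (real implementations switch factors based on the runtime sequence length).

```python
import math
import mlx.core as mx

class SuScaledRotaryEmbedding:
    """Illustrative su-scaled RoPE sketch; names and defaults are assumptions."""

    def __init__(self, dims, base=10000.0,
                 original_max_position=4096, max_position=131072,
                 short_factor=None, long_factor=None):
        self.dims = dims
        half = dims // 2
        # Per-dimension rescaling factors come from the model config;
        # default to 1.0 (plain RoPE) for this sketch.
        short_factor = short_factor or [1.0] * half
        long_factor = long_factor or [1.0] * half
        # Standard RoPE inverse frequencies ...
        inv_freq = 1.0 / mx.power(base, mx.arange(0, dims, 2) / dims)
        # ... divided by the "long" factors when the target context exceeds
        # the original training window (simplified: real implementations pick
        # short vs. long per sequence at runtime).
        factor = mx.array(long_factor if max_position > original_max_position
                          else short_factor)
        self.inv_freq = inv_freq / factor
        # Global magnitude correction from the su-RoPE formulation.
        scale = max_position / original_max_position
        self.scaling = math.sqrt(
            1 + math.log(scale) / math.log(original_max_position))

    def __call__(self, x, offset=0):
        # x: (batch, seq_len, dims) queries or keys; rotate-half convention.
        seq_len = x.shape[1]
        positions = mx.arange(offset, offset + seq_len)
        freqs = positions.reshape(-1, 1) * self.inv_freq.reshape(1, -1)
        cos = mx.cos(freqs) * self.scaling
        sin = mx.sin(freqs) * self.scaling
        x1, x2 = x[..., : self.dims // 2], x[..., self.dims // 2 :]
        return mx.concatenate([x1 * cos - x2 * sin,
                               x2 * cos + x1 * sin], axis=-1)
```

The intent of the design is that positions inside the original 4k window are left nearly untouched while positions far beyond it are rescaled smoothly, which is consistent with the long-context behavior reported above.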
I love the speed! Awesome, looking forward to the polished version :)
@Blaizzy Thanks so much, I've learned a ton about MLX and VLMs by studying the well-written and well-documented code in your repo. I'll keep you posted on my progress and will definitely reach out when I have a more polished version ready for your feedback!
Most welcome! I'm happy I could be of help. Let me know when you're ready.
You guys are heroes!
@Blaizzy, I'd really appreciate it! I'm just about to start working on a PR for adding su-RoPE support. In the meantime, I've been experimenting with the model on various inputs and LLM/VLM techniques in my own repo, and I'm really amazed by how well it handles both text and image prompts. I'm excited to get your feedback! @lin72h, thanks a lot!
Most welcome, it's my pleasure!
@JosefAlbers Why do the round trip, when we can have it here?
@Blaizzy Right, I'll see if I can port phi3_v into mlx_vlm today.
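If the port follows the pattern of mlx-vlm's existing models, using it would presumably look something like the sketch below. The checkpoint name, prompt template, and exact generate() signature are all assumptions here (based on how the library's other models are invoked) and may differ once Phi-3-vision actually lands:

```python
from mlx_vlm import load, generate

# Hypothetical checkpoint name; assumes an mlx-community 4-bit conversion.
model, processor = load("mlx-community/Phi-3-vision-128k-instruct-4bit")

# Phi-3-vision marks image slots with <|image_1|> placeholders in its chat
# template; the exact prompt string here is an assumption.
prompt = "<|user|>\n<|image_1|>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"

output = generate(
    model,
    processor,
    prompt,
    image="http://images.cocodataset.org/val2017/000000039769.jpg",
    verbose=False,
)
print(output)
```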