
Feature Request: Support for phi-3-vision-128k-instruct #28

Closed

JosefAlbers opened this issue May 23, 2024 · 19 comments

Comments

@JosefAlbers
Contributor

Hi, I've been exploring this repo for the past couple of days and I find your work here really amazing. I'm curious if there are any plans to add support for the Phi-3-vision-128k-instruct model to this library? I'd be happy to contribute in any way I can to help make this happen.

@Blaizzy
Owner

Blaizzy commented May 23, 2024

Hey @JosefAlbers

Thank you!

Awesome, that model is on the roadmap after Paligemma #24.

Please feel free to submit a PR to support it :)

@Blaizzy
Owner

Blaizzy commented May 24, 2024

@JosefAlbers

Paligemma is done, thanks!

Do you want to take on Phi-3-vision?

@JosefAlbers
Contributor Author

Yes, I'd love to! Just a heads-up, I'm new to mlx, so I might need a little guidance along the way.

@Blaizzy
Owner

Blaizzy commented May 25, 2024

No problem, I'm here to help :)

@ChristianWeyer

> @JosefAlbers
>
> Paligemma is done, thanks!
>
> Do you want to take on Phi-3-vision?

Is there a list of officially supported models?

@Blaizzy
Owner

Blaizzy commented May 25, 2024

@ChristianWeyer not yet.

But at the moment we support the following architectures:

  • Llava (Clip + Llama)
  • Paligemma (Siglip + Gemma)
  • Idefics2 (Siglip + Mistral)
  • NanoLlava (Siglip + Qwen2)
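
(For anyone trying one of these out, here is a minimal usage sketch, assuming mlx-vlm's load/generate entry points; the quantized Llava checkpoint name and the exact generate() arguments are illustrative and may differ between versions.)

```python
# Minimal sketch, assuming mlx-vlm's load/generate entry points. The checkpoint
# name and the generate() keyword arguments are illustrative and may differ
# between mlx-vlm versions.
from mlx_vlm import load, generate

model, processor = load("mlx-community/llava-1.5-7b-4bit")  # example quantized Llava checkpoint

output = generate(
    model,
    processor,
    prompt="USER: <image>\nDescribe this image.\nASSISTANT:",
    image="path/to/image.jpg",
    max_tokens=256,
)
print(output)
```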

@Blaizzy
Owner

Blaizzy commented May 25, 2024

There are still many more to add.

@ChristianWeyer

> @ChristianWeyer not yet.
>
> But at the moment we support the following architectures:
>
> • Llava (Clip + Llama)
> • Paligemma (Siglip + Gemma)
> • Idefics2 (Siglip + Mistral)
> • NanoLlava (Siglip + Qwen2)

Which high-quality Llava model can we use? Any recommendations (from HF)?

@Blaizzy
Owner

Blaizzy commented May 25, 2024

Here you go:

https://huggingface.co/mlx-community?search_models=llava

@ChristianWeyer

Thx. These are not good enough for our use cases ;-).

@Blaizzy
Owner

Blaizzy commented May 25, 2024

Could you please open a new issue and explain your use case?

@JosefAlbers
Contributor Author

@Blaizzy, I have a working demo of Phi-3-vision support for MLX: https://github.com/JosefAlbers/Phi-3-Vision-MLX

It handles text and image inputs, generating expected outputs. With the new Su-scaled RoPE, it seems to work reasonably well even with extremely long contexts.

Just a heads-up for now. I'll circle back when it's more polished and ready for feedback.
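
(For context, here is a rough numpy sketch of how Su-scaled RoPE rescales the rotary frequencies, following the public Hugging Face reference implementation rather than the demo repo's code; the function name and default config values are placeholders.)

```python
import numpy as np

# Rough sketch of Su-scaled ("LongRoPE") rotary embeddings as used by Phi-3,
# based on the public Hugging Face reference implementation. short_factor and
# long_factor come from the model config; the defaults below are placeholders.
def su_scaled_rope(seq_len, head_dim, base=10000.0,
                   original_max_pos=4096, max_pos=131072,
                   short_factor=None, long_factor=None):
    if short_factor is None:
        short_factor = np.ones(head_dim // 2)
    if long_factor is None:
        long_factor = np.ones(head_dim // 2)

    # Pick the per-dimension rescaling factors based on the current context length.
    factors = np.asarray(long_factor if seq_len > original_max_pos else short_factor,
                         dtype=np.float32)                    # shape (head_dim // 2,)

    # Standard RoPE inverse frequencies, stretched per dimension by the factors.
    inv_freq = 1.0 / (factors * base ** (np.arange(0, head_dim, 2) / head_dim))

    # Magnitude correction compensating for the extended context window.
    scale = max_pos / original_max_pos
    attn_scale = np.sqrt(1.0 + np.log(scale) / np.log(original_max_pos))

    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    freqs = positions * inv_freq[None, :]                     # (seq_len, head_dim // 2)
    return np.cos(freqs) * attn_scale, np.sin(freqs) * attn_scale
```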

@Blaizzy
Owner

Blaizzy commented May 27, 2024

I love the speed!

Awesome, looking forward to the polished version :)

@JosefAlbers
Contributor Author

@Blaizzy Thanks so much, I've learned a ton about MLX and VLMs by studying the well-written and well-documented code in your repo. I'll keep you posted on my progress and will definitely reach out when I have a more polished version ready for your feedback!

@Blaizzy
Owner

Blaizzy commented Jun 3, 2024

Most welcome!

I'm happy I could be of help.

Let me know when you're ready.

@lin72h

lin72h commented Jun 4, 2024

You guys are heroes!

@JosefAlbers
Contributor Author

@Blaizzy, I'd really appreciate it! I'm just about to start working on a PR for adding su-RoPE support to mlx-lm. Once that is merged, I think I can craft a version of phi-3-vision that fits seamlessly into the mlx-vlm framework.

In the meantime, I've been experimenting with the model on various inputs and LLM/VLM techniques in my own repo, and I'm really amazed by how well it handles both text and image prompts. I'm excited to get your feedback!

@lin72h, thanks a lot!

@Blaizzy
Owner

Blaizzy commented Jun 4, 2024

Most welcome, it's my pleasure!

> I'm just about to start working on a PR for adding su-RoPE support to mlx-lm. Once that is merged,

@JosefAlbers Why do the round trip when we can have it here?

Note: mlx-lm is only for language models, hence the "lm". Unless there are other language models that use su-RoPE, it's not going to be merged.

@JosefAlbers
Contributor Author

> @JosefAlbers Why do the round trip when we can have it here?
>
> Note: mlx-lm is only for language models, hence the "lm". Unless there are other language models that use su-RoPE, it's not going to be merged.

@Blaizzy Right, I'll see if I can port phi3_v into mlx_vlm today.
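
(As a heavily hypothetical structural sketch of what such a port could look like: a vision tower feeding projected image features into the Phi-3 decoder, mirroring how the existing mlx-vlm models pair an image encoder with a language model. None of the class or argument names below are actual mlx-vlm API.)

```python
import mlx.core as mx
import mlx.nn as nn

# Hypothetical sketch of the rough shape a phi3_v port could take inside mlx-vlm.
# Class and argument names are illustrative placeholders, not the real API.
class Phi3VSketch(nn.Module):
    def __init__(self, vision_tower: nn.Module, language_model: nn.Module,
                 vision_hidden: int, text_hidden: int):
        super().__init__()
        self.vision_tower = vision_tower          # e.g. a CLIP-style image encoder
        self.language_model = language_model      # Phi-3 decoder with Su-scaled RoPE
        self.projector = nn.Linear(vision_hidden, text_hidden)

    def __call__(self, inputs_embeds: mx.array, pixel_values=None):
        if pixel_values is not None:
            image_features = self.projector(self.vision_tower(pixel_values))
            # In the real model, image features are spliced into the positions of
            # the <|image_1|> placeholder tokens; prepending here just illustrates
            # the data flow from pixels to the decoder.
            inputs_embeds = mx.concatenate([image_features, inputs_embeds], axis=1)
        return self.language_model(inputs_embeds)
```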
