
Feature Request: Support for phi-3-vision-128k-instruct #28

Closed

JosefAlbers opened this issue May 23, 2024 · 19 comments

Comments

@JosefAlbers
Contributor

Hi, I've been exploring this repo for the past couple of days and I find your work here really amazing. I'm curious if there are any plans to add support for the Phi-3-vision-128k-instruct model to this library? I'd be happy to contribute in any way I can to help make this happen.

@Blaizzy
Owner

Blaizzy commented May 23, 2024

Hey @JosefAlbers

Thank you!

Awesome, that model is on the roadmap after Paligemma #24.

Please feel free to submit a PR to support it :)

@Blaizzy
Owner

Blaizzy commented May 24, 2024

@JosefAlbers

Paligemma is done, thanks!

Do you want to take on Phi-3-vision?

@JosefAlbers
Contributor Author

Yes, I'd love to! Just a heads-up, I'm new to mlx, so I might need a little guidance along the way.

@Blaizzy
Owner

Blaizzy commented May 25, 2024

No problem, I'm here to help :)

@ChristianWeyer

> @JosefAlbers
>
> Paligemma is done, thanks!
>
> Do you want to take on Phi-3-vision?

Is there a list of officially supported models?

@Blaizzy
Owner

Blaizzy commented May 25, 2024

@ChristianWeyer not yet.

But at the moment we support the following architectures:

  • Llava (Clip + Llama)
  • Paligemma (Siglip + Gemma)
  • Idefics2 (Siglip + Mistral)
  • NanoLlava (Siglip + Qwen2)
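
(For anyone trying one of these out, here is a minimal usage sketch, assuming mlx-vlm's load/generate entry points; the quantized Llava checkpoint name and the exact generate() arguments are illustrative and may differ between versions.)

```python
# Minimal sketch, assuming mlx-vlm's load/generate entry points. The checkpoint
# name and the generate() keyword arguments are illustrative and may differ
# between mlx-vlm versions.
from mlx_vlm import load, generate

model, processor = load("mlx-community/llava-1.5-7b-4bit")  # example quantized Llava checkpoint

output = generate(
    model,
    processor,
    prompt="USER: <image>\nDescribe this image.\nASSISTANT:",
    image="path/to/image.jpg",
    max_tokens=256,
)
print(output)
```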

@Blaizzy
Owner

Blaizzy commented May 25, 2024

There are still many more to add.

@ChristianWeyer

> @ChristianWeyer not yet.
>
> But at the moment we support the following architectures:
>
> • Llava (Clip + Llama)
> • Paligemma (Siglip + Gemma)
> • Idefics2 (Siglip + Mistral)
> • NanoLlava (Siglip + Qwen2)

Which high-quality Llava model can we use? Any recommendations (from HF)?

@Blaizzy
Owner

Blaizzy commented May 25, 2024

Here you go:

https://huggingface.co/mlx-community?search_models=llava

@ChristianWeyer

Thx. These are not good enough for our use cases ;-).

@Blaizzy
Owner

Blaizzy commented May 25, 2024

Could you please open a new issue and explain your use case?

@JosefAlbers
Contributor Author

@Blaizzy, I have a working demo of Phi-3-vision support for MLX: https://github.com/JosefAlbers/Phi-3-Vision-MLX

It handles text and image inputs, generating expected outputs. With the new Su-scaled RoPE, it seems to work reasonably well even with extremely long contexts.

Just a heads-up for now. I'll circle back when it's more polished and ready for feedback.
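
(For context, here is a rough numpy sketch of how Su-scaled RoPE rescales the rotary frequencies, following the public Hugging Face reference implementation rather than the demo repo's code; the function name and default config values are placeholders.)

```python
import numpy as np

# Rough sketch of Su-scaled ("LongRoPE") rotary embeddings as used by Phi-3,
# based on the public Hugging Face reference implementation. short_factor and
# long_factor come from the model config; the defaults below are placeholders.
def su_scaled_rope(seq_len, head_dim, base=10000.0,
                   original_max_pos=4096, max_pos=131072,
                   short_factor=None, long_factor=None):
    if short_factor is None:
        short_factor = np.ones(head_dim // 2)
    if long_factor is None:
        long_factor = np.ones(head_dim // 2)

    # Pick the per-dimension rescaling factors based on the current context length.
    factors = np.asarray(long_factor if seq_len > original_max_pos else short_factor,
                         dtype=np.float32)                    # shape (head_dim // 2,)

    # Standard RoPE inverse frequencies, stretched per dimension by the factors.
    inv_freq = 1.0 / (factors * base ** (np.arange(0, head_dim, 2) / head_dim))

    # Magnitude correction compensating for the extended context window.
    scale = max_pos / original_max_pos
    attn_scale = np.sqrt(1.0 + np.log(scale) / np.log(original_max_pos))

    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    freqs = positions * inv_freq[None, :]                     # (seq_len, head_dim // 2)
    return np.cos(freqs) * attn_scale, np.sin(freqs) * attn_scale
```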

@Blaizzy
Owner

Blaizzy commented May 27, 2024

I love the speed!

Awesome, looking forward to the polished version :)

@JosefAlbers
Contributor Author

@Blaizzy Thanks so much, I've learned a ton about MLX and VLMs by studying the well-written and well-documented code in your repo. I'll keep you posted on my progress and will definitely reach out when I have a more polished version ready for your feedback!

@Blaizzy
Owner

Blaizzy commented Jun 3, 2024

Most welcome!

I'm happy I could be of help.

Let me know when you're ready.

@lin72h

lin72h commented Jun 4, 2024

You guys are heroes!

@JosefAlbers
Contributor Author

@Blaizzy, I'd really appreciate it! I'm just about to start working on a PR for adding su-RoPE support to mlx-lm. Once that is merged, I think I can craft a version of phi-3-vision that fits seamlessly into the mlx-vlm framework.

In the meantime, I've been experimenting with the model on various inputs and LLM/VLM techniques in my own repo, and I'm really amazed by how well it handles both text and image prompts. I'm excited to get your feedback!

@lin72h, thanks a lot!

@Blaizzy
Owner

Blaizzy commented Jun 4, 2024

Most welcome, it's my pleasure!

> I'm just about to start working on a PR for adding su-RoPE support to mlx-lm. Once that is merged,

@JosefAlbers Why do the round trip when we can have it here?

Note: mlx-lm is only for language models, hence the "lm". Unless there are other language models that use su-RoPE, it's not going to be merged.

@JosefAlbers
Contributor Author

> @JosefAlbers Why do the round trip when we can have it here?
>
> Note: mlx-lm is only for language models, hence the "lm". Unless there are other language models that use su-RoPE, it's not going to be merged.

@Blaizzy Right, I'll see if I can port phi3_v into mlx_vlm today.
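
(As a heavily hypothetical structural sketch of what such a port could look like: a vision tower feeding projected image features into the Phi-3 decoder, mirroring how the existing mlx-vlm models pair an image encoder with a language model. None of the class or argument names below are actual mlx-vlm API.)

```python
import mlx.core as mx
import mlx.nn as nn

# Hypothetical sketch of the rough shape a phi3_v port could take inside mlx-vlm.
# Class and argument names are illustrative placeholders, not the real API.
class Phi3VSketch(nn.Module):
    def __init__(self, vision_tower: nn.Module, language_model: nn.Module,
                 vision_hidden: int, text_hidden: int):
        super().__init__()
        self.vision_tower = vision_tower          # e.g. a CLIP-style image encoder
        self.language_model = language_model      # Phi-3 decoder with Su-scaled RoPE
        self.projector = nn.Linear(vision_hidden, text_hidden)

    def __call__(self, inputs_embeds: mx.array, pixel_values=None):
        if pixel_values is not None:
            image_features = self.projector(self.vision_tower(pixel_values))
            # In the real model, image features are spliced into the positions of
            # the <|image_1|> placeholder tokens; prepending here just illustrates
            # the data flow from pixels to the decoder.
            inputs_embeds = mx.concatenate([image_features, inputs_embeds], axis=1)
        return self.language_model(inputs_embeds)
```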
