
Models to port to MLX-VLM #39

Open
12 of 26 tasks
Blaizzy opened this issue Jun 11, 2024 · 25 comments
Labels
good first issue Good for newcomers

Comments

@Blaizzy
Owner

Blaizzy commented Jun 11, 2024

  • MiniCPM-Llama3-V-2_5
  • Florence 2
  • Phi-3-vision
  • Bunny
  • Dolphin-vision-72b
  • Llava Next
  • Qwen2-VL
  • Pixtral
  • Llama-3.2
  • Llava Interleave
  • Idefics 3
  • OmniParser
  • Llava onevision
  • internlm-xcomposer2d5-7b
  • InternVL
  • CogVLM2
  • ColPali
  • MoonDream2
  • Yi-VL
  • CuMo
  • Kosmos-2.5
  • Molmo
  • Ovis Gemma
  • Aria
  • NVIDIA NVLM
  • GOT

Instructions:

  1. Select a model and comment below with your selection.
  2. Create a draft PR titled "Add support for X".
  3. Read the Contribution guide.
  4. Check the existing models.
  5. Tag @Blaizzy for code reviews and questions.

If the model you want is not listed, please suggest it and I will add it.

@Blaizzy
Owner Author

Blaizzy commented Jun 22, 2024

For the next release of Llava-Next:

TODO:
Update the text config defaults to avoid errors with Llava-v1.6-vicuna:

from dataclasses import dataclass
from typing import Dict, Optional, Union

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    rms_norm_eps: float = 1e-05
    vocab_size: int = 32064
    num_key_value_heads: int = 32
    rope_theta: float = 1000000
    rope_traditional: bool = False
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None
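A dataclass with defaults only prevents these errors if unknown keys from a model's config.json are also filtered out before construction. A minimal sketch of that filtering pattern, common in the MLX example repos (the `from_dict` helper and reduced field set here are illustrative, not the project's exact code):

```python
import inspect
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    rope_theta: float = 1000000
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None

    @classmethod
    def from_dict(cls, params: dict) -> "TextConfig":
        # Keep only keys that match a declared field, so extra entries in a
        # model's config.json (common across Llava variants) don't raise
        # TypeError, and missing entries fall back to the defaults above.
        known = inspect.signature(cls).parameters
        return cls(**{k: v for k, v in params.items() if k in known})


# A config with one unknown key and several missing ones still loads cleanly.
cfg = TextConfig.from_dict({"model_type": "llama", "unknown_key": 1})
```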

@BoltzmannEntropy

Thanks for the great repo. This should also be on the list: https://github.com/THUDM/CogVLM2
I am now just reading the code, and trying to free some time for the conversion routine.

@jrp2014

jrp2014 commented Aug 8, 2024

@Blaizzy
Owner Author

Blaizzy commented Aug 8, 2024

Hey @BoltzmannEntropy and @jrp2014,

Thanks for the suggestions!

I have added them to the backlog.

@jrp2014

jrp2014 commented Aug 27, 2024

MiniCPM-V v2.6


@s-smits

s-smits commented Sep 7, 2024

Do you have a link to Florence-2?

@ChristianWeyer

Is the above list the ultimate and up-to-date list of supported models @Blaizzy? Thanks for your hard work!

@Blaizzy
Owner Author

Blaizzy commented Sep 10, 2024

Hey @ChristianWeyer
It's mostly up-to-date, just missing Qwen2-VL.

@Blaizzy
Owner Author

Blaizzy commented Sep 10, 2024

@s-smits here you go:

https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py

@ChristianWeyer

[x] Phi-3-vision

Thanks!
I guess Phi-3-vision includes 3.5?

@Blaizzy
Owner Author

Blaizzy commented Sep 10, 2024

Yes, they have the same arch so there are no changes needed :)

@pulkitjindal88

Hey @Blaizzy, thanks for this great framework. Is there any priority for InternVL? I can see it is on your list; I just wanted to know if it is planned in the near term. I want to run the model on my MacBook, and mlx-vlm looks like the best way to do that.

@chigkim

chigkim commented Sep 21, 2024

Qwen2-VL-72B would be amazing!

@simonw

simonw commented Sep 29, 2024

This recipe seems to work for Qwen2-VL-2B-Instruct:

python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 100 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"

My results here: https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17

@chigkim

chigkim commented Sep 30, 2024

Yep, they just merged Qwen2-VL support this weekend.

@xSNYPSx

xSNYPSx commented Oct 2, 2024

Molmo please

@chigkim

chigkim commented Oct 2, 2024

Nvidia just dropped the multimodal NVLM-D-72B. The benchmarks look pretty good.

https://huggingface.co/nvidia/NVLM-D-72B

@Blaizzy
Owner Author

Blaizzy commented Oct 2, 2024

Yep, that's a pretty awesome model!
It's on my radar because we can run it in a 4-bit quant.

@chigkim

chigkim commented Oct 25, 2024

Pixtral-12B now has a base model.
https://huggingface.co/mistralai/Pixtral-12B-Base-2409

@Benjoyo

Benjoyo commented Nov 22, 2024

Hey @Blaizzy, could you add ColQwen support? Since Qwen2-VL is already supported and ColQwen is just an additional linear layer on top, this seems like low-hanging fruit, especially since Col* models are a really hot topic right now.

I could really use this for my projects (e.g. local private document search + qa) 😊
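For context, ColQwen/ColPali-style retrieval scores a page with late interaction (MaxSim): every query-token embedding is compared against every document-token embedding, the best match per query token is kept, and the maxima are summed. A pure-Python sketch of just the scoring step, using toy 2-D vectors (real ColQwen projects each token to roughly 128 dimensions; the `maxsim` name and shapes here are illustrative):

```python
def maxsim(query_emb, doc_emb):
    """Late-interaction (MaxSim) score: for each query token vector, take
    the best dot product over all document token vectors, then sum."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_emb)
        for q_vec in query_emb
    )


# Toy per-token embeddings for one query and two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]    # aligns well with the query tokens
doc_b = [[-1.0, 0.0], [0.0, -1.0]]  # points away from the query tokens

score_a = maxsim(query, doc_a)  # 1.0 + 0.5 = 1.5
score_b = maxsim(query, doc_b)  # 0.0 + 0.0 = 0.0
```

The extra linear layer mentioned above would sit between the VLM's per-token hidden states and these embeddings; the rest of the retrieval pipeline is ranking documents by this score.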

@pcuenca
Contributor

pcuenca commented Nov 26, 2024

Working on Idefics 3 here: #124

@Blaizzy
Owner Author

Blaizzy commented Nov 26, 2024

@Benjoyo, ColQwen and ColPali are awesome models.

At the moment, I'm working on refactoring and some optimisations, so new model ports by me are on hold.

However, I appreciate any PRs. I'm here to review and help when needed.

@Blaizzy
Owner Author

Blaizzy commented Nov 26, 2024

Thank you very much, @pcuenca!

It means a lot 🚀

I left a few comments.

@kukeshajanth

Is it possible to bring this model under MLX-VLM?

https://huggingface.co/showlab/ShowUI-2B
