Faster / more memory efficient Qwen VL #114

awni · 2024-11-14T18:50:41Z

Remove the specialized RoPE in the language model, since it is equivalent to a regular RoPE in that case. Using the MLX fast RoPE is also faster.
Avoid accidentally upcasting everything to fp32. This can make a big difference in memory use, especially with lots of images or videos.
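
For illustration, here is a minimal sketch of how an accidental upcast can happen and what it costs in memory. It uses NumPy as a stand-in rather than MLX, so the promotion rules shown are NumPy's and may differ in detail from MLX's. The array shapes, the names `features` and `scale`, and the helper `scale_features` are hypothetical and not taken from the PR.

```python
import numpy as np


def scale_features(features, scale):
    """Multiply features by scale, casting scale to the features' dtype first.

    Hypothetical helper: matching dtypes before the op keeps the result in fp16.
    """
    return features * scale.astype(features.dtype)


# Hypothetical half-precision image features (shape chosen arbitrarily).
features = np.ones((1024, 1280), dtype=np.float16)

# A helper array that was created in fp32 somewhere else in the model.
scale = np.full((1280,), 0.5, dtype=np.float32)

# Mixing fp16 and fp32 arrays promotes the whole result to fp32.
upcast = features * scale

# Casting the fp32 operand down first keeps the result in fp16.
kept = scale_features(features, scale)

print(upcast.dtype, upcast.nbytes)
print(kept.dtype, kept.nbytes)
```

The fp32 result takes twice the bytes of the fp16 one, and that cost scales with the number of image or video tokens flowing through the model.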

mlx_vlm/models/qwen2_vl/qwen2_vl.py

Blaizzy

LGTM!

Blaizzy · 2024-11-14T23:38:29Z

Thanks a lot @awni!

This brings another significant boost in performance.

I will review the other models as soon as I finish Molmo #112.

even faster qwen vl

093b66c

Blaizzy reviewed Nov 14, 2024

mlx_vlm/models/qwen2_vl/qwen2_vl.py

Blaizzy approved these changes Nov 14, 2024

Blaizzy merged commit 95f0102 into Blaizzy:main Nov 14, 2024
1 check passed
