
Support for Video with Qwen2-VL #75

Open
tmoroney opened this issue Oct 2, 2024 · 9 comments · May be fixed by #97

Comments

tmoroney commented Oct 2, 2024

It would be really great if support for video could be added to Qwen2-VL, as it seems to only support images at the moment. I am working on a project that would benefit greatly from it.

Blaizzy (Owner) commented Oct 2, 2024

Hey @tmoroney

Indeed, it's a cool feature of Qwen2-VL. I will work on adding it after #41 and the Molmo port.

Could you share a little bit more about your project?

Blaizzy (Owner) commented Oct 3, 2024

I got multi-image support working, which is pretty close to video.

From here it's a small step to video, but I want to make the API seamless across all models, which is why it will take a bit longer.

https://x.com/Prince_Canuma/status/1841634911825858978

tmoroney (Author) commented Oct 3, 2024

> Hey @tmoroney
>
> Indeed, it's a cool feature of Qwen2-VL. I will work on adding it after #41 and the Molmo port.
>
> Could you share a little bit more about your project?

Essentially, the project is a Copilot for video editors. As part of my master's in computer science, I am researching how to create an AI video editing assistant that suggests the next shot as you edit, based on the context of the story so far, the emotional tone, and so on, while also providing inspiration to remove creative blockers. I want to analyse all of the project's footage on device using small, efficient ML models, along with algorithms (computer vision, sentiment analysis, etc.) wherever possible, in order to reduce compute.

tmoroney (Author) commented Oct 3, 2024

> I got multi-image support working, which is pretty close to video.
>
> From here it's a small step to video, but I want to make the API seamless across all models, which is why it will take a bit longer.
>
> https://x.com/Prince_Canuma/status/1841634911825858978

Amazing! Thanks for all your hard work :)

anishjain123 commented

@Blaizzy any update on the video support? Love the work you've been doing!

Blaizzy (Owner) commented Nov 22, 2024

Thanks guys!

There is a PR for video support, #97. It works but needs a bit of polishing.

I will do that and merge it over the weekend.

anishjain123 commented

You're a G, @Blaizzy. It's insane how much the memory usage grows on this, since it parses the video as an array of images. I'm wondering if this is the right architecture for on-device video processing.

Blaizzy (Owner) commented Nov 27, 2024

Could you elaborate?

andimarafioti (Contributor) commented

SmolVLM should really help with that!
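
For context on the memory concern raised above: if a video is treated as an array of images, every decoded frame adds vision tokens, so memory grows with clip length and resolution. A common way to bound this is to sample a fixed number of downscaled frames before handing them to a multi-image model. Below is a minimal sketch of that idea, not the actual implementation in PR #97; the helper name `load_video_frames`, the frame budget, and the resize target are assumptions for illustration.

```python
# Minimal sketch: decode a video into a bounded number of evenly spaced,
# downscaled frames, so a multi-image VLM sees a fixed-size frame list
# instead of every frame at full resolution.
import cv2
from PIL import Image

def load_video_frames(path, max_frames=16, size=(448, 448)):
    """Return at most `max_frames` evenly spaced frames as RGB PIL images."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) or 1
    step = max(total // max_frames, 1)

    frames = []
    index = 0
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes BGR
            frame = cv2.resize(frame, size)                 # cap per-frame token count
            frames.append(Image.fromarray(frame))
        index += 1
    cap.release()
    return frames

# The resulting list can then be passed to whatever multi-image interface the
# model exposes; fewer, smaller frames directly bound the number of vision
# tokens (and therefore memory) the model has to process.
frames = load_video_frames("clip.mp4", max_frames=8)
```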
