Support for Video with Qwen2-VL #75
Comments
I got multi-image support working, which is pretty close to video. I want to make the API seamless for all models, though, so video will take a bit longer.
Essentially, the project is Copilot for video editors. As part of my master's in computer science, I am researching how to create an AI video editing assistant that suggests the next shot as you edit, based on the context of the story so far, the emotional tone, etc., and that provides inspiration to remove creative blockers. I want to analyse all of the project's footage on device using small and efficient ML models, along with algorithms (computer vision, sentiment analysis, etc.) whenever possible, in order to reduce compute.
Amazing! Thanks for all your hard work :)
@Blaizzy any update on the video support? Love the work you've been doing!
Thanks guys! There is a PR for video support, #97. It works but needs a bit of polishing; I will do that and merge it over the weekend.
you're a g @Blaizzy, it's insane how high the memory usage gets on this since it parses the video as an array of images. I'm wondering if this is the right architecture for on-device video processing.
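To put rough numbers on the memory concern, here is a minimal back-of-the-envelope sketch. It is not taken from the project or from PR #97: the function names `frame_stack_bytes` and `sample_frame_indices` are hypothetical, and it assumes frames are held as dense float32 RGB arrays at full resolution, which may not match what the library actually does.

```python
def frame_stack_bytes(num_frames, height, width, channels=3, bytes_per_value=4):
    """Raw size in bytes of a video held as a dense stack of image arrays.

    Assumption: every frame is stored uncompressed, float32 by default.
    """
    return num_frames * height * width * channels * bytes_per_value


def sample_frame_indices(total_frames, max_frames):
    """Pick at most max_frames evenly spaced frame indices (hypothetical helper)."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]


# A 10 second clip at 30 fps, 1280x720, kept as float32 RGB frames.
full = frame_stack_bytes(num_frames=300, height=720, width=1280)

# The same clip after keeping only 16 evenly spaced frames.
indices = sample_frame_indices(total_frames=300, max_frames=16)
sampled = frame_stack_bytes(num_frames=len(indices), height=720, width=1280)

print(f"all frames:     {full / 1e9:.2f} GB")
print(f"sampled frames: {sampled / 1e6:.1f} MB")
```

Under these assumptions the full clip needs a few gigabytes before the model even runs, while a small sampled subset stays in the hundreds of megabytes, so frame sampling or downscaling is probably the first thing to look at for on-device use.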
Could you elaborate?
SmolVLM should really help with that!
It would be really great if support for Video could be added to Qwen2-VL as it seems to only support images at the moment. I am working on a project that would seriously benefit from it.