Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for vision-language model endpoints #100

Merged
merged 6 commits into from
Sep 19, 2024

Conversation

mattf
Copy link
Collaborator

@mattf mattf commented Sep 6, 2024

add support for vision-language models, those that can accept images and text as input and produce text.

these are akin to https://platform.openai.com/docs/guides/vision with three notable differences -

  1. images can be passed with img tags in the regular text content
  2. images can be passed as NVCF asset ids
  3. not all model endpoints support all features, e.g. server-side download of images not available with adept/fuyu-8b, google/deplot, microsoft/kosmos-2, google/paligemma; some models endpoints restrict image size; some models support one and only image; some models do not support gif or webp; kosmos-2 does not support streaming

prototype support for vlm existed in 0.2.2 and before

breaking changes -

  • remove client-side image download w/ resizing and base64 encoding

this will be available in v0.3

mattf added 2 commits August 31, 2024 07:07
- removes client side url download & base64 encoding feature, impacts adept/fuyu-8b, google/deplot, microsoft/kosmos-2, google/paligemma
@mattf mattf requested review from dglogo and raspawar September 6, 2024 13:01
@mattf mattf self-assigned this Sep 6, 2024
@mattf mattf changed the base branch from main to dev-v0.3 September 17, 2024 10:40
@mattf mattf merged commit 1eeb30f into dev-v0.3 Sep 19, 2024
12 checks passed
@mattf mattf deleted the mattf/add-vlm-support branch September 19, 2024 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant