add support for vision-language model endpoints #100

mattf · 2024-09-06T13:01:42Z

add support for vision-language models, those that can accept images and text as input and produce text.

these are akin to https://platform.openai.com/docs/guides/vision with three notable differences -

images can be passed with img tags in the regular text content
images can be passed as NVCF asset ids
not all model endpoints support all features, e.g. server-side download of images not available with adept/fuyu-8b, google/deplot, microsoft/kosmos-2, google/paligemma; some models endpoints restrict image size; some models support one and only image; some models do not support gif or webp; kosmos-2 does not support streaming

prototype support for vlm existed in 0.2.2 and before

breaking changes -

remove client-side image download w/ resizing and base64 encoding

this will be available in v0.3

… from set of vlm models; not available

- removes client side url download & base64 encoding feature, impacts adept/fuyu-8b, google/deplot, microsoft/kosmos-2, google/paligemma

…be default)

mattf added 2 commits August 31, 2024 07:07

remove liuhaotian/llava-v1.6-mistral-7b and liuhaotian/llava-v1.6-34b…

2dfb62c

… from set of vlm models; not available

add support for openai-like vlm models and nvcf asset ids

3e19339

- removes client side url download & base64 encoding feature, impacts adept/fuyu-8b, google/deplot, microsoft/kosmos-2, google/paligemma

mattf requested review from dglogo and raspawar September 6, 2024 13:01

mattf self-assigned this Sep 6, 2024

mattf added 2 commits September 9, 2024 16:17

update nvidia-ai-endpoints notebook vlm examples (add asset id)

4da68fb

add nvidia/vila to set of vlm model (does not pass all tests, cannot …

6ec6b73

…be default)

mattf changed the base branch from main to dev-v0.3 September 17, 2024 10:40

mattf added 2 commits September 19, 2024 10:32

Merge branch 'dev-v0.3' into add-vlm-support

ebfceed

Merge branch 'dev-v0.3' into add-vlm-support

1098fdd

mattf merged commit 1eeb30f into dev-v0.3 Sep 19, 2024
12 checks passed

mattf deleted the mattf/add-vlm-support branch September 19, 2024 14:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for vision-language model endpoints #100

add support for vision-language model endpoints #100

mattf commented Sep 6, 2024 •

edited

Loading

add support for vision-language model endpoints #100

add support for vision-language model endpoints #100

Conversation

mattf commented Sep 6, 2024 • edited Loading

mattf commented Sep 6, 2024 •

edited

Loading