After successfully creating the engine files, I want to deploy the VILA model with Triton server.
However, it fails because transformers doesn't recognize the model type `llava_llama`:
[TensorRT-LLM][WARNING] stats_check_period_ms is not specified, will be set to 100 (ms)
I0903 13:38:59.297664 1309 libtensorrtllm.cc:184] "TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_0_0"
I0903 13:38:59.298296 1309 model_lifecycle.cc:839] "successfully loaded 'tensorrt_llm'"
I0903 13:38:59.443154 1309 pb_stub.cc:366] "Failed to initialize Python stub: ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.\n\nAt:\n /usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py(1008): from_pretrained\n /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(856): from_pretrained\n /opt/tritonserver/multimodal_ifb/postprocessing/1/model.py(81): initialize\n"
[TensorRT-LLM][WARNING] Don't setup 'skip_special_tokens' correctly (set value is ${skip_special_tokens}). Set it as True by default.
....
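(As a side note, the `${skip_special_tokens}` warning above suggests the template variables in the postprocessing `config.pbtxt` were never substituted; in the tensorrtllm_backend repo that is normally done with `tools/fill_template.py`, e.g. `python3 tools/fill_template.py -i multimodal_ifb/postprocessing/config.pbtxt tokenizer_dir:<tokenizer_path>,skip_special_tokens:True`, though the exact parameter list may vary between versions.)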
Any pointers on how to create a working model_repository for the VILA model would be highly appreciated.
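One thing I am checking myself: the traceback shows the failure comes from the postprocessing model's `AutoTokenizer.from_pretrained` call on the checkpoint root, whose `config.json` declares the unknown `llava_llama` model type. VILA checkpoints usually keep the underlying Llama tokenizer in an `llm/` subdirectory, so pointing `tokenizer_dir` there may sidestep the unrecognized architecture. A minimal sketch to verify the tokenizer loads standalone before wiring it into the model_repository (the checkpoint path is a placeholder, not from my setup):

```python
# Sketch, assuming the usual VILA checkpoint layout with an llm/ subfolder
# that holds a plain Llama tokenizer. Adjust the path to your checkpoint.
from transformers import AutoTokenizer

VILA_CHECKPOINT = "/models/vila"  # hypothetical path

# Load from the llm/ subfolder instead of the checkpoint root, whose
# config.json declares the unrecognized model_type "llava_llama".
tokenizer = AutoTokenizer.from_pretrained(f"{VILA_CHECKPOINT}/llm", use_fast=True)

# Quick round-trip to confirm the tokenizer works before setting it as
# tokenizer_dir in the preprocessing/postprocessing config.pbtxt files.
print(tokenizer.decode(tokenizer.encode("sanity check")))
```

If this loads cleanly, the same path can be used as the `tokenizer_dir` template value for the preprocessing and postprocessing models.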