[Bug]: docker image openvino/model_server:latest-gpu does not serve the model correctly #27541
Comments
@fedecompa I also encountered several issues when attempting the steps in the guide you shared (How to serve LLM models with Continuous Batching via OpenAI API) on Windows. Please note that this demo was officially validated on Intel® Xeon® processors Gen4 and Gen5 and on Intel dGPU ARC and Flex models, on Ubuntu22/24 and RedHat8/9. Other OS/hardware combinations might work, but issues are to be expected.
@Iffa-Intel thanks for the reply. model_id = "Fede90/llama-3.2-3b-instruct-INT4" So it is actually very strange...
@fedecompa we'll investigate and clarify this further and get back to you. This probably relates to the architecture of WSL2 on Windows versus Ubuntu, which can influence OpenVINO library functionality.
@fedecompa I see you listed version 2024.3. I've just tried the 2024.5 version of the model server image for GPU, and the issue does not reproduce. Would it be possible to try the latest version? I hope this resolves the issue on your side; let us know if you have any questions or if the issue persists. Note that I tried meta-llama/Meta-Llama-3-8B-Instruct; let me also check with meta-llama/Llama-3.2-3B-Instruct. Based on the error, it might be caused by a mismatch in the model's name.
@fedecompa just checked with
OpenVINO Version
2024.3
Operating System
Windows System
Device used for inference
Intel UHD Graphics GPU
Framework
None
Model used
meta-llama/Llama-3.2-3B-Instruct
Issue description
I deployed the Llama-3.2-3B model using the image openvino/model_server:latest-gpu, following the documentation here:
https://docs.openvino.ai/2024/openvino-workflow/model-server/ovms_demos_continuous_batching.html
and the folder structure for the openvino IR model:
https://github.com/openvinotoolkit/model_server/blob/main/docs/models_repository.md
The command in my docker-compose is:
command: --model_path /workspace/Llama-3.2-3B-Instruct --model_name meta-llama/Llama-3.2-3B-Instruct --port 9001 --rest_port 8001 --target_device GPU
From the container logs I can see that the server loads the model and starts correctly. Indeed, if I call the API at http://localhost:8001/v1/config, I obtain:
{
"meta-llama/Llama-3.2-3B-Instruct" :
{
"model_version_status": [
{
"version": "1",
"state": "AVAILABLE",
"status": {
"error_code": "OK",
"error_message": "OK"
}
}
]
}
}
However, when I call the completions endpoint I get a 404:
{
"error": "Model with requested name is not found"
}
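To rule out a name mismatch between what the server reports and what the client sends, the check above can be scripted. The following is only a sketch: the helper names (`available_models`, `build_completion_request`, `query_server`) are hypothetical, the `/v3/completions` path is an assumption taken from the OVMS continuous batching demo and may differ between releases, and the payload fields follow the OpenAI completions format. It does not by itself explain the 404.

```python
import json
import urllib.request


def available_models(config: dict) -> list[str]:
    """Return model names with at least one version in the AVAILABLE state.

    `config` is the parsed JSON body returned by GET /v1/config.
    """
    names = []
    for name, info in config.items():
        statuses = info.get("model_version_status", [])
        if any(s.get("state") == "AVAILABLE" for s in statuses):
            names.append(name)
    return names


def build_completion_request(base_url, model, prompt, path="/v3/completions"):
    """Build (but do not send) a POST request for the completions endpoint.

    The default `path` is an assumption; adjust it to match your server version.
    """
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": 32})
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_server(base_url="http://localhost:8001", prompt="Hello"):
    """Read the served model names from /v1/config and reuse them verbatim."""
    with urllib.request.urlopen(base_url + "/v1/config") as resp:
        config = json.load(resp)
    for name in available_models(config):
        request = build_completion_request(base_url, name, prompt)
        # urlopen raises urllib.error.HTTPError on a 404 like the one above.
        with urllib.request.urlopen(request) as resp:
            print(name, resp.status)
```

Calling `query_server()` against the running container sends the model name exactly as the server reports it, so a remaining 404 would point away from a simple typo in the client request.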
Step-by-step reproduction
No response
Relevant log output
No response