Unknown TensorRT-LLM model endpoint when using --model-namespacing=true #7823

Open
MatteoPagliani opened this issue Nov 21, 2024 · 0 comments

Hi,

I am trying to serve two LLMs concurrently with the TensorRT-LLM backend. The two Triton model repositories have the following folder structure:

triton_models/
├── gemma2/
│   ├── preprocessing/
│   ├── postprocessing/
│   ├── tensorrt_llm/
│   └── tensorrt_llm_bls/
└── llama3/
    ├── preprocessing/
    ├── postprocessing/
    ├── tensorrt_llm/
    └── tensorrt_llm_bls/

I am running the command tritonserver --model-repository=path_to_triton_models/gemma2 --model-repository=path_to_triton_models/llama3 --model-namespacing=true. All the models are loaded correctly as confirmed by the logs.

At this point I want to send a query to a model. In a single-model deployment scenario, I would use the following curl command:

curl -X POST \
    -s localhost:8000/v2/models/tensorrt_llm_bls/generate \
    -d '{
        "text_input": "What is machine learning?",
        "max_tokens": 512
    }'
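
For reference, the single-model request can also be built in Python. This is only a minimal sketch: the helper names `build_generate_request` and `send` are hypothetical, and the base URL and payload fields are simply copied from the curl command above. Using `json.dumps` guarantees a well-formed body, which matters because strict JSON parsers reject a trailing comma after the last field. It does not address the namespaced case, which is what this issue is asking about.

```python
import json
import urllib.request


def build_generate_request(model_name, text_input, max_tokens,
                           base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the generate endpoint.

    The helper name and its defaults are assumptions for illustration;
    the URL layout mirrors the curl command in this issue.
    """
    url = f"{base_url}/v2/models/{model_name}/generate"
    # json.dumps always emits valid JSON (no trailing commas).
    body = json.dumps(
        {"text_input": text_input, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_generate_request(
    "tensorrt_llm_bls", "What is machine learning?", 512
)
print(request.full_url)
```

Calling `send(request)` against the single-model deployment should be equivalent to the curl command; building the request separately makes it easy to inspect the URL and body before anything is sent.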

However, if I use the same endpoint (localhost:8000/v2/models/tensorrt_llm_bls/generate) in the two-model deployment scenario, I get, as expected, the following error:

{"error":"There are 2 identifiers of model 'tensorrt_llm_bls' in global map, model namespace must be provided to resolve ambiguity."}

The problem is that I don't know how I should change the target endpoint when --model-namespacing is enabled. I tried many things, but none of them worked, and there seems to be no documentation about this.

Can you help me out? Thanks in advance. Tagging @rmccorm4 for support.
