Problem
I have an existing NVIDIA Triton Inference Server with TensorRT-LLM as the backend. I want to use that model within Jan.
Success Criteria
- nvidia-inference-engine-trt-llm/engine.json (sketched below)
- model.json for llama2-7b (sketched below)
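To make the criteria concrete, here is a minimal sketch of what the two files might contain. The shape of `model.json` follows Jan's usual model-manifest fields; everything specific to the remote Triton/TensorRT-LLM connection (the `base_url` and `triton_model` fields, and the `engine.json` schema as a whole) is an assumption for illustration, not a confirmed Jan schema. Comments are JSONC-style, for annotation only:

```jsonc
// nvidia-inference-engine-trt-llm/engine.json -- hypothetical sketch
{
  "id": "nvidia-inference-engine-trt-llm",
  "name": "NVIDIA Triton Inference Server (TensorRT-LLM backend)",
  // Base URL of the existing Triton server (assumed field name)
  "base_url": "http://localhost:8000",
  // Triton model that requests are routed to; "ensemble" is the
  // default name used in the tensorrtllm_backend examples (assumption)
  "triton_model": "ensemble"
}
```

```jsonc
// model.json for llama2-7b -- field names follow Jan's model-manifest convention
{
  "id": "llama2-7b",
  "object": "model",
  "name": "Llama 2 7B (Triton / TensorRT-LLM)",
  "version": "1.0",
  "format": "tensorrt-llm",
  "parameters": {
    "max_tokens": 2048,
    "temperature": 0.7
  },
  // Points at the engine defined above (assumed wiring)
  "engine": "nvidia-inference-engine-trt-llm"
}
```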
Additional context
- Diagram illustrating Jan integration with NVIDIA Inference cluster
- Setup and benchmark script for an NVIDIA Inference cluster with a TensorRT backend (model: meta-llama/llama2-7b): https://github.com/hamelsmu/llama-inference/tree/master/triton-tensorRT-quantized-awq-batch (a minimal smoke test against such a server is sketched below)
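For testing the Triton side independently of Jan: the tensorrtllm_backend examples expose the compiled engine through Triton's HTTP `generate` endpoint, by default under the model name `ensemble`. A minimal smoke test, assuming that default setup; the host, port, and model name are assumptions to adjust for your deployment:

```python
import requests

# Triton's generate endpoint for the TensorRT-LLM backend.
# "ensemble" is the model name used in the tensorrtllm_backend examples;
# adjust the host, port, and model name to match your deployment.
TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"

payload = {
    "text_input": "What is machine learning?",  # prompt
    "max_tokens": 64,
    "bad_words": "",
    "stop_words": "",
}

resp = requests.post(TRITON_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["text_output"])  # generated completion
```

If this returns text, the server is reachable and the remaining work is purely on the Jan engine/model configuration side.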