
feat: Add inference engine - NVIDIA Triton Inference Server and TRT-LLM #821

Closed
hiro-v opened this issue Dec 4, 2023 · 2 comments
Labels
P1: important Important feature / fix type: feature request A new feature

Comments


hiro-v commented Dec 4, 2023

Problem
I have an existing NVIDIA Triton Inference Server deployment with TensorRT-LLM as the backend. I want to use the model it serves from within Jan.

Success Criteria

  • Inference engine for NVIDIA Triton Inference Server: nvidia-inference-engine-trt-llm/engine.json
  • Setup script and docs covering the setup of both and the connection between them
  • model.json for llama2-7b
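
For context on what connecting to such a server might involve, here is a minimal Python sketch of a client call to Triton's HTTP generate endpoint. It is not Jan's engine implementation. The host and port (`localhost:8000`), the model name (`ensemble`), and the input field names (`text_input`, `max_tokens`, `bad_words`, `stop_words`) are assumptions based on common TensorRT-LLM backend setups and may differ depending on how the model repository is configured. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions (not stated in this issue): the Triton server listens on
# localhost:8000 and serves a TensorRT-LLM model under the name "ensemble".
DEFAULT_BASE_URL = "http://localhost:8000"
DEFAULT_MODEL = "ensemble"


def build_generate_request(prompt, max_tokens=64,
                           base_url=DEFAULT_BASE_URL, model=DEFAULT_MODEL):
    """Return (url, payload) for a call to Triton's HTTP generate endpoint."""
    url = f"{base_url.rstrip('/')}/v2/models/{model}/generate"
    # Field names are assumed; they depend on the model's config.pbtxt.
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return url, payload


def parse_generate_response(body):
    """Extract the generated text from a JSON response body (str or bytes)."""
    # "text_output" is the assumed name of the model's output tensor.
    return json.loads(body)["text_output"]


def generate(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    url, payload = build_generate_request(prompt, **kwargs)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_generate_response(response.read())
```

With a server running, `generate("What is machine learning?", max_tokens=32)` would return the completion text. Keeping request construction separate from the network call makes the payload shape easy to check without a GPU host.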

Additional context

@hiro-v hiro-v added P1: important Important feature / fix type: feature request A new feature mlops labels Dec 4, 2023
@hiro-v hiro-v added this to the v0.4.0 milestone Dec 4, 2023
@hiro-v hiro-v self-assigned this Dec 4, 2023

hiro-v commented Dec 4, 2023

[Image: diagram illustrating Jan integration with an NVIDIA inference cluster]


hiro-v commented Dec 4, 2023

Setup and benchmark script for NVIDIA Inference cluster with TensorRT backend (model: meta-llama/llama2-7b): https://github.com/hamelsmu/llama-inference/tree/master/triton-tensorRT-quantized-awq-batch

@hiro-v hiro-v moved this to Todo in Jan & Cortex Dec 4, 2023
@hiro-v hiro-v moved this from Todo to In Progress in Jan & Cortex Dec 6, 2023
@hiro-v hiro-v moved this from In Progress to Todo in Jan & Cortex Dec 11, 2023
@dan-homebrew dan-homebrew modified the milestones: 0.4.0, 0.4.1, 0.4.2, API Endpoint at localhost:1337, Jan supports multiple Inference Engines Dec 11, 2023
@hiro-v hiro-v moved this from Todo to In Review in Jan & Cortex Dec 12, 2023
@hiro-v hiro-v moved this from In Review to Done in Jan & Cortex Dec 16, 2023
@hiro-v hiro-v closed this as completed Dec 20, 2023