feat: Add inference engine - NVIDIA Triton Inference Server and TRT-LLM #821

hiro-v · 2023-12-04T01:05:34Z

Problem
I have an existing NVIDIA Triton Inference Server with TensorRT-LLM as the backend. I want to use the model it serves within Jan.

Success Criteria

Inference engine for NVIDIA Triton Inference Server: nvidia-inference-engine-trt-llm/engine.json
Setup script and docs for setting up both and for connecting them
model.json for llama2-7b
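
To make the "connection" criterion concrete, here is a minimal sketch of how a client could send a prompt to a Triton server running the TensorRT-LLM backend. It assumes the server exposes Triton's HTTP generate endpoint (`/v2/models/<model>/generate`) on the default HTTP port 8000, and that the deployed model is named `ensemble` and accepts `text_input` / `max_tokens` and returns `text_output`, as in the TensorRT-LLM backend examples. The helper names `build_generate_request` and `generate` are hypothetical and are not part of Jan or of the extension in this issue.

```python
import json
import urllib.request


def build_generate_request(base_url, model_name, prompt, max_tokens=64):
    """Build the URL and JSON payload for a Triton generate call.

    Assumption: the model follows the TensorRT-LLM backend example
    ensemble, which takes `text_input`, `max_tokens`, `bad_words`
    and `stop_words`.
    """
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/generate"
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return url, payload


def generate(base_url, model_name, prompt, max_tokens=64, timeout=30.0):
    """POST the prompt to the server and return the generated text."""
    url, payload = build_generate_request(base_url, model_name, prompt, max_tokens)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the response carries the completion in `text_output`.
    return body.get("text_output", "")


# Example call (requires a running server; host and model name are assumptions):
# print(generate("http://localhost:8000", "ensemble", "What is Jan?", max_tokens=32))
```

Separating request construction from the network call keeps the payload shape easy to inspect and adjust if the deployed model uses different input names.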

Additional context

hiro-v · 2023-12-04T01:10:43Z

Diagram illustrating Jan integration with NVIDIA Inference cluster

hiro-v · 2023-12-04T01:19:36Z

Setup and benchmark script for NVIDIA Inference cluster with TensorRT backend (model: meta-llama/llama2-7b): https://github.com/hamelsmu/llama-inference/tree/master/triton-tensorRT-quantized-awq-batch

hiro-v added the P1: important, type: feature request, and mlops labels Dec 4, 2023

hiro-v added this to the v0.4.0 milestone Dec 4, 2023

hiro-v self-assigned this Dec 4, 2023

hiro-v added this to Jan & Cortex Dec 4, 2023

hiro-v moved this to Todo in Jan & Cortex Dec 4, 2023

hiro-v moved this from Todo to In Progress in Jan & Cortex Dec 6, 2023

hiro-v mentioned this issue Dec 7, 2023

feat: Add NVIDIA triton trt-llm extension #888

Merged

hiro-v moved this from In Progress to Todo in Jan & Cortex Dec 11, 2023

dan-homebrew modified the milestones: 0.4.0, 0.4.1, 0.4.2, API Endpoint at localhost:1337, Jan supports multiple Inference Engines Dec 11, 2023

hiro-v moved this from Todo to In Review in Jan & Cortex Dec 12, 2023

hiro-v moved this from In Review to Done in Jan & Cortex Dec 16, 2023

hiro-v closed this as completed Dec 20, 2023