Replies: 1 comment
-
This seems to be a dupe of #311, actually, so I am going to close this. Apologies.
-
Hello All,
Firstly, amazing project. It can be a massive hassle converting LLMs to optimised formats, and this is an excellent solution for many pain points I face when deploying.
I want to ask: is there any way that the TorchDynamo models (which, as far as I can tell, the optimise_model fn is yielding?) can be passed to a Triton Inference Server to optimise inference in production deployment? I know the supported backends include Python, TensorRT, and PyTorch, but I'm not sure whether the output artefact plays nicely with any of those forms.
Any advice on how to achieve this would be greatly appreciated.
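To make the question concrete, below is a rough sketch of what I had in mind using Triton's Python backend: a `model.py` that loads the base model, runs it through optimise_model, and serves it. The import path for optimise_model, the choice of model, and the tensor names are all placeholders/assumptions on my part, and I don't know whether a dynamo-compiled module actually survives this setup, which is really what I'm asking.

```python
# model.py for a Triton Python-backend model repository entry.
# A sketch only: the import path for optimise_model, the choice of model,
# and the tensor names ("input_ids" / "logits") are placeholders on my part
# and would have to match whatever config.pbtxt declares.
import torch
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForCausalLM


class TritonPythonModel:
    def initialize(self, args):
        # Placeholder import -- wherever this project exposes optimise_model.
        from my_project import optimise_model

        base_model = AutoModelForCausalLM.from_pretrained("gpt2").eval().cuda()
        # Assumed to return a callable module; some APIs patch the model
        # in place and return None, so fall back to the base model if so.
        optimised = optimise_model(base_model)
        self.model = optimised if optimised is not None else base_model

    def execute(self, requests):
        responses = []
        for request in requests:
            input_ids = pb_utils.get_input_tensor_by_name(request, "input_ids").as_numpy()
            with torch.inference_mode():
                logits = self.model(torch.from_numpy(input_ids).cuda()).logits
            out = pb_utils.Tensor("logits", logits.cpu().numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

My guess is that, since the dynamo artefact is really a patched Python callable rather than a serialised graph, the Python backend is the only one of the three that could host it directly, but I may well be wrong about that.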