Replies: 1 comment
-
This seems to be a dupe of #311, actually, so I am going to close this. Apologies.
-
Hello All,
Firstly, amazing project. It can be a massive hassle converting LLMs to optimised formats, and this is an excellent solution for many pain points I face when deploying.
I want to ask: is there any way that the TorchDynamo models (which, as far as I can tell, the optimise_model fn is yielding?) can be passed to a Triton Inference Server to optimise inference in production deployment? I know the supported backends include Python, TensorRT, and PyTorch, but I'm not sure whether the output artefact plays nicely with any of those forms.
Any advice on how to achieve this would be greatly appreciated.
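To make the question concrete, below is a rough sketch of what I had in mind using Triton's Python backend: a `model.py` that loads the base model, runs it through optimise_model, and serves it. The import path for optimise_model, the choice of model, and the tensor names are all placeholders/assumptions on my part, and I don't know whether a dynamo-compiled module actually survives this setup, which is really what I'm asking.

```python
# model.py for a Triton Python-backend model repository entry.
# A sketch only: the import path for optimise_model, the choice of model,
# and the tensor names ("input_ids" / "logits") are placeholders on my part
# and would have to match whatever config.pbtxt declares.
import torch
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForCausalLM


class TritonPythonModel:
    def initialize(self, args):
        # Placeholder import -- wherever this project exposes optimise_model.
        from my_project import optimise_model

        base_model = AutoModelForCausalLM.from_pretrained("gpt2").eval().cuda()
        # Assumed to return a callable module; some APIs patch the model
        # in place and return None, so fall back to the base model if so.
        optimised = optimise_model(base_model)
        self.model = optimised if optimised is not None else base_model

    def execute(self, requests):
        responses = []
        for request in requests:
            input_ids = pb_utils.get_input_tensor_by_name(request, "input_ids").as_numpy()
            with torch.inference_mode():
                logits = self.model(torch.from_numpy(input_ids).cuda()).logits
            out = pb_utils.Tensor("logits", logits.cpu().numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

My guess is that, since the dynamo artefact is really a patched Python callable rather than a serialised graph, the Python backend is the only one of the three that could host it directly, but I may well be wrong about that.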