Add HuggingFaceTGIGenerator
LLM support (2.x)
#5625
Comments
Note that the TGI license was recently changed and is no longer an OSI-approved license: huggingface/text-generation-inference#726. Maybe we should also look at other inference libraries such as https://github.com/vllm-project/vllm, which has a more permissive license? (vLLM also has better throughput and latency than TGI: https://www.anyscale.com/blog/continuous-batching-llm-inference)
Thanks for this reference @mathislucka, it seems like an awesome project indeed. I think we should add support for both of these serving backends. We need to support TGI on the client because it is a simple REST invocation that works across the entire HF backend ecosystem: the free HF Inference tier, a paid subscription, and finally a self-hosted TGI backend. They expose the same REST endpoints in all of these cases.
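As a rough illustration of the REST surface being described here, a minimal sketch of a TGI-style `/generate` call; the URL, token, and generation parameters below are placeholders:

```python
# Sketch only: the URL, token, and generation parameters are placeholders.
# The request/response fields follow TGI's documented /generate schema.
import os

import requests

TGI_URL = "http://localhost:8080/generate"  # placeholder: any TGI-compatible endpoint
HF_TOKEN = os.environ.get("HF_API_TOKEN")   # only needed for hosted endpoints

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 50, "temperature": 0.7},
}
headers = {"Authorization": f"Bearer {HF_TOKEN}"} if HF_TOKEN else {}

response = requests.post(TGI_URL, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json()["generated_text"])
```

TGI also exposes a streaming variant of the same endpoint (`/generate_stream`) that returns server-sent events, which is what a streaming callback would consume.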
I agree with @vblagoje that we need this functionality alongside vLLM. Can we change the name, though, to HuggingFaceGenerator or HuggingFaceRemoteGenerator? Also, this should not be confused with HuggingFaceLocalGenerator.
Hi team! I am wondering if the roadmap includes adding Hugging Face support for inference endpoints with a Hugging Face token? This kind of solution usually comes with an API token and a URL, so I can call an LLM hosted on Hugging Face without needing to download the model.
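For context, the usage pattern described in this question is roughly the following with the `huggingface_hub` client; the endpoint URL and token are placeholders, and the eventual Haystack component may expose this differently:

```python
# Sketch only: the endpoint URL and token are placeholders for a hosted
# Inference Endpoint (or a model id on the serverless Inference API).
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",  # placeholder URL
    token=os.environ["HF_API_TOKEN"],
)

# Generation happens on the Hugging Face side; nothing is downloaded locally.
reply = client.text_generation("Explain retrieval-augmented generation.", max_new_tokens=100)
print(reply)
```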
See the proposal: #5540
The TGI library provides a convenient way to query several LLMs through a unified interface, so we can build a component on top of it to extend our LLM support beyond OpenAI models.
Draft API for `TGIGenerator`:

Note how the component takes a list of prompts and LLM parameters only, but no variables or templates, and returns only strings. This is because input rendering and output parsing are delegated to `PromptBuilder`.

In order to support token streaming, we make this component accept a callback in `__init__`, and that callback will be called every time a new chunk of the streamed response is received.
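A rough sketch consistent with the description above; the class name, argument names, defaults, and the use of `huggingface_hub.InferenceClient` are assumptions rather than the proposal's final API:

```python
# Sketch of the interface described above, not the final Haystack API:
# a list of prompts and generation parameters in, plain strings out,
# with an optional streaming callback supplied at construction time.
import os
from typing import Callable, Dict, List, Optional

from huggingface_hub import InferenceClient


class TGIGenerator:
    def __init__(
        self,
        model_or_url: str = "mistralai/Mistral-7B-Instruct-v0.1",  # placeholder model
        token: Optional[str] = None,
        streaming_callback: Optional[Callable[[str], None]] = None,
    ):
        self.client = InferenceClient(model=model_or_url, token=token or os.environ.get("HF_API_TOKEN"))
        self.streaming_callback = streaming_callback

    def run(self, prompts: List[str], generation_kwargs: Optional[Dict] = None) -> Dict[str, List[str]]:
        generation_kwargs = generation_kwargs or {}
        replies: List[str] = []
        for prompt in prompts:
            if self.streaming_callback is None:
                replies.append(self.client.text_generation(prompt, **generation_kwargs))
            else:
                # With stream=True the client yields the response chunk by chunk,
                # and each chunk is forwarded to the callback as it arrives.
                chunks: List[str] = []
                for chunk in self.client.text_generation(prompt, stream=True, **generation_kwargs):
                    self.streaming_callback(chunk)
                    chunks.append(chunk)
                replies.append("".join(chunks))
        return {"replies": replies}
```

A streaming callback can then be as simple as `lambda chunk: print(chunk, end="", flush=True)`, while input rendering and output parsing remain delegated to `PromptBuilder` as described above.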