
Add HuggingFaceTGIGenerator LLM support (2.x) #5625

Closed · Tracked by #5330
ZanSara opened this issue Aug 25, 2023 · 4 comments
Labels: 2.x (Related to Haystack v2.0)

ZanSara (Contributor) commented Aug 25, 2023

See the proposal: #5540


The TGI library provides a unified interface for querying several LLMs, so we can build a component on top of it to extend our LLM support beyond OpenAI models.
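For context, this is roughly how one queries a TGI server directly with the official text-generation client (a minimal sketch; the server URL is a placeholder, not part of this proposal):

    from text_generation import Client

    # Point the client at a running TGI server (placeholder URL).
    client = Client("http://127.0.0.1:8080")
    response = client.generate("What is Haystack?", max_new_tokens=50)
    print(response.generated_text)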

Draft API for TGIGenerator:

from typing import Any, Callable, Dict, List, Optional

@component
class TGIGenerator:

    def __init__(self, streaming_callback: Optional[Callable] = None):  # ... parameters ...
        ...

    @component.output_types(replies=List[List[str]], metadata=List[Dict[str, Any]])
    def run(self, prompts: List[str]):  # ... specific params ...
        ...
        return {"replies": [...], "metadata": [...]}

Note how the component takes a list of prompts and LLM parameters only, with no variables or templates, and returns only strings: input rendering and output parsing are delegated to PromptBuilder, as sketched below.
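As an illustration, here is a minimal sketch of that division of labor. The PromptBuilder import path and the TGIGenerator usage are assumptions based on the draft API above, not a final interface:

    from haystack.components.builders import PromptBuilder

    # PromptBuilder renders the variables into a finished prompt string...
    builder = PromptBuilder(template="Answer this question: {{ question }}")
    prompt = builder.run(question="What is Haystack?")["prompt"]

    # ...and the generator only ever sees finished strings.
    generator = TGIGenerator()
    result = generator.run(prompts=[prompt])
    print(result["replies"][0])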

To support token streaming, the component accepts a callback in __init__; that callback is called every time a new chunk of the streamed response is received.
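For instance, a hypothetical callback that prints tokens as they arrive (the chunk type and exact signature are assumptions, not final API):

    def print_chunk(chunk: str) -> None:
        # Print each streamed chunk immediately, without a trailing newline.
        print(chunk, end="", flush=True)

    generator = TGIGenerator(streaming_callback=print_chunk)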

ZanSara changed the title from GTIGenerator to TGIGenerator on Aug 25, 2023
ZanSara added the 2.x (Related to Haystack v2.0) label on Aug 25, 2023
mathislucka (Member) commented

Note that the TGI license was recently changed and it is no longer an OSI-approved license: huggingface/text-generation-inference#726. Maybe we should also look at other inference libraries such as https://github.com/vllm-project/vllm, which has a more permissive license? (vLLM also has better throughput and latency than TGI: https://www.anyscale.com/blog/continuous-batching-llm-inference)

vblagoje (Member) commented Aug 31, 2023

Thanks for this reference @mathislucka, it seems like an awesome project indeed. I think we should add support for both of these serving backends. We need to support TGI on the client side, as it is a simple REST invocation that works across the entire HF backend ecosystem: the HF Inference free tier, paid subscriptions, and finally self-hosted TGI serving. They expose the same REST endpoints on all of these.
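A sketch of that point using huggingface_hub's InferenceClient; the model id, URL, and token are placeholder assumptions. The same client, and the same underlying REST endpoints, work for both hosted and self-hosted backends:

    from huggingface_hub import InferenceClient

    # Hosted on the HF Inference API (free tier or paid subscription):
    hosted = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.1", token="hf_...")

    # Self-hosted TGI server, same API surface:
    local = InferenceClient(model="http://127.0.0.1:8080")

    print(hosted.text_generation("What is Haystack?", max_new_tokens=50))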

Timoeller (Contributor) commented

I agree with @vblagoje that we need this functionality next to vLLM. Can we change the name, though, to HuggingFaceGenerator or HuggingFaceRemoteGenerator?

Also, it should not be confused with HuggingFaceLocalGenerator.

vblagoje changed the title from TGIGenerator to Add HuggingFaceTGIGenerator on Oct 16, 2023
vblagoje changed the title from Add HuggingFaceTGIGenerator to Add HuggingFaceTGIGenerator LLM support (2.x) on Oct 16, 2023
lfunderburk commented

Hi team!

I am wondering whether the roadmap includes support for Hugging Face Inference Endpoints authenticated with a Hugging Face token. This kind of deployment usually comes with an API token and a URL, so I can call an LLM hosted on Hugging Face without downloading the model.

https://huggingface.co/inference-endpoints
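For reference, a minimal sketch of calling a dedicated Inference Endpoint; the endpoint URL and token are placeholders you would get from the Endpoints dashboard:

    from huggingface_hub import InferenceClient

    client = InferenceClient(
        model="https://xyz.us-east-1.aws.endpoints.huggingface.cloud",  # placeholder endpoint URL
        token="hf_...",  # placeholder API token
    )
    print(client.text_generation("What is Haystack?", max_new_tokens=50))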

masci closed this as completed on Nov 13, 2023