Add HuggingFaceTGIGenerator
LLM support (2.x)
#5625
Comments
Note that the TGI license was recently changed and is no longer an OSI-approved license: huggingface/text-generation-inference#726. Maybe we should also look at other inference libraries such as https://github.com/vllm-project/vllm, which has a more permissive license? (vLLM also has better throughput and latency than TGI: https://www.anyscale.com/blog/continuous-batching-llm-inference)
Thanks for this reference @mathislucka, it seems like an awesome project indeed. I think we should add support for both of these serving backends. We need to support TGI on the client because it is a simple REST invocation that works across the entire HF backend ecosystem: the free HF Inference tier, a paid subscription, and finally a self-hosted TGI backend. They expose the same REST endpoints in all of these cases.
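As a rough illustration of the REST surface being described here, a minimal sketch of a TGI-style `/generate` call; the URL, token, and generation parameters below are placeholders:

```python
# Sketch only: the URL, token, and generation parameters are placeholders.
# The request/response fields follow TGI's documented /generate schema.
import os

import requests

TGI_URL = "http://localhost:8080/generate"  # placeholder: any TGI-compatible endpoint
HF_TOKEN = os.environ.get("HF_API_TOKEN")   # only needed for hosted endpoints

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 50, "temperature": 0.7},
}
headers = {"Authorization": f"Bearer {HF_TOKEN}"} if HF_TOKEN else {}

response = requests.post(TGI_URL, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json()["generated_text"])
```

TGI also exposes a streaming variant of the same endpoint (`/generate_stream`) that returns server-sent events, which is what a streaming callback would consume.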
I agree with @vblagoje that we need this functionality alongside vLLM. Can we change the name, though, to HuggingFaceGenerator or HuggingFaceRemoteGenerator? Also, this should not be confused with HuggingFaceLocalGenerator.
Hi team! I am wondering if the roadmap includes adding Hugging Face support for inference endpoints with a Hugging Face token? This kind of solution usually comes with an API token and a URL, so I can call an LLM hosted on Hugging Face without needing to download the model.
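For context, the usage pattern described in this question is roughly the following with the `huggingface_hub` client; the endpoint URL and token are placeholders, and the eventual Haystack component may expose this differently:

```python
# Sketch only: the endpoint URL and token are placeholders for a hosted
# Inference Endpoint (or a model id on the serverless Inference API).
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",  # placeholder URL
    token=os.environ["HF_API_TOKEN"],
)

# Generation happens on the Hugging Face side; nothing is downloaded locally.
reply = client.text_generation("Explain retrieval-augmented generation.", max_new_tokens=100)
print(reply)
```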
See the proposal: #5540
The TGI library provides a convenient way to query several LLMs through a unified interface, so we can build a component on top of it to extend our LLM support beyond OpenAI models.
Draft API for `TGIGenerator`:

Note how the component takes a list of prompts and LLM parameters only, but no variables or templates, and returns only strings. This is because input rendering and output parsing are delegated to `PromptBuilder`.

In order to support token streaming, we make this component accept a callback in `__init__`, and that callback will be called every time a new chunk of the streamed response is received.
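A rough sketch consistent with the description above; the class name, argument names, defaults, and the use of `huggingface_hub.InferenceClient` are assumptions rather than the proposal's final API:

```python
# Sketch of the interface described above, not the final Haystack API:
# a list of prompts and generation parameters in, plain strings out,
# with an optional streaming callback supplied at construction time.
import os
from typing import Callable, Dict, List, Optional

from huggingface_hub import InferenceClient


class TGIGenerator:
    def __init__(
        self,
        model_or_url: str = "mistralai/Mistral-7B-Instruct-v0.1",  # placeholder model
        token: Optional[str] = None,
        streaming_callback: Optional[Callable[[str], None]] = None,
    ):
        self.client = InferenceClient(model=model_or_url, token=token or os.environ.get("HF_API_TOKEN"))
        self.streaming_callback = streaming_callback

    def run(self, prompts: List[str], generation_kwargs: Optional[Dict] = None) -> Dict[str, List[str]]:
        generation_kwargs = generation_kwargs or {}
        replies: List[str] = []
        for prompt in prompts:
            if self.streaming_callback is None:
                replies.append(self.client.text_generation(prompt, **generation_kwargs))
            else:
                # With stream=True the client yields the response chunk by chunk,
                # and each chunk is forwarded to the callback as it arrives.
                chunks: List[str] = []
                for chunk in self.client.text_generation(prompt, stream=True, **generation_kwargs):
                    self.streaming_callback(chunk)
                    chunks.append(chunk)
                replies.append("".join(chunks))
        return {"replies": replies}
```

A streaming callback can then be as simple as `lambda chunk: print(chunk, end="", flush=True)`, while input rendering and output parsing remain delegated to `PromptBuilder` as described above.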