
[Feature request] streamer callback for text-generation task #394

Closed
seonglae opened this issue Nov 16, 2023 · 9 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@seonglae
Contributor

Streamer
https://huggingface.co/docs/transformers/generation_strategies#streaming

Reason for request
Currently, iterating with max_new_tokens: 1 takes much longer than a single generation call, and text generation takes time even for light models. Token streaming is a key feature for user experience. In my case, task-specific text generation could be a key feature of low-cost AI app development using transformers.js.

Additional context
I'm not sure whether the TextStreamer class needs to be compatible with the Python transformers library. I wrote a use-case proposal where TextStreamer extends TransformStream. AsyncIterable, AsyncGenerator, and the Streams API might also be usable.

Suggested streaming code

const streamer = new TextStreamer()
const pipe: TextGenerationPipeline = await pipeline(
  'text-generation',
  model,
  { quantized: true }
)
await pipe(prompt, { streamer })
const res = new Response(streamer.readable)
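For reference, a minimal sketch of what a TextStreamer built on TransformStream could look like (hypothetical: the class shape and the injected decode function are assumptions from this proposal, not the library's actual API):

```javascript
// Hypothetical TextStreamer sketch built on the Web Streams API (Node 18+/browser).
// Each incoming chunk is a batch of token ids; `decode` maps them to text.
class TextStreamer extends TransformStream {
  constructor(decode = (tokens) => String(tokens)) {
    super({
      transform(tokens, controller) {
        // Decode each batch of token ids and re-emit it as a text chunk.
        controller.enqueue(decode(tokens));
      },
    });
  }
}
```

Because a TransformStream exposes both a writable and a readable end, the readable side could be handed directly to `new Response(streamer.readable)` as in the proposal above.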

This is Vercel's approach:
https://github.com/vercel/ai/blob/main/packages/core/streams/ai-stream.ts
https://github.com/vercel-labs/ai-chatbot/blob/main/app/api/chat/route.ts

@seonglae seonglae added the enhancement New feature or request label Nov 16, 2023
@xenova
Collaborator

xenova commented Nov 16, 2023

Hi there 👋 I definitely think the addition of an equivalent TextStreamer class to the library will be great! If someone in the community would like to contribute this, it should be as simple as rewriting this file in JavaScript.

The current approach to text streaming (which was actually added before the python library added TextStreamer) is to add a callback_function to the generate/pipeline function. For example:

const pipe = await pipeline(
  'text-generation',
  model,
  { quantized: true }
)
pipe(prompt, { callback_function: beams => { console.log(beams) }})

Here's an example of streaming + decoding:

https://github.com/xenova/transformers.js/blob/4e4148cb5ce7f4a9265f58b4eeb660c64bed0386/examples/demo-site/src/worker.js#L189-L202

@xenova xenova added help wanted Extra attention is needed good first issue Good for newcomers labels Nov 16, 2023
@wujohns

wujohns commented May 14, 2024

@xenova How can I define the callback_function to make text generation stop at specific words (like the OpenAI API's "stop" param)?
I also found the transformers.js code you showed, but I'm confused about what to do next.
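One way to emulate a stop parameter on the consumer side is to accumulate the streamed text and truncate at the first stop sequence. A minimal sketch in plain JavaScript (createStopWatcher is a hypothetical helper, not a transformers.js API):

```javascript
// Hypothetical helper: accumulate streamed chunks and cut off at a stop word.
// Returns an object whose push() reports whether generation output should
// still be consumed; once a stop word appears, the text is truncated there.
function createStopWatcher(stopWords) {
  let text = "";
  let stopped = false;
  return {
    push(chunk) {
      if (stopped) return false;
      text += chunk;
      for (const stop of stopWords) {
        const idx = text.indexOf(stop);
        if (idx !== -1) {
          text = text.slice(0, idx); // keep only the text before the stop word
          stopped = true;
          return false;
        }
      }
      return true;
    },
    get text() { return text; },
    get stopped() { return stopped; },
  };
}
```

As far as this thread shows, the callback cannot abort generation itself, so this only truncates what the consumer sees; generation still runs to completion unless stopped by other means.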

@02shanks

@xenova is this issue still open for contribution?

@seonglae
Contributor Author

seonglae commented Nov 19, 2024

@xenova I realized that after updating from @xenova/transformers ^2.17.2 to @huggingface/transformers ^3.0.2, the callback_function option no longer works.

@seonglae
Contributor Author

seonglae commented Nov 19, 2024

const pipe = await pipeline('text-generation', 'Xenova/LiteLlama-460M-1T', {
  dtype: 'q8',
  model_file_name: 'decoder_model_merged'
})
let response = ''
await pipe(
  `
### Context: General Relativity and Special Relativity are two main topics in Relative Mechanics.
Einstein Field Equations is a mathematical model for General Relativity
### Question: What is Relative Mechanics?
### Response:`,
  {
    max_length: 500,
    skip_prompt: true,
    callback_function: beams => {
      // Decode only the most recently generated token and append it.
      const tokens = beams[0].output_token_ids
      const decodedText = pipe.tokenizer.decode(tokens.slice(-1), {
        skip_special_tokens: true
      })
      response += decodedText
      process.stdout.write(decodedText)
    }
  }
)
console.info(response)

With this code, versions 3.0.0, 3.0.1, and 3.0.2 do not stream the tokens; only version 2.17.2 works properly with this callback_function implementation.

@djaffer

djaffer commented Dec 2, 2024

The callback_function doesn't get triggered.

The quality of the code looks questionable; major areas do not work. The npm package already has install issues, and now this too.

@xenova
Collaborator

xenova commented Dec 2, 2024

Hi all 👋 Apologies for not updating the thread. In Transformers.js v3, the non-standard callback_function was removed in favour of the TextStreamer class, which can be used as follows:

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-Coder-0.5B-Instruct",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Write a quick sort algorithm." },
];

// Create text streamer
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (text) => console.log(text), // Optional callback function
})

// Generate a response
const output = await generator(messages, { max_new_tokens: 512, do_sample: false, streamer });
console.log(output[0].generated_text.at(-1).content);

Let me know if that helps! Also, if someone would be interested in contributing this to the docs and example projects, that would be amazing 🤗
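For the streaming-Response use case from the original request, the TextStreamer callback can be bridged to the Web Streams API. A minimal sketch, assuming a Node 18+ or browser environment with ReadableStream, TextEncoder, and Response globals (streamFromCallback is a hypothetical helper, not part of transformers.js):

```javascript
// Hypothetical glue: turn any callback-based producer into a ReadableStream,
// e.g. so TextStreamer's callback_function output can back `new Response(...)`.
// `run` is an async function that invokes `onText` once per text chunk.
function streamFromCallback(run) {
  return new ReadableStream({
    async start(controller) {
      const encoder = new TextEncoder();
      // Forward every chunk into the stream as UTF-8 bytes, then close.
      await run((text) => controller.enqueue(encoder.encode(text)));
      controller.close();
    },
  });
}
```

A route handler could then wire `callback_function: onText` into the TextStreamer options and return `new Response(streamFromCallback(...))`, similar to the Vercel approach linked earlier in the thread.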

@seonglae
Contributor Author

seonglae commented Dec 3, 2024

Resolved by #1066, which adds streaming documentation and improves type support.

@seonglae seonglae closed this as completed Dec 3, 2024
@djaffer

djaffer commented Dec 3, 2024

Working thanks!
