[Feature request] streamer callback for text-generation task #394
Hi there 👋 I definitely think the addition of an equivalent `TextStreamer` would be great! The current approach to text streaming (which was actually added before the Python library added `TextStreamer`) looks like this:

```js
const pipe = await pipeline(
  'text-generation',
  model,
  { quantized: true }
)

pipe(prompt, { callback_function: beams => { console.log(beams) } })
```

Here's an example of streaming + decoding:
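A minimal sketch of streaming + decoding with this v2 `callback_function` API (hypothetical code; the prompt and token budget are placeholders):

```js
import { pipeline } from '@xenova/transformers';

const pipe = await pipeline('text-generation', 'Xenova/LiteLlama-460M-1T');

let previousText = '';
await pipe('Once upon a time,', {
  max_new_tokens: 64,
  callback_function: beams => {
    // Decode everything generated so far, then print only the new suffix
    const text = pipe.tokenizer.decode(beams[0].output_token_ids, {
      skip_special_tokens: true
    });
    process.stdout.write(text.slice(previousText.length));
    previousText = text;
  }
});
```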
@xenova How can I define the callback_function to make the text generation stop at special words (like the OpenAI API's "stop" parameter)?
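One possible workaround, sketched here rather than taken from the thread: the v2 `callback_function` cannot abort generation, but you can watch for a stop string while streaming and truncate the final output afterwards (the stop strings below are placeholders):

```js
const STOP_WORDS = ['###', '\nQuestion:']; // placeholder stop strings

const output = await pipe(prompt, {
  max_new_tokens: 256,
  callback_function: beams => {
    const text = pipe.tokenizer.decode(beams[0].output_token_ids, {
      skip_special_tokens: true
    });
    // A streaming UI could stop rendering once a stop word appears
    if (STOP_WORDS.some(w => text.includes(w))) {
      // stop updating the UI here
    }
  }
});

// Truncate the final text at the first stop word found
let text = output[0].generated_text;
for (const w of STOP_WORDS) {
  const i = text.indexOf(w);
  if (i !== -1) text = text.slice(0, i);
}
```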
@xenova is this issue still open for contribution?
@xenova I realized after updating from version 2.17.2 to 3.x that streaming no longer works. Here is my code:

```js
const pipe = await pipeline('text-generation', 'Xenova/LiteLlama-460M-1T', {
  dtype: 'q8',
  model_file_name: 'decoder_model_merged'
})

let response = ''
await pipe(
  `
### Context: General Relativity and Special Relativity are two main topics in Relative Mechanics.
Einstein Field Equations is the mathematical model for General Relativity
### Question: What is Relative Mechanics?
### Response:`,
  {
    max_length: 500,
    skip_prompt: true,
    callback_function: beams => {
      const tokens = beams[0].output_token_ids
      // Decode only the newest token and append it
      const decodedText = pipe.tokenizer.decode(tokens.slice(-1), {
        skip_special_tokens: true
      })
      response += decodedText
      process.stdout.write(decodedText)
    }
  }
)
console.info(response)
```

With this code, versions 3.0.0, 3.0.1, and 3.0.2 do not stream the tokens at all; only version 2.17.2 streams as expected.
The callback_function doesn't get triggered. It looks like the quality of the code is questionable; main areas do not work. The npm package already has install issues, and now this too.
Hi all 👋 Apologies for not updating the thread. In Transformers.js v3, the non-standard `callback_function` has been replaced by the `streamer` option and the `TextStreamer` class, matching the Python library:

```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-Coder-0.5B-Instruct",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Write a quick sort algorithm." },
];

// Create text streamer
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (text) => console.log(text), // Optional callback function
});

// Generate a response
const output = await generator(messages, { max_new_tokens: 512, do_sample: false, streamer });
console.log(output[0].generated_text.at(-1).content);
```

Let me know if that helps! Also, if someone would be interested in contributing this to the docs and example projects, that would be amazing 🤗
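A small variation (not from the thread): `console.log` prints a newline after each chunk, so for inline streaming in a terminal, `process.stdout.write` may read better:

```js
// Same TextStreamer, but chunks are written inline without extra newlines
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (text) => process.stdout.write(text),
});
```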
Resolved in #1066 by adding streaming documentation and enhancing type support.
Working, thanks!
Streamer
https://huggingface.co/docs/transformers/generation_strategies#streaming
Reason for request
Currently, iterating with `max_new_tokens: 1` takes much longer than a single generation call. Text generation takes time even for a light model, and token streaming is a key feature for user experience. In my case, task-specific text generation could be a key feature of low-cost AI app development using transformers.js. (The workaround is sketched below.)
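A sketch of that workaround (hypothetical code, not from the thread): emulating streaming by generating one token at a time, which re-processes the ever-growing prompt on every call and is therefore far slower than a single generation:

```js
let text = prompt; // placeholder prompt string
for (let i = 0; i < 100; i++) {
  const output = await pipe(text, { max_new_tokens: 1 });
  const next = output[0].generated_text; // includes the prompt
  if (next.length <= text.length) break; // no new token produced
  process.stdout.write(next.slice(text.length));
  text = next; // feed the extended text back in
}
```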
Additional context
I'm not sure whether the TextStreamer class needs to be compatible with Python transformers. I wrote a use-case proposal with TextStreamer extends TransformStream; AsyncIterable, AsyncGenerator, and the Stream API might also be usable (see the sketch below).
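A minimal sketch of that proposal (hypothetical; it assumes a tokenizer with a `decode` method): token IDs are written into the stream, and decoded text chunks come out of the readable side:

```js
class TextStreamer extends TransformStream {
  constructor(tokenizer) {
    const tokens = [];
    let previousText = '';
    super({
      transform(tokenId, controller) {
        tokens.push(tokenId);
        const text = tokenizer.decode(tokens, { skip_special_tokens: true });
        controller.enqueue(text.slice(previousText.length)); // emit only the new text
        previousText = text;
      }
    });
  }
}
```

Where `ReadableStream` is async-iterable (e.g. Node 18+), consumers could then read decoded chunks with `for await (const chunk of streamer.readable) { ... }`.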
Suggesting streaming code
This is Vercel's approach:
https://github.com/vercel/ai/blob/main/packages/core/streams/ai-stream.ts
https://github.com/vercel-labs/ai-chatbot/blob/main/app/api/chat/route.ts