
Benchmarks comparing with Medusa #26

Open
Rock-Anderson opened this issue Nov 30, 2023 · 0 comments
Comments


Rock-Anderson commented Nov 30, 2023

Thanks for the initial implementation. The speed-up results look great.

Just wondering, though: are there any stats or results that compare with Medusa? Medusa doesn't need a draft model either, and it likewise guesses/predicts future tokens at the current step. I understand that Medusa needs its heads finetuned to predict future tokens while Lookahead doesn't, but assuming the heads have already been finetuned (a minimal cost given a frozen, quantized base model), does Lookahead provide large improvements over Medusa?

I'm especially trying to understand how Lookahead Decoding fares against Medusa in terms of speed-up and memory consumption. (Since Lookahead is exact decoding rather than an approximation, output quality should match the original base model, and may therefore be better than Medusa's.) Any info would help :)
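
In case it helps to frame exactly what I'm asking, here's a rough sketch of the comparison I have in mind: time a baseline greedy `generate` call and record peak GPU memory, then repeat with the Lookahead / Medusa generation calls swapped in. The model name, prompt, and token budget below are just placeholders, and I've left the Lookahead / Medusa entry points out since I'm not sure of their exact APIs:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model/prompt; both are just illustrative choices.
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"
PROMPT = "Explain speculative decoding in one paragraph."
MAX_NEW_TOKENS = 256

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).cuda()


def bench(generate_fn, label):
    """Time one generation call and report tokens/sec plus peak GPU memory."""
    inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.time()
    output_ids = generate_fn(**inputs)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    new_tokens = output_ids.shape[-1] - inputs["input_ids"].shape[-1]
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"{label}: {new_tokens / elapsed:.1f} tok/s, peak memory {peak_gb:.2f} GB")


# Baseline: vanilla greedy decoding with plain transformers.
bench(
    lambda **kw: model.generate(**kw, max_new_tokens=MAX_NEW_TOKENS, do_sample=False),
    "baseline",
)

# The Lookahead and Medusa runs would swap in their respective generation calls
# here; I've left those out since I'm not sure of the exact entry points.
```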

Thanks in advance!
