
Benchmarks comparing with Medusa #26

Open
Rock-Anderson opened this issue Nov 30, 2023 · 0 comments
Comments


Rock-Anderson commented Nov 30, 2023

Thanks for the initial implementation. The speed-up results look great.

Just wondering, though: are there any stats or results that compare with Medusa? Medusa doesn't need a draft model either, and it likewise guesses/predicts future tokens at the current step. I understand that Medusa needs its heads finetuned to predict future tokens while Lookahead doesn't, but assuming the heads have already been finetuned (a minimal cost given a frozen, quantized base model), does Lookahead provide large improvements over Medusa?

I'm especially trying to understand how Lookahead Decoding fares against Medusa in terms of speed-up and memory consumption. (Since Lookahead is exact decoding rather than an approximation, output quality should match the original base model, and may therefore be better than Medusa's.) Any info would help :)
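
In case it helps to frame exactly what I'm asking, here's a rough sketch of the comparison I have in mind: time a baseline greedy `generate` call and record peak GPU memory, then repeat with the Lookahead / Medusa generation calls swapped in. The model name, prompt, and token budget below are just placeholders, and I've left the Lookahead / Medusa entry points out since I'm not sure of their exact APIs:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model/prompt; both are just illustrative choices.
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"
PROMPT = "Explain speculative decoding in one paragraph."
MAX_NEW_TOKENS = 256

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).cuda()


def bench(generate_fn, label):
    """Time one generation call and report tokens/sec plus peak GPU memory."""
    inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.time()
    output_ids = generate_fn(**inputs)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    new_tokens = output_ids.shape[-1] - inputs["input_ids"].shape[-1]
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"{label}: {new_tokens / elapsed:.1f} tok/s, peak memory {peak_gb:.2f} GB")


# Baseline: vanilla greedy decoding with plain transformers.
bench(
    lambda **kw: model.generate(**kw, max_new_tokens=MAX_NEW_TOKENS, do_sample=False),
    "baseline",
)

# The Lookahead and Medusa runs would swap in their respective generation calls
# here; I've left those out since I'm not sure of the exact entry points.
```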

Thanks in advance!
