Thanks for the initial implementation. The speed-up results look great.
Just wondering though: are there any stats or results comparing Lookahead with Medusa? Medusa doesn't need a draft model either, and it also predicts future tokens at the current step. I understand that Medusa might need finetuning for its heads to predict future tokens, while Lookahead doesn't, but assuming we already have finetuned heads (minimal cost, considering a frozen quantized base model), does Lookahead provide a significant improvement over Medusa?
I'm especially trying to understand how Lookahead Decoding fares against Medusa in terms of speed-up and memory consumption. (I see that since Lookahead is exact decoding rather than an approximation, its output quality will be the same as the original base model's, and might hence be better than Medusa's.)
So any info can help :)
Thanks in advance!