Currently, the KV cache implementation only allows generating sequences up to the maximum context length. However, to generate past that point, it's possible to just cut off and sequentially generate each new token by "forgetting" (i.e. not inputting) the earliest tokens in the sequence (in other words, with a maximum context length of 64, to generate the 65th token we could "forget" the first token, and so on). In every generation step after the max context length, you're effectively generating at the last position in the sequence (re-using that last token position).
This is what the original implementation / fork does when not using KV caching.
This requires either dynamically increasing the size of the cache when we've reached the end of the model's context length, or a clever way to shift the cache positions so that you're pulling the right previous key-values from the cache.
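A minimal sketch of the second option (shifting cache positions), implemented as a ring buffer over a fixed-size cache. This assumes one cache per layer with shape `(batch, n_heads, max_seq_len, head_dim)`; the names `RollingKVCache` and `append` are hypothetical and not taken from the original implementation / fork.

```python
import torch


class RollingKVCache:
    """Fixed-size KV cache that overwrites the oldest entry once full,
    so generation can continue past the model's max context length."""

    def __init__(self, batch, n_heads, max_seq_len, head_dim, dtype=torch.float32):
        self.max_seq_len = max_seq_len
        self.k = torch.zeros(batch, n_heads, max_seq_len, head_dim, dtype=dtype)
        self.v = torch.zeros(batch, n_heads, max_seq_len, head_dim, dtype=dtype)
        self.pos = 0  # total number of tokens seen so far

    def append(self, k_new, v_new):
        """Insert the key/value for one new token (shape (batch, n_heads, head_dim))
        and return the cached keys/values in temporal order, oldest surviving first."""
        slot = self.pos % self.max_seq_len  # once full, overwrite the oldest slot
        self.k[:, :, slot] = k_new
        self.v[:, :, slot] = v_new
        self.pos += 1

        n = min(self.pos, self.max_seq_len)
        if self.pos <= self.max_seq_len:
            # Not full yet: slots are already in temporal order.
            return self.k[:, :, :n], self.v[:, :, :n]
        # Full: reorder so the oldest surviving token comes first, which keeps
        # relative order correct when attention depends on position.
        start = self.pos % self.max_seq_len
        idx = (torch.arange(n) + start) % self.max_seq_len
        return self.k[:, :, idx], self.v[:, :, idx]


# Usage sketch: generate past a 64-token context length with dummy tensors.
cache = RollingKVCache(batch=1, n_heads=4, max_seq_len=64, head_dim=32)
for t in range(100):
    k_t, v_t = torch.randn(1, 4, 32), torch.randn(1, 4, 32)
    k_all, v_all = cache.append(k_t, v_t)  # at most the 64 most recent tokens
```

Note that with this approach the query for each new token would keep re-using the last token position (as described above), since absolute positions beyond the max context length don't exist in the model.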