PagedAttention is a widely used method for LLM serving. It splits the KV cache of a request into multiple blocks, and each block contains multiple slots (tokens). I think PagedAttention might hinder StreamingLLM, since we cannot evict individual slots within a block. So I want to know: how does SwiftInfer integrate with PagedAttention?
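To make the conflict concrete, here is a minimal sketch (all names are hypothetical, not from SwiftInfer or vLLM) of a paged KV cache. PagedAttention frees memory at block granularity, while StreamingLLM wants to drop tokens from the middle of the window (keeping the initial "sink" tokens plus the recent ones), which would force rewriting or compacting blocks rather than simply freeing them:

```python
BLOCK_SIZE = 4  # slots (tokens) per block; illustrative value

class PagedKVCache:
    """Hypothetical toy model of a block-based KV cache."""

    def __init__(self):
        self.blocks = []  # each block holds up to BLOCK_SIZE token ids

    def append(self, token_id):
        if not self.blocks or len(self.blocks[-1]) == BLOCK_SIZE:
            self.blocks.append([])
        self.blocks[-1].append(token_id)

    def evict_leading_blocks(self, n):
        # PagedAttention-style eviction: the smallest evictable unit
        # is a whole block (BLOCK_SIZE tokens), not a single slot.
        del self.blocks[:n]

    def evict_middle_tokens(self, start, end):
        # StreamingLLM-style eviction: drop tokens [start, end) while
        # keeping the leading sink tokens. Under paging this cannot be
        # done by freeing blocks; it requires compacting the cache.
        tokens = [t for b in self.blocks for t in b]
        kept = tokens[:start] + tokens[end:]
        self.blocks = [kept[i:i + BLOCK_SIZE]
                       for i in range(0, len(kept), BLOCK_SIZE)]

cache = PagedKVCache()
for t in range(10):
    cache.append(t)
# Keep sink tokens 0..3 and recent tokens 8..9, drop 4..7.
cache.evict_middle_tokens(4, 8)
print([t for b in cache.blocks for t in b])  # -> [0, 1, 2, 3, 8, 9]
```

The `evict_middle_tokens` step is exactly the part that plain block freeing cannot express, which is why I wonder how the two techniques are combined.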
ref: [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention