Does SwiftInfer integrate well with PagedAttention? #9

gawainx · 2024-07-29T01:30:44Z

PagedAttention is a widely used method for LLM serving. It splits a request's KV cache into multiple blocks, each of which holds a fixed number of slots (tokens). I think PagedAttention might hinder StreamingLLM, since we cannot evict individual slots within a block. So I want to know: how does SwiftInfer integrate with PagedAttention?

ref: [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention
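To make the concern concrete, here is a toy Python sketch of one possible workaround: evicting at block granularity instead of slot granularity. It is not SwiftInfer's or vLLM's code; `PagedKVCache` and `evict_streaming` are hypothetical names, and each block stores token positions as a stand-in for the real K/V tensors. The sketch keeps the block(s) holding the first `num_sink` attention-sink tokens and drops whole middle blocks as long as at least `window` recent tokens remain after them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Hypothetical toy block table for a single request.

    Each inner list is one fixed-size block; its entries are token
    positions standing in for the K/V slots a real cache would store.
    """
    block_size: int
    blocks: list[list[int]] = field(default_factory=list)

    def append(self, pos: int) -> None:
        # Allocate a fresh block only when the last one is full,
        # mirroring PagedAttention's on-demand block allocation.
        if not self.blocks or len(self.blocks[-1]) == self.block_size:
            self.blocks.append([])
        self.blocks[-1].append(pos)

    def tokens(self) -> list[int]:
        # Flatten the block table back into logical token order.
        return [p for block in self.blocks for p in block]

    def evict_streaming(self, num_sink: int, window: int) -> int:
        """Block-granular StreamingLLM-style eviction (an assumption, not SwiftInfer's method).

        Keeps the blocks that contain the first `num_sink` tokens, then
        repeatedly drops the oldest non-sink block while the blocks after
        it still hold at least `window` tokens. Returns the number of
        blocks freed (a real server would return them to a free pool).
        """
        sink_blocks = -(-num_sink // self.block_size)  # ceiling division
        freed = 0
        # Never drop the last (currently filling) block.
        while len(self.blocks) > sink_blocks + 1:
            recent = sum(len(b) for b in self.blocks[sink_blocks + 1:])
            if recent < window:
                break
            del self.blocks[sink_blocks]  # evict one whole block at once
            freed += 1
        return freed


cache = PagedKVCache(block_size=4)
for pos in range(20):
    cache.append(pos)
freed = cache.evict_streaming(num_sink=4, window=6)
print(freed, cache.tokens())
```

The trade-off this illustrates: because whole blocks are the unit of eviction, the recent window can overshoot the requested size by up to `block_size - 1` tokens (here 8 recent tokens are kept when 6 were requested), and a sink block may retain more than `num_sink` tokens. The sketch also ignores how StreamingLLM assigns positions within the cache rather than in the original text, which a real integration would still have to handle.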
