v2.6.6
What's Changed
- Add code link to BPT by @DefTruth in #95
- Add vAttention code link by @KevinZeng08 in #96
- 🔥[SageAttention] SageAttention: Accurate 8-Bit Attention for Plug-and-Play Inference Acceleration (@thu-ml) by @DefTruth in #97
- 🔥[SageAttention-2] SageAttention2 Technical Report: Accurate 4-Bit Attention for Plug-and-Play Inference Acceleration (@thu-ml) by @DefTruth in #98
- 🔥[Squeezed Attention] Squeezed Attention: Accelerating Long Context Length LLM Inference (@UC Berkeley) by @DefTruth in #99
- 🔥[SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference by @DefTruth in #100
New Contributors
- @KevinZeng08 made their first contribution in #96
Full Changelog: v2.6.5...v2.6.6