v2.6.6
What's Changed
- Add code link to BPT by @DefTruth in #95
- Add vAttention code link by @KevinZeng08 in #96
- 🔥[SageAttention] SageAttention: Accurate 8-Bit Attention for Plug-and-Play Inference Acceleration (@thu-ml) by @DefTruth in #97
- 🔥[SageAttention-2] SageAttention2 Technical Report: Accurate 4-Bit Attention for Plug-and-Play Inference Acceleration (@thu-ml) by @DefTruth in #98
- 🔥[Squeezed Attention] Squeezed Attention: Accelerating Long Context Length LLM Inference (@UC Berkeley) by @DefTruth in #99
- 🔥[SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference by @DefTruth in #100
New Contributors
- @KevinZeng08 made their first contribution in #96
Full Changelog: v2.6.5...v2.6.6