v0.2.0
What's Changed
- AWQ: Move the AWQ kernels to a separate repository by @casper-hansen in #279
- Add CPU-loaded multi-GPU quantization by @xNul in #289
- GGUF compatible quantization (2, 3, 4 bit / any bit) by @casper-hansen in #285
- Exllama kernels support by @IlyasMoutawwakil in #313
- Cleanup requirements by @casper-hansen in #295
- Torch only inference + any-device quantization by @casper-hansen in #319
- Up to 60% faster context processing by @casper-hansen in #316
- Evaluation: Add more evals by @casper-hansen in #283
- Fixes a breaking change in autoawq by @younesbelkada in #325
- AMD ROCM Support by @IlyasMoutawwakil in #315
- Marlin symmetric quantization and inference by @IlyasMoutawwakil in #320
- Add qwen2 by @JustinLin610 in #321
- Fix n_samples by @casper-hansen in #326
- PEFT compatible GEMM by @casper-hansen in #324
- [PEFT] Fix PEFT batch size > 1 by @younesbelkada in #338
- v0.2.0 by @casper-hansen in #330
- Fix ROCm build by @casper-hansen in #342
- Fix dependency by @casper-hansen in #343
- Fix importlib by @casper-hansen in #344
- Fix workflow by @casper-hansen in #345
- Fix typo in setup.py by @casper-hansen in #346
New Contributors
- @xNul made their first contribution in #289
- @IlyasMoutawwakil made their first contribution in #313
- @JustinLin610 made their first contribution in #321
Full Changelog: v0.1.8...v0.2.0