v0.2.0
What's Changed
- AWQ: Move the AWQ kernels to a separate repository by @casper-hansen in #279
- Add CPU-loaded multi-GPU quantization by @xNul in #289
- GGUF compatible quantization (2, 3, 4 bit / any bit) by @casper-hansen in #285
- Exllama kernels support by @IlyasMoutawwakil in #313
- Cleanup requirements by @casper-hansen in #295
- Torch only inference + any-device quantization by @casper-hansen in #319
- Up to 60% faster context processing by @casper-hansen in #316
- Evaluation: Add more evals by @casper-hansen in #283
- Fixes a breaking change in autoawq by @younesbelkada in #325
- AMD ROCM Support by @IlyasMoutawwakil in #315
- Marlin symmetric quantization and inference by @IlyasMoutawwakil in #320
- Add qwen2 by @JustinLin610 in #321
- Fix n_samples by @casper-hansen in #326
- PEFT compatible GEMM by @casper-hansen in #324
- [PEFT] Fix PEFT batch size > 1 by @younesbelkada in #338
- v0.2.0 by @casper-hansen in #330
- Fix ROCm build by @casper-hansen in #342
- Fix dependency by @casper-hansen in #343
- Fix importlib by @casper-hansen in #344
- Fix workflow by @casper-hansen in #345
- Fix typo in setup.py by @casper-hansen in #346
New Contributors
- @xNul made their first contribution in #289
- @IlyasMoutawwakil made their first contribution in #313
- @JustinLin610 made their first contribution in #321
Full Changelog: v0.1.8...v0.2.0