v0.1.6
What's Changed
- Pseudo dequantize function by @casper-hansen in #127 (a sketch of the technique follows this list)
- CUDA 11.8.0 and 12.1.1 build by @casper-hansen in #128
- AwqConfig class by @casper-hansen in #132 (usage sketch below)
- Fix init quant by @casper-hansen in #136
- Update readme by @casper-hansen in #137
- Benchmark info by @casper-hansen in #138
- Bump to v0.1.6 by @casper-hansen in #139
- CUDA 12 release by @casper-hansen in #140
- Revert to previous version by @casper-hansen in #141
- Fix performance regression by @casper-hansen in #148
- [core/attention] Fix fused attention generation with newest transformers version by @younesbelkada in #146
- Fix condition when rolling cache by @casper-hansen in #150
- Default to safetensors for quantized models by @casper-hansen in #151 (example below)
- Create fused LlamaLikeModel by @casper-hansen in #152
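For context on #127: pseudo-dequantization reconstructs approximate fp16 weights from quantized values so that quantization error can be inspected without running the fused CUDA kernels. The sketch below is illustrative only and is not AutoAWQ's actual function; the name `pseudo_dequantize` and the tensor layout are assumptions.

```python
import torch

def pseudo_dequantize(qweight: torch.Tensor,
                      scales: torch.Tensor,
                      zeros: torch.Tensor,
                      group_size: int = 128) -> torch.Tensor:
    """Hypothetical sketch: recover fp16 weights as (q - z) * s.

    qweight: (out_features, in_features) integers in [0, 2**w_bit)
    scales/zeros: (out_features, in_features // group_size) per-group params
    """
    # Broadcast each group's scale and zero point across its group_size columns.
    s = scales.repeat_interleave(group_size, dim=1)
    z = zeros.repeat_interleave(group_size, dim=1)
    return ((qweight.float() - z.float()) * s.float()).half()
```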
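#132 moves quantization settings into an AwqConfig class. A minimal quantization flow using the documented dict-style quant_config, which carries the same fields (zero_point, q_group_size, w_bit, version), looks like the following; the model path is a placeholder:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "lmsys/vicuna-7b-v1.5"  # placeholder checkpoint
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantize with AWQ, then save the quantized weights and tokenizer.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("vicuna-7b-awq")
tokenizer.save_pretrained("vicuna-7b-awq")
```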
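#151 makes safetensors the default serialization format for quantized checkpoints. For reference, a round trip through the safetensors library's torch API looks like this (the toy state dict stands in for real quantized weights):

```python
import torch
from safetensors.torch import save_file, load_file

# Toy state dict standing in for a quantized model's weights.
state_dict = {"linear.weight": torch.randn(4, 4)}
save_file(state_dict, "model.safetensors")
restored = load_file("model.safetensors")
assert torch.equal(state_dict["linear.weight"], restored["linear.weight"])
```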
Full Changelog: v0.1.5...v0.1.6