Releases: casper-hansen/AutoAWQ
v0.1.8
What's Changed
- Fix MPT by @casper-hansen in #206
- Add config to Base model by @casper-hansen in #207
- Add Qwen model by @Sanster in #182
- Robust quantization for Catcher by @casper-hansen in #209
- New scaling to improve perplexity by @casper-hansen in #216
- Benchmark hf generate by @casper-hansen in #237
- Fix position ids by @casper-hansen in #215
- Pass `model_init_kwargs` to `check_and_get_model_type` function by @rycont in #232
- Fixed an issue where the Qwen model had too much error after quantization by @jundolc in #243
- Load on CPU to avoid OOM by @casper-hansen in #236
- Update README.md by @casper-hansen in #245
- [`core`] Make AutoAWQ fused modules compatible with HF transformers by @younesbelkada in #244
- [`core`] Fix quantization issues with transformers==4.36.0 by @younesbelkada in #249
- FEAT: Add possibility of skipping modules when quantizing by @younesbelkada in #248
- Fix quantization issue with transformers >= 4.36.0 by @younesbelkada in #264
- Mixtral: Mixture of Experts quantization by @casper-hansen in #251
- Fused rope theta by @casper-hansen in #270
- FEAT: add llava to autoawq by @younesbelkada in #250
- Add Baichuan2 Support by @AoyuQC in #247
- Set default rope_theta on LlamaLikeBlock by @casper-hansen in #271
- Update news and models supported by @casper-hansen in #272
- Add vLLM async example by @casper-hansen in #273
- Bump to v0.1.8 by @casper-hansen in #274
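Several of the changes above (notably #248 on skipping modules, and #251 on Mixtral) surface through the quantization config dictionary passed to AutoAWQ. A minimal sketch of such a config; the exact key names, especially `modules_to_not_convert`, are an assumption based on later AutoAWQ versions and may differ in this release:

```python
# Typical AutoAWQ quantization config (key names assumed, not verified
# against this exact release).
quant_config = {
    "zero_point": True,    # asymmetric quantization with zero points
    "q_group_size": 128,   # weights quantized in groups of 128
    "w_bit": 4,            # 4-bit weights
    "version": "GEMM",     # GEMM or GEMV kernel variant
    # Assumed key from #248: leave matching modules unquantized in FP16,
    # e.g. Mixtral's expert routing layers.
    "modules_to_not_convert": ["gate"],
}
```

The config is then handed to `model.quantize(tokenizer, quant_config=quant_config)`.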
New Contributors
- @Sanster made their first contribution in #182
- @rycont made their first contribution in #232
- @jundolc made their first contribution in #243
- @AoyuQC made their first contribution in #247
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Build older cuda wheels by @casper-hansen in #158
- Exclude download of CUDA wheels by @casper-hansen in #159
- New benchmarks in README by @casper-hansen in #160
- Fix typo in benchmark command by @casper-hansen in #161
- Yi support by @casper-hansen in #167
- Make sure to delete dummy model by @casper-hansen in #180
- Fix CUDA error: invalid argument by @casper-hansen in #179
- New logic for passing past_key_value by @younesbelkada in #177
- Reset cache on new generation by @casper-hansen in #178
- Adaptive batch sizing by @casper-hansen in #181
- Pass arguments to AutoConfig by @s4rduk4r in #97
- Fix cache util logic by @casper-hansen in #186
- Fix multi-GPU loading and inference by @casper-hansen in #190
- [`core`] Replace `QuantLlamaMLP` with `QuantFusedMLP` by @younesbelkada in #188
- [`core`] Add `is_hf_transformers` flag by @younesbelkada in #195
- Fixed multi-GPU quantization by @casper-hansen in #196
Full Changelog: v0.1.6...v0.1.7
v0.1.6
What's Changed
- Pseudo dequantize function by @casper-hansen in #127
- CUDA 11.8.0 and 12.1.1 build by @casper-hansen in #128
- AwqConfig class by @casper-hansen in #132
- Fix init quant by @casper-hansen in #136
- Update readme by @casper-hansen in #137
- Benchmark info by @casper-hansen in #138
- Bump to v0.1.6 by @casper-hansen in #139
- CUDA 12 release by @casper-hansen in #140
- Revert to previous version by @casper-hansen in #141
- Fix performance regression by @casper-hansen in #148
- [`core`/`attention`] Fix fused attention generation with newest transformers version by @younesbelkada in #146
- Fix condition when rolling cache by @casper-hansen in #150
- Default to safetensors for quantized models by @casper-hansen in #151
- Create fused LlamaLikeModel by @casper-hansen in #152
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's Changed
- Only apply attention mask if seqlen is greater than 1 by @casper-hansen in #96
- add gpt_neox support by @twaka in #113
- [`core`] Support fp32 / bf16 inference by @younesbelkada in #121
- Fix potential overflow by @casper-hansen in #102
- Fixing starcoder based models with 15B by @SebastianBodza in #118
- Support Aquila models. by @ftgreat in #123
- Add benchmark of Aquila2 34B AWQ in README.md. by @ftgreat in #126
New Contributors
- @twaka made their first contribution in #113
- @younesbelkada made their first contribution in #121
- @SebastianBodza made their first contribution in #118
- @ftgreat made their first contribution in #123
Full Changelog: v0.1.4...v0.1.5
v0.1.4
What's Changed
- Refactor cache and embedding modules by @casper-hansen in #95
- Fix `TypeError: 'NoneType' object is not subscriptable`
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Turing inference support (Colab+Kaggle working) by @casper-hansen in #92
- Fix memory bug (save 2GB VRAM)
Full Changelog: v0.1.2...v0.1.3
v0.1.2
What's Changed
- Fix unexpected keyword by @casper-hansen in #88
- Fix Falcon n_kv_heads parameter by @casper-hansen in #89
- Mistral fused modules by @casper-hansen in #90
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
- Add GPT BigCode support (StarCoder) by @casper-hansen in #61
- Use typing classes over base types by @VikParuchuri in #69
- Fix KV cache shapes error by @casper-hansen in #75
- Mistral support by @casper-hansen in #79
- Add low_cpu_mem_usage=True in example by @casper-hansen in #80
- Offloading to cpu and disk by @s4rduk4r in #77
- Faster build, fix "no space left". by @casper-hansen in #84
New Contributors
- @VikParuchuri made their first contribution in #69
- @s4rduk4r made their first contribution in #77
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's Changed
- Support Falcon 180B by @casper-hansen in #35
- [NEW] GEMV kernel implementation by @casper-hansen in #40
- Allow user to use custom calibration data for quantization by @boehm-e in #27
- Safetensors and model sharding by @casper-hansen in #47
- 2x faster context processing with GEMV by @casper-hansen in #58
- Support kv_heads by @casper-hansen in #60
- Refactor quantization code by @casper-hansen in #62
- support windows by @qwopqwop200 in #53
- Improve model loading by @casper-hansen in #66
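Among the changes above, #27 lets quantization run against user-supplied calibration text instead of the default dataset. A hedged sketch of what that looks like; the `calib_data` keyword on `quantize()` is an assumption based on later AutoAWQ versions:

```python
# Custom calibration samples: a plain list of strings drawn from the
# target domain usually calibrates scales better than generic web text.
calib_data = [
    "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)",
    "SELECT name, total FROM orders WHERE total > 100 ORDER BY total DESC;",
]

# Passed at quantization time (assumed call shape, not verified here):
# model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
```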
New Contributors
- @boehm-e made their first contribution in #27
Full Changelog: v0.0.2...v0.1.0
v0.0.2
What's Changed
- Refactor fused modules by @casper-hansen in #18
- fuse_layers bug fix by @qwopqwop200 in #21
- support speedtest to benchmark FP16 model by @wanzhenchn in #25
- Implement batch size for speed test by @casper-hansen in #26
- [BUG] Fix illegal memory access + Quantized Multi-GPU support by @casper-hansen in #28
- YaRN support for LLaMa models by @casper-hansen in #23
New Contributors
- @wanzhenchn made their first contribution in #25
Full Changelog: v0.0.1...v0.0.2