Releases · vllm-project/llm-compressor
v0.3.0
What's New in v0.3.0
Key Features and Improvements
- GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
- Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers `mappings` based on the model architecture, making SmoothQuant easier to apply across various models (see the recipe sketch below).
- Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
- Generic Wrapper for Any Hugging Face Model (#185): Added a `wrap_hf_model_class` utility, enabling better support and integration for Hugging Face models that are not based on `AutoModelForCausalLM` (see the sketch below).
- Observer Restructure (#837): Introduced calibration and frozen steps within `QuantizationModifier`, moving Observers from compressed-tensors to llm-compressor.
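
Taken together, the first two features shrink a typical quantization recipe considerably. A minimal sketch, assuming the standard `oneshot` entrypoint with an illustrative checkpoint and calibration dataset (neither is prescribed by this release):

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

# SmoothQuantModifier needs no explicit mappings; they are inferred from
# the model architecture (#119). GPTQModifier applies quantized-weight
# sequential updating, compressing the model layer by layer (#177).
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    dataset="open_platypus",                     # illustrative calibration set
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W8A8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```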
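The `wrap_hf_model_class` utility targets models that `AutoModelForCausalLM` cannot load, such as vision-language models. A minimal sketch, assuming a Mllama-style VLM (the specific class and checkpoint are illustrative, not part of this release):

```python
from transformers import MllamaForConditionalGeneration

from llmcompressor.transformers import wrap_hf_model_class

# Wrapping adds llm-compressor integration to an arbitrary
# Hugging Face model class (#185).
wrapped_class = wrap_hf_model_class(MllamaForConditionalGeneration)
model = wrapped_class.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",  # illustrative checkpoint
    device_map="auto",
    torch_dtype="auto",
)
```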
Bug Fixes
- Fix Tied Tensors Bug (#659)
- Observer Initialization in GPTQ Wrapper (#883)
- Sparsity Reload Testing (#882)
Documentation
- Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.
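
For cases where the auto-inferred defaults are not appropriate, a mapping pairs the layers to be smoothed with the preceding layer whose output feeds them. A minimal sketch using regex-style targets; the pairs shown mirror the common Llama-style defaults and are included here only for illustration:

```python
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Each entry maps [layers to balance] to the layer producing their input.
mappings = [
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
    [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
]

modifier = SmoothQuantModifier(smoothing_strength=0.8, mappings=mappings)
```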
What's Changed
- Fix compresed typo by @kylesayrs in #188
- GPTQ Quantized-weight Sequential Updating by @kylesayrs in #177
- Add: targets and ignore inference for sparse compression by @rahul-tuli in #191
- switch tests from weekly to nightly by @dhuangnm in #658
- Compression wrapper abstract methods by @kylesayrs in #170
- Explicitly set sequential_update in examples by @kylesayrs in #187
- Increase Sparsity Threshold for compressors by @rahul-tuli in #679
- Add a generic `wrap_hf_model_class` utility to support VLMs by @mgoin in #185
- Add tests for examples by @dbarbuzzi in #149
- Rename to quantization config by @kylesayrs in #730
- Implement Missing Modifier Methods by @kylesayrs in #166
- Fix 2/4 GPTQ Model Tests by @dsikka in #769
- SmoothQuant mappings tutorial by @rahul-tuli in #115
- Fix import of `ModelCompressor` by @rahul-tuli in #776
- update test by @dsikka in #773
- [Bugfix] Fix saving offloaded state dict by @kylesayrs in #172
- Auto-Infer `mappings` Argument for `SmoothQuantModifier` Based on Model Architecture by @rahul-tuli in #119
- Update workflows/actions by @dbarbuzzi in #774
- [Bugfix] Prepare KD Models when Saving by @kylesayrs in #174
- Set Sparse compression to save_compressed by @rahul-tuli in #821
- Install compressed-tensors after llm-compressor by @dbarbuzzi in #825
- Fix test typo by @kylesayrs in #828
- Add `AutoModelForCausalLM` example by @dsikka in #698
- [Bugfix] Workaround tied tensors bug by @kylesayrs in #659
- Only untie word embeddings by @kylesayrs in #839
- Check for config hidden size by @kylesayrs in #840
- Use float32 for Hessian dtype by @kylesayrs in #847
- GPTQ: Deprecate non-sequential update option by @kylesayrs in #762
- Typehint nits by @kylesayrs in #826
- [ DOC ] Remove version restrictions in W8A8 example by @miaojinc in #849
- Fix inconsistency in example config of 2:4 sparse quantization by @yzlnew in #80
- Fix forward function pass call by @dsikka in #845
- [Bugfix] Use weight parameter of linear layer by @kylesayrs in #836
- [Bugfix] Rename files to remove colons by @kylesayrs in #846
- cover all 3.9-3.12 in commit testing by @dhuangnm in #864
- Add marlin-24 recipe/configs for e2e testing by @dsikka in #866
- [Bugfix] onload during sparsity calculation by @kylesayrs in #862
- Fix HFTrainer overloads by @kylesayrs in #869
- Support Model Offloading Tied Tensors Patch by @kylesayrs in #872
- Add advice about dealing with non-invertible Hessians by @kylesayrs in #875
- seed commit workflow by @andy-neuma in #877
- [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` by @dsikka in #837
- Bugfix observer initialization in `gptq_wrapper` by @rahul-tuli in #883
- BugFix: Fix Sparsity Reload Testing by @dsikka in #882
- Use custom unique test names for e2e tests by @dbarbuzzi in #892
- Revert "Use custom unique test names for e2e tests (#892)" by @dsikka in #893
- Move config["testconfig_path"] assignment by @dbarbuzzi in #895
- Cap accelerate version to avoid bug by @kylesayrs in #897
- Fix observing offloaded weight by @kylesayrs in #896
- Update image in README.md by @mgoin in #861
- update accelerate version by @kylesayrs in #899
- [GPTQ] Iterative Parameter Updating by @kylesayrs in #863
- Small fixes for release by @dsikka in #901
- use smaller portion of dataset by @dsikka in #902
- Update example to not fail hessian inversion by @dsikka in #904
- Bump version to 0.3.0 by @dsikka in #907
New Contributors
- @miaojinc made their first contribution in #849
- @yzlnew made their first contribution in #80
- @andy-neuma made their first contribution in #877
Full Changelog: 0.2.0...0.3.0
v0.2.0
What's Changed
- Correct Typo in SparseAutoModelForCausalLM docstring by @kylesayrs in #56
- Disable Default Bitmask Compression by @Satrat in #60
- TRL Example fix by @rahul-tuli in #59
- Fix typo by @rahul-tuli in #63
- Correct typo by @kylesayrs in #61
- correct import in README.md by @zzc0430 in #66
- Fix for issue #43 -- starcoder model by @horheynm in #71
- Update README.md by @robertgshaw2-neuralmagic in #74
- Layer by Layer Sequential GPTQ Updates by @Satrat in #47
- [ Docs ] Update main readme by @robertgshaw2-neuralmagic in #77
- [ Docs ] `gemma2` examples by @robertgshaw2-neuralmagic in #78
- [ Docs ] Update `FP8` example to use dynamic per token by @robertgshaw2-neuralmagic in #75
- [ Docs ] Overhaul `accelerate` user guide by @robertgshaw2-neuralmagic in #76
- Support `kv_cache_scheme` for quantizing KV Cache by @mgoin in #88
- Propagate `trust_remote_code` Argument by @kylesayrs in #90
- Fix for issue #81 by @horheynm in #84
- Fix for issue 83 by @horheynm in #85
- [ DOC ] Big Model Example by @robertgshaw2-neuralmagic in #99
- Enable obcq/finetune integration tests with `commit` cadence by @dsikka in #101
- metric logging on GPTQ path by @horheynm in #65
- Update test config files by @dsikka in #97
- remove workflows + update runners by @dsikka in #103
- metrics by @horheynm in #104
- add debug by @horheynm in #108
- Add FP8 KV Cache quant example by @mgoin in #113
- Add vLLM e2e tests by @dsikka in #117
- Fix style, fix noqa by @kylesayrs in #123
- GPTQ Algorithm Cleanup by @kylesayrs in #120
- GPTQ Activation Ordering by @kylesayrs in #94
- demote recipe string initialization to debug and make more descriptive by @kylesayrs in #116
- compressed-tensors main dependency for base-tests by @kylesayrs in #125
- Set `ready` label for transformer tests; add message reminder on PR opened by @dsikka in #126
- Fix markdown check test by @dsikka in #127
- Naive Run Compressed Pt. 2 by @Satrat in #62
- Fix transformer test conditions by @dsikka in #131
- Run Compressed Tests by @Satrat in #132
- Correct typo by @kylesayrs in #124
- Activation Ordering Strategies by @kylesayrs in #121
- Fix README Issue by @robertgshaw2-neuralmagic in #139
- update by @dsikka in #143
- Update finetune and oneshot tests by @dsikka in #114
- Validate Recipe Parsing Output by @kylesayrs in #100
- fix build error for nightly by @dhuangnm in #145
- Fix recipe nested in configs by @kylesayrs in #140
- MOE example with warning by @rahul-tuli in #87
- Bug Fix: recipe stages were not being concatenated by @rahul-tuli in #150
- fix package name bug for nightly by @dhuangnm in #155
- Add descriptions for pytest marks by @kylesayrs in #156
- Fix Sparsity Unit Test by @Satrat in #153
- Fix: Error during model saving with shared tensors by @rahul-tuli in #158
- Update 2:4 Examples by @dsikka in #161
- DeepSeek: Fix Hessian Estimation by @Satrat in #157
- bump up main to 0.2.0 by @dhuangnm in #163
- Fix help dialogue by @kylesayrs in #151
- Add MoE and Compressed Inference Examples by @Satrat in #160
- Separate `trust_remote_code` args by @kylesayrs in #152
- Enable a skipped finetune test by @dsikka in #169
- Fix filename in example command by @dbarbuzzi in #173
- Add DeepSeek V2.5 Example by @dsikka in #171
- fix quality by @dsikka in #176
- Patch log function name in gptq by @kylesayrs in #168
- README for Modifiers by @Satrat in #165
- Fix default for sequential updates by @dsikka in #186
- fix default test case by @dsikka in #193
- Fix Initalize typo by @Imss27 in #190
- Update MoE examples by @mgoin in #192
New Contributors
- @zzc0430 made their first contribution in #66
- @horheynm made their first contribution in #71
- @dsikka made their first contribution in #101
- @dhuangnm made their first contribution in #145
- @Imss27 made their first contribution in #190
Full Changelog: 0.1.0...0.2.0
v0.1.0
What's Changed
- Address Test Failures by @Satrat in #1
- Remove SparseZoo Usage by @Satrat in #2
- SparseML Cleanup by @markurtz in #6
- Remove all references to Neural Magic copyright within LLM Compressor by @markurtz in #7
- Add FP8 Support by @Satrat in #4
- Fix Weekly Test Failure by @Satrat in #8
- Add Scheme UX for QuantizationModifier by @Satrat in #9
- Add Group Quantization Test Case by @Satrat in #10
- Loguru logging standardization for LLM Compressor by @markurtz in #11
- Clarify Function Names for Logging by @Satrat in #12
- [ Examples ] E2E Examples by @robertgshaw2-neuralmagic in #5
- Update setup.py by @robertgshaw2-neuralmagic in #15
- SmoothQuant Mapping Defaults by @Satrat in #13
- Initial README by @bfineran in #3
- [Bug] Fix validation errors for smoothquant modifier + update examples by @rahul-tuli in #19
- [MOE Quantization] Warn against "undercalibrated" modules by @dbogunowicz in #20
- Port SparseML Remote Code Fix by @Satrat in #21
- Update Quantization Save Defaults by @Satrat in #22
- [Bugfix] Add fix to preserve modifier order when passed as a list by @rahul-tuli in #26
- GPTQ - move calibration of quantization params to after Hessian calibration by @bfineran in #25
- Fix typos by @eldarkurtic in #31
- Remove ceiling from `datasets` dep by @mgoin in #27
- Revert naive compression format by @Satrat in #32
- Fix layerwise targets by @Satrat in #36
- Move Weight Update Out Of Loop by @Satrat in #40
- Fix End Epoch Default by @Satrat in #39
- Fix typos in example for w8a8 quant by @eldarkurtic in #38
- Model Offloading Support Pt 2 by @Satrat in #34
- set version to 1.0.0 for release by @bfineran in #44
- Update version for first release by @markurtz in #50
- BugFix: Update TRL example scripts to point to the right SFTTrainer by @rahul-tuli in #51
- Update examples/quantization_24_sparse_w4a16 README by @dbarbuzzi in #52
- Fix Failing Transformers Tests by @Satrat in #53
- Offloading Bug Fix by @Satrat in #58
New Contributors
- @markurtz made their first contribution in #6
- @bfineran made their first contribution in #3
- @dbogunowicz made their first contribution in #20
- @eldarkurtic made their first contribution in #31
- @mgoin made their first contribution in #27
- @dbarbuzzi made their first contribution in #52
Full Changelog: https://github.com/vllm-project/llm-compressor/commits/0.1.0