Release v0.3.0 · vllm-project/llm-compressor

What's New in v0.3.0

Key Features and Improvements

GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility.
Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers mappings based on model architecture, making SmoothQuant easier to apply across various models.
Improved Sparse Compression Usability (#191): Sparse compression now infers its targets and ignore lists, allowing for more flexible model configurations.
Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, which extends support to Hugging Face models that are not based on AutoModelForCausalLM, such as vision-language models (VLMs).
Observer Restructure (#837): Added calibration and frozen steps to QuantizationModifier and moved Observers from compressed-tensors to llm-compressor.
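
To illustrate the calibration and frozen steps mentioned in the Observer Restructure item, here is a minimal, self-contained sketch in plain Python. It is not llm-compressor's implementation: the MinMaxObserver class, its method names, and the asymmetric 8-bit scheme are all assumptions chosen for illustration. The idea is that an observer collects statistics while calibration data flows through, and once frozen it ignores further data so the quantization parameters stay fixed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class MinMaxObserver:
    """Hypothetical observer that tracks the running min/max of observed values."""

    num_bits: int = 8
    min_val: Optional[float] = None
    max_val: Optional[float] = None
    frozen: bool = False

    def observe(self, values: Iterable[float]) -> None:
        """Calibration step: update statistics unless the observer is frozen."""
        if self.frozen:
            return
        values = list(values)
        if not values:
            return
        lo, hi = min(values), max(values)
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)

    def freeze(self) -> None:
        """Frozen step: stop updating statistics."""
        self.frozen = True

    def qparams(self) -> Tuple[float, int]:
        """Return an asymmetric (scale, zero_point) pair from the statistics."""
        if self.min_val is None or self.max_val is None:
            raise ValueError("observer has not seen any calibration data")
        qmin, qmax = 0, 2 ** self.num_bits - 1
        # Make sure the representable range always includes zero.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if range is empty
        zero_point = round(qmin - lo / scale)
        return scale, zero_point


# Usage: calibrate on two batches, freeze, then show later data is ignored.
observer = MinMaxObserver()
observer.observe([-1.0, 2.0])
observer.observe([0.5, 3.0])
observer.freeze()
observer.observe([100.0])  # no effect after freezing
scale, zero_point = observer.qparams()
```

Separating the two steps means the parameters used at inference time depend only on the calibration data, not on anything seen afterwards.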

Bug Fixes

Fix Tied Tensors Bug (#659)
Observer Initialization in GPTQ Wrapper (#883)
Sparsity Reload Testing (#882)

Documentation

Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.
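
As a rough picture of what a SmoothQuant mapping and its architecture-based inference could look like, the sketch below pairs a list of layers to balance with the layer whose output is smoothed, and selects a mapping list by architecture name. Everything here is an assumption for illustration: the registry, the "ExampleMoEForCausalLM" architecture name, the helper functions, the module names, and the "re:" prefix convention for regular expressions. Consult the tutorial from #115 for the actual mappings.

```python
import re
from typing import Dict, List, Tuple

# One mapping: (patterns for layers to balance, pattern for the layer to smooth).
Mapping = Tuple[List[str], str]

# Hypothetical default for decoder-only models with Llama-style module names.
DEFAULT_MAPPINGS: List[Mapping] = [
    (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
    (["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"),
]

# Hypothetical registry keyed by architecture name.
MAPPINGS_BY_ARCHITECTURE: Dict[str, List[Mapping]] = {
    "ExampleMoEForCausalLM": [
        (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
        (["re:.*gate"], "re:.*post_attention_layernorm"),
    ],
}


def infer_mappings(architecture: str) -> List[Mapping]:
    """Pick mappings for an architecture, falling back to the default."""
    return MAPPINGS_BY_ARCHITECTURE.get(architecture, DEFAULT_MAPPINGS)


def matches(pattern: str, name: str) -> bool:
    """Treat 're:'-prefixed patterns as regexes over the full module name."""
    if pattern.startswith("re:"):
        return re.fullmatch(pattern[3:], name) is not None
    return pattern == name


def resolve(mappings: List[Mapping], module_names: List[str]):
    """Expand each mapping into (smooth layer names, balance layer names)."""
    resolved = []
    for balance_patterns, smooth_pattern in mappings:
        smooth = [n for n in module_names if matches(smooth_pattern, n)]
        balance = [
            n for n in module_names
            if any(matches(p, n) for p in balance_patterns)
        ]
        resolved.append((smooth, balance))
    return resolved


# Usage with made-up module names for a single decoder layer.
names = [
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.post_attention_layernorm",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
]
pairs = resolve(infer_mappings("LlamaForCausalLM"), names)
```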

What's Changed

Fix compresed typo by @kylesayrs in #188
GPTQ Quantized-weight Sequential Updating by @kylesayrs in #177
Add: targets and ignore inference for sparse compression by @rahul-tuli in #191
switch tests from weekly to nightly by @dhuangnm in #658
Compression wrapper abstract methods by @kylesayrs in #170
Explicitly set sequential_update in examples by @kylesayrs in #187
Increase Sparsity Threshold for compressors by @rahul-tuli in #679
Add a generic wrap_hf_model_class utility to support VLMs by @mgoin in #185
Add tests for examples by @dbarbuzzi in #149
Rename to quantization config by @kylesayrs in #730
Implement Missing Modifier Methods by @kylesayrs in #166
Fix 2/4 GPTQ Model Tests by @dsikka in #769
SmoothQuant mappings tutorial by @rahul-tuli in #115
Fix import of ModelCompressor by @rahul-tuli in #776
update test by @dsikka in #773
[Bugfix] Fix saving offloaded state dict by @kylesayrs in #172
Auto-Infer mappings Argument for SmoothQuantModifier Based on Model Architecture by @rahul-tuli in #119
Update workflows/actions by @dbarbuzzi in #774
[Bugfix] Prepare KD Models when Saving by @kylesayrs in #174
Set Sparse compression to save_compressed by @rahul-tuli in #821
Install compressed-tensors after llm-compressor by @dbarbuzzi in #825
Fix test typo by @kylesayrs in #828
Add AutoModelForCausalLM example by @dsikka in #698
[Bugfix] Workaround tied tensors bug by @kylesayrs in #659
Only untie word embeddings by @kylesayrs in #839
Check for config hidden size by @kylesayrs in #840
Use float32 for Hessian dtype by @kylesayrs in #847
GPTQ: Depreciate non-sequential update option by @kylesayrs in #762
Typehint nits by @kylesayrs in #826
[ DOC ] Remove version restrictions in W8A8 exmaple by @miaojinc in #849
Fix inconsistence in example config of 2:4 sparse quantization by @yzlnew in #80
Fix forward function pass call by @dsikka in #845
[Bugfix] Use weight parameter of linear layer by @kylesayrs in #836
[Bugfix] Rename files to remove colons by @kylesayrs in #846
cover all 3.9-3.12 in commit testing by @dhuangnm in #864
Add marlin-24 recipe/configs for e2e testing by @dsikka in #866
[Bugfix] onload during sparsity calculation by @kylesayrs in #862
Fix HFTrainer overloads by @kylesayrs in #869
Support Model Offloading Tied Tensors Patch by @kylesayrs in #872
Add advice about dealing with non-invertable hessians by @kylesayrs in #875
seed commit workflow by @andy-neuma in #877
[Observer Restructure]: Add Observers; Add calibration and frozen steps to QuantizationModifier by @dsikka in #837
Bugfix observer initialization in gptq_wrapper by @rahul-tuli in #883
BugFix: Fix Sparsity Reload Testing by @dsikka in #882
Use custom unique test names for e2e tests by @dbarbuzzi in #892
Revert "Use custom unique test names for e2e tests (#892)" by @dsikka in #893
Move config["testconfig_path"] assignment by @dbarbuzzi in #895
Cap accelerate version to avoid bug by @kylesayrs in #897
Fix observing offloaded weight by @kylesayrs in #896
Update image in README.md by @mgoin in #861
update accelerate version by @kylesayrs in #899
[GPTQ] Iterative Parameter Updating by @kylesayrs in #863
Small fixes for release by @dsikka in #901
use smaller portion of dataset by @dsikka in #902
Update example to not fail hessian inversion by @dsikka in #904
Bump version to 0.3.0 by @dsikka in #907

New Contributors

@miaojinc made their first contribution in #849
@yzlnew made their first contribution in #80
@andy-neuma made their first contribution in #877

Full Changelog: 0.2.0...0.3.0
