
Releases: vllm-project/llm-compressor

v0.3.0

13 Nov 05:22
93832a6

What's New in v0.3.0

Key Features and Improvements

  • GPTQ Quantized-weight Sequential Updating (#177): Introduced an efficient sequential updating mechanism for GPTQ quantization, improving model compression performance and compatibility (see the first sketch after this list).
  • Auto-Infer Mappings for SmoothQuantModifier (#119): Automatically infers mappings based on model architecture, making SmoothQuant easier to apply across various models.
  • Improved Sparse Compression Usability (#191): Added support for targeted sparse compression with specific ignore rules during inference, allowing for more flexible model configurations.
  • Generic Wrapper for Any Hugging Face Model (#185): Added the wrap_hf_model_class utility, enabling support and integration for Hugging Face models that are not based on AutoModelForCausalLM (see the second sketch after this list).
  • Observer Restructure (#837): Introduced calibration and frozen steps within QuantizationModifier, moving Observers from compressed-tensors to llm-compressor.
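
The quantization features above can be combined in a single one-shot recipe. The following is a minimal sketch, not an official example: it assumes the 0.3.x llmcompressor.transformers.oneshot entry point, and the model ID, dataset, and hyperparameter values are illustrative placeholders rather than values taken from this release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = [
    # No explicit mappings: with #119 they are inferred from the model architecture.
    SmoothQuantModifier(smoothing_strength=0.8),
    # sequential_update enables the quantized-weight sequential updating path (#177);
    # ignore keeps the listed modules (here lm_head) unquantized.
    GPTQModifier(
        targets="Linear",
        scheme="W8A8",
        ignore=["lm_head"],
        sequential_update=True,
    ),
]

# Run calibration and apply the recipe in one shot.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save in compressed form for loading in vLLM.
model.save_pretrained("Meta-Llama-3-8B-Instruct-W8A8", save_compressed=True)
tokenizer.save_pretrained("Meta-Llama-3-8B-Instruct-W8A8")
```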

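For models that are not built on AutoModelForCausalLM, the new wrapper from #185 can be applied to the Hugging Face class before loading. This is a sketch based on the feature description; the import path, wrapped class, and model ID are assumptions, not confirmed by these notes.

```python
from transformers import MllamaForConditionalGeneration

from llmcompressor.transformers import wrap_hf_model_class

# Hypothetical usage: wrap a non-AutoModelForCausalLM class so it gains
# llm-compressor's compressed save/load hooks, then load through the wrapper.
WrappedMllama = wrap_hf_model_class(MllamaForConditionalGeneration)
model = WrappedMllama.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",  # placeholder checkpoint
    device_map="auto",
    torch_dtype="auto",
)
```
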
Bug Fixes

  • Fix Tied Tensors Bug (#659)
  • Observer Initialization in GPTQ Wrapper (#883)
  • Sparsity Reload Testing (#882)

Documentation

  • Updated SmoothQuant Tutorial (#115): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.

What's Changed

New Contributors

Full Changelog: 0.2.0...0.3.0

v0.2.0

23 Sep 22:24
2e0035f

What's Changed

New Contributors

Full Changelog: 0.1.0...0.2.0

v0.1.0

12 Aug 15:37
066d1e4

What's Changed

New Contributors

Full Changelog: https://github.com/vllm-project/llm-compressor/commits/0.1.0