
[Observer Restructure]: Add Observers; Add calibration and frozen steps to QuantizationModifier #837

Merged: 30 commits merged into main from update-foward on Oct 31, 2024

Conversation

@dsikka (Collaborator) commented Oct 10, 2024

SUMMARY:

  • Adds observers to llm-compressor
  • Adds the hooks required to run calibration as part of the QuantizationModifier. All calibration lifecycle steps can now be found in calibration.py
  • Also adds the KV Cache object so that calibration can update k_scale and v_scale for kv_cache quantization (see the sketch below this list)
  • Requires the following PR to land in compressed-tensors: Observer Restructure: Remove Observers, calibration, and applying frozen steps from lifecycle neuralmagic/compressed-tensors#189
  • Updated calibration lifecycle (also shown in the docstrings). This runs as part of the calibration step within the QuantizationModifier
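
To make the kv_cache bullet concrete (see also step 2 of the lifecycle below), here is a hypothetical sketch of the mechanism: a forward pre-hook registered with with_kwargs=True swaps the attention layer's cache kwarg for a calibration cache that tracks k_scale/v_scale. All class, attribute, and kwarg names here are illustrative, not the actual API added by this PR:

```python
import torch


class CalibrationKVCache:
    """Illustrative cache that records running absolute maxima of k/v."""

    def __init__(self):
        self.k_scale = None
        self.v_scale = None

    def update(self, key: torch.Tensor, value: torch.Tensor):
        k_max, v_max = key.abs().max(), value.abs().max()
        self.k_scale = k_max if self.k_scale is None else torch.maximum(self.k_scale, k_max)
        self.v_scale = v_max if self.v_scale is None else torch.maximum(self.v_scale, v_max)


class ToyAttention(torch.nn.Module):
    """Stand-in attention layer that feeds its cache kwarg, if provided."""

    def forward(self, hidden, past_key_value=None):
        key, value = hidden, hidden  # toy stand-in for real k/v projections
        if past_key_value is not None:
            past_key_value.update(key, value)
        return hidden


def calibrate_kv_cache_input_hook(module, args, kwargs):
    # swap in the calibration cache before the layer's forward pass
    kwargs["past_key_value"] = module.kv_cache
    return args, kwargs


layer = ToyAttention()
layer.kv_cache = CalibrationKVCache()
handle = layer.register_forward_pre_hook(calibrate_kv_cache_input_hook, with_kwargs=True)

layer(torch.randn(2, 8))
print("k_scale:", layer.kv_cache.k_scale, "v_scale:", layer.kv_cache.v_scale)
handle.remove()
```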

Calibration runs when input/output activation quantization or kv_cache quantization is enabled.

Calibration Lifecycle for a single torch.nn.Module:

      1. initialize_observer():
          if input/output activation:
              - observer = Observer.load_from_registry(...)
              - module.register_module(f"{base_name}_observer", observer)
              
      2. register_calibration_hooks():
          if input activation and not dynamic quant (used to call observers before input QDQ):
              - pre_hook_handle = module.register_forward_pre_hook(calibrate_input_hook())
          if output activation and not dynamic quant (used to call observers before output QDQ):
              - post_hook_handle = module.register_forward_hook(calibrate_output_hook())
          if kv_cache quantization (used to set kv_cache to QuantizedKVParameterCache and update k_scale/v_scale):
              - pre_hook_handle = module.register_forward_pre_hook(calibrate_kv_cache_input_hook(), with_kwargs=True)
              - post_hook_handle = module.register_forward_hook(calibrate_kv_cache_output_hook())
          self.calibration_hooks.append(pre_hook_handle)
          self.calibration_hooks.append(post_hook_handle)

      3. self._calibrate(module) # run forward pass through model using calibration data
      4. set_unset_kv_cache() # remove kv_cache objects attached to attention layers, initially set in _apply_modifier_to_model
      5. remove calibration hooks in self.calibration_hooks
      6. remove observers
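
As a rough, self-contained illustration of this lifecycle, the sketch below exercises steps 1-3 and 5-6 with plain torch.nn.Module hook APIs. The MinMaxObserver class and the hook bodies are simplified stand-ins for the observers and calibration functions added in this PR, not the actual implementations; the kv_cache branch (step 4) is omitted:

```python
import torch


class MinMaxObserver(torch.nn.Module):
    """Toy observer that tracks a running min/max over observed tensors."""

    def __init__(self):
        super().__init__()
        self.min_val = None
        self.max_val = None

    def forward(self, value: torch.Tensor):
        lo, hi = value.min(), value.max()
        self.min_val = lo if self.min_val is None else torch.minimum(self.min_val, lo)
        self.max_val = hi if self.max_val is None else torch.maximum(self.max_val, hi)


def calibrate_input_hook(module, args):
    # called before forward(); in the real flow, input QDQ follows
    module.input_observer(args[0])


def calibrate_output_hook(module, args, output):
    # called after forward(); in the real flow, output QDQ follows
    module.output_observer(output)


module = torch.nn.Linear(16, 16)

# 1. initialize_observer(): register observers as f"{base_name}_observer"
module.register_module("input_observer", MinMaxObserver())
module.register_module("output_observer", MinMaxObserver())

# 2. register_calibration_hooks(): keep handles so they can be removed later
calibration_hooks = [
    module.register_forward_pre_hook(calibrate_input_hook),
    module.register_forward_hook(calibrate_output_hook),
]

# 3. _calibrate(): forward passes over calibration data update the observers
with torch.no_grad():
    for _ in range(8):
        module(torch.randn(4, 16))

print("input range:", module.input_observer.min_val, module.input_observer.max_val)

# 5./6. tear down: remove the hooks, then the observer submodules
for handle in calibration_hooks:
    handle.remove()
del module.input_observer, module.output_observer
```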

Testing:

  • Tested w4a16, quantized kv_cache, and w8a8 int8 workflows

@dsikka marked this pull request as draft October 10, 2024 17:03

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@dsikka changed the title from "[Observer Restructure]: Update function call" to "[Observer Restructure]: Add Observers" on Oct 14, 2024
@dsikka changed the title from "[Observer Restructure]: Add Observers" to "[Observer Restructure]: Add Observers, calibration, and frozen steps to lifecycle" on Oct 22, 2024
@dsikka changed the title from "[Observer Restructure]: Add Observers, calibration, and frozen steps to lifecycle" to "[Observer Restructure]: Add Observers; Add calibration, and frozen steps to QuantizationModifier" on Oct 22, 2024
@dsikka changed the title from "[Observer Restructure]: Add Observers; Add calibration, and frozen steps to QuantizationModifier" to "[Observer Restructure]: Add Observers; Add calibration and frozen steps to QuantizationModifier" on Oct 22, 2024
kylesayrs previously approved these changes Oct 30, 2024
rahul-tuli previously approved these changes Oct 30, 2024
@rahul-tuli (Collaborator) left a comment
I really like the new structure, great work!

Left a few nits; I would recommend revisiting the docstrings and updating them for consistency:
-> Start docstrings with a capital letter
-> Include param info in :param fields rather than only describing parameters in the main docstring

Otherwise no big red flags! Good tests as well.
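
For instance, a docstring following both conventions might look like this (sketched against the initialize_observer signature implied by the lifecycle in the PR description; the wording is illustrative):

```python
def initialize_observer(module, base_name):
    """
    Attach a quantization observer to the given module.

    :param module: torch.nn.Module to register the observer on
    :param base_name: prefix for the registered observer, e.g. "input"
    """
```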

Review threads on src/llmcompressor/observers/base.py (resolved)
@dsikka dismissed stale reviews from rahul-tuli and kylesayrs via 9fc10a9 on October 30, 2024
@dsikka merged commit 18e9a9f into main on Oct 31, 2024 (6 of 7 checks passed)
@dsikka deleted the update-foward branch on October 31, 2024
kylesayrs added a commit that referenced this pull request Nov 7, 2024
… steps to `QuantizationModifier` (#837)

* update functioon

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
dsikka added a commit that referenced this pull request Nov 7, 2024
* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accomodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Depreciate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 exmaple (#849)

The latest compressored-tensor 0.8.0 removed some API,
https://github.com/neuralmagic/compressed-tensors/pull/156/files
If installed the older llmcompressor from pip, it would throw the
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistence (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertable hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update functioon

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 19, 2024
mgoin added a commit that referenced this pull request Nov 19, 2024
* set targets default earlier, remove QuantizationScheme.default_scheme

Signed-off-by: Kyle Sayers <[email protected]>

* clearer warning

Signed-off-by: Kyle Sayers <[email protected]>

* fix typo

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update docstring, use default factory for mutable default

Signed-off-by: Kyle Sayers <[email protected]>

* use Linear default

Signed-off-by: Kyle Sayers <[email protected]>


* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update accelerate version (#899)

Signed-off-by: Kyle Sayers <[email protected]>

* [GPTQ] Iterative Parameter Updating (#863)


* Small fixes for release (#901)

* fix device map

* expose one gpu for finetune; update to use a better moodel and show generation for completeness

* more fixes

* typo fix

* dont just run unit tests

Signed-off-by: Kyle Sayers <[email protected]>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <[email protected]>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <[email protected]>

* quality

---------

Signed-off-by: Dipika <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* bump version (#907)

Signed-off-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <[email protected]>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* save tokenizer also if model passed as a string or custom outputdir provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <[email protected]>

* lint:

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* correct typo (#888)

Signed-off-by: Kyle Sayers <[email protected]>

* use default factory, since default does not trigger field validator

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Dipika <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: George <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
… steps to `QuantizationModifier` (#837)

kylesayrs added a commit that referenced this pull request Nov 21, 2024
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* set targets default earlier, remove QuantizationScheme.default_scheme

Signed-off-by: Kyle Sayers <[email protected]>

* clearer warning

Signed-off-by: Kyle Sayers <[email protected]>

* fix typo

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update docstring, use default factory for mutable default

Signed-off-by: Kyle Sayers <[email protected]>

* use Linear default

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update accelerate version (#899)

Signed-off-by: Kyle Sayers <[email protected]>

* [GPTQ] Iterative Parameter Updating (#863)

* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accomodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Depreciate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors 0.8.0 removed some APIs
(https://github.com/neuralmagic/compressed-tensors/pull/156/files).
If an older llmcompressor installed from pip is paired with it, an
error like the following is thrown:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>
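
Since the removed symbol is named in the traceback above, a hedged sketch of how an example could tolerate both compressed-tensors versions instead of pinning (purely illustrative; the actual fix simply drops the version restriction from the doc):

```
# Tolerate both pre- and post-0.8.0 compressed-tensors: the helper was
# removed upstream, so fall back to None rather than crashing at import.
try:
    from compressed_tensors.quantization import update_layer_weight_quant_params
except ImportError:  # compressed-tensors >= 0.8.0
    update_layer_weight_quant_params = None
```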

* Fix inconsistency (#80)

Use the group strategy with group size 128 instead of the channel strategy

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
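
A hedged sketch of the #80 change in recipe form; the field names follow the compressed-tensors quantization-config style and should be treated as assumptions rather than the example's exact contents:

```
# Recipe fragment switching weight quantization from per-channel to
# group-wise with group_size 128 (field names are assumptions).
recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            config_groups:
                group_0:
                    targets: [Linear]
                    weights:
                        num_bits: 8
                        type: int
                        symmetric: true
                        strategy: group
                        group_size: 128
"""
```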

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertible Hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>
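
The standard advice for a non-invertible Hessian is to raise the diagonal dampening before inversion; a minimal sketch, assuming a GPTQ-style `dampening_frac` knob (the name mirrors common GPTQ implementations, not necessarily this repo's exact argument):

```
import torch

def damped_hessian_inverse(H: torch.Tensor, dampening_frac: float = 0.01) -> torch.Tensor:
    # Add dampening_frac * mean(diag(H)) to the diagonal, then invert via Cholesky.
    damp = dampening_frac * torch.mean(torch.diag(H))
    H = H + damp * torch.eye(H.shape[0], dtype=H.dtype, device=H.device)
    return torch.cholesky_inverse(torch.linalg.cholesky(H))

H = 2.0 * torch.eye(4)            # toy SPD Hessian
Hinv = damped_hessian_inverse(H)
```

Raising `dampening_frac` trades a little quantization accuracy for numerical stability when the calibration data leaves the Hessian ill-conditioned.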

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving the activation to the execution device; activation calibration always happens within the forward pass, so the activation is already on the right device

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Small fixes for release (#901)

* fix device map

* expose one GPU for finetune; update to use a better model and show generation for completeness

* more fixes

* typo fix

* don't just run unit tests

Signed-off-by: Kyle Sayers <[email protected]>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <[email protected]>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <[email protected]>

* quality

---------

Signed-off-by: Dipika <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* bump version (#907)

Signed-off-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <[email protected]>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* save tokenizer also if model passed as a string or custom output_dir provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <[email protected]>

* lint

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
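
The save conditions described in the bullets above reduce to a single predicate; a hedged sketch with illustrative names (not the entrypoint's actual signature):

```
from typing import Optional, Union

def should_save(model: Union[str, object], output_dir: Optional[str],
                default_output_dir: str = "./output") -> bool:
    # Save when the model was passed as a string (we loaded it ourselves)
    # or when the caller set a custom output_dir.
    provided_as_string = isinstance(model, str)
    custom_output_dir = output_dir is not None and output_dir != default_output_dir
    return provided_as_string or custom_output_dir

assert should_save("meta-llama/Llama-2-7b-hf", None) is True
assert should_save(object(), "./my_run") is True
assert should_save(object(), None) is False
```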

* correct typo (#888)

Signed-off-by: Kyle Sayers <[email protected]>

use default_factory, since a plain default does not trigger the field validator

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Dipika <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: George <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
[Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)
kylesayrs added a commit that referenced this pull request Nov 21, 2024
dsikka added a commit that referenced this pull request Nov 25, 2024