Check for config hidden size #840

kylesayrs · 2024-10-11T20:37:54Z

Purpose

Improve error handling for model configs that do not specify `hidden_size`.

Changes

Add an explicit check for the `hidden_size` attribute.
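
As an illustration only, the kind of explicit attribute check described above could look like the sketch below. The helper name `get_hidden_size`, the error type, and the error message are assumptions made for this example and are not taken from the actual diff; the configs are `SimpleNamespace` stand-ins rather than real model configs.

```python
from types import SimpleNamespace


def get_hidden_size(config):
    """Return config.hidden_size, or raise a descriptive error if it is absent.

    Hypothetical helper: the name, error type, and message are illustrative.
    """
    hidden_size = getattr(config, "hidden_size", None)
    if hidden_size is None:
        raise ValueError(
            f"Model config of type {type(config).__name__} does not specify "
            "`hidden_size`"
        )
    return hidden_size


# Stand-in configs with made-up values
config_with = SimpleNamespace(hidden_size=4096)
config_without = SimpleNamespace(num_hidden_layers=2)

print(get_hidden_size(config_with))

try:
    get_hidden_size(config_without)
except ValueError as err:
    print(f"caught: {err}")
```

Checking the attribute up front means a config without `hidden_size` fails with a message that names the missing field, instead of an `AttributeError` surfacing later from wherever the value is first read.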

github-actions · 2024-10-11T20:40:36Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Signed-off-by: Kyle Sayers <[email protected]>

* rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' 
``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]>

* Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The 
latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> 
Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * 
Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]>

Signed-off-by: Kyle Sayers <[email protected]>

* set targets default earlier, remove QuantizationScheme.default_scheme Signed-off-by: Kyle Sayers <[email protected]> * clearer warning Signed-off-by: Kyle Sayers <[email protected]> * fix typo Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update docstring, use default factory for mutable default Signed-off-by: Kyle Sayers <[email protected]> * use Linear default Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update accelerate version (#899) Signed-off-by: Kyle Sayers <[email protected]> * [GPTQ] Iterative Parameter Updating (#863) * Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add 
shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: 
Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow 
Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka 
<[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Small fixes for release (#901) * fix device map * expose one gpu for finetune; update to use a better moodel and show generation for completeness * more fixes * typo fix * dont just run unit tests Signed-off-by: Kyle Sayers <[email protected]> * use smaller portion of dataset (#902) Signed-off-by: Kyle Sayers <[email protected]> * Update example to not fail hessian inversion (#904) * update Signed-off-by: Dipika <[email protected]> * quality --------- Signed-off-by: Dipika <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * bump version (#907) Signed-off-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * add default mappings (#906) Signed-off-by: Kyle Sayers <[email protected]> * [SparseAutoModelForCausalLM Deprecation] Feature change (#881) * src and tests updates * save model if output_dir is provided * save model if provided as a string * typo * save if model was provided as a string or custom output_dir was set * comments * save tokenizer also if model passed as a string or custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <[email protected]> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to 
default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * correct typo (#888) Signed-off-by: Kyle Sayers <[email protected]> * use default factory, since default does not trigger field validator Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Dipika <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: George <[email protected]>

* set targets default earlier, remove QuantizationScheme.default_scheme Signed-off-by: Kyle Sayers <[email protected]> * clearer warning Signed-off-by: Kyle Sayers <[email protected]> * fix typo Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update docstring, use default factory for mutable default Signed-off-by: Kyle Sayers <[email protected]> * use Linear default Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update accelerate version (#899) Signed-off-by: Kyle Sayers <[email protected]> * [GPTQ] Iterative Parameter Updating (#863) * Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add 
shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: 
Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow 
Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka 
<[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Small fixes for release (#901) * fix device map * expose one gpu for finetune; update to use a better moodel and show generation for completeness * more fixes * typo fix * dont just run unit tests Signed-off-by: Kyle Sayers <[email protected]> * use smaller portion of dataset (#902) Signed-off-by: Kyle Sayers <[email protected]> * Update example to not fail hessian inversion (#904) * update Signed-off-by: Dipika <[email protected]> * quality --------- Signed-off-by: Dipika <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * bump version (#907) Signed-off-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * add default mappings (#906) Signed-off-by: Kyle Sayers <[email protected]> * [SparseAutoModelForCausalLM Deprecation] Feature change (#881) * src and tests updates * save model if output_dir is provided * save model if provided as a string * typo * save if model was provided as a string or custom output_dir was set * comments * save tokenizer also if model passed as a string or custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <[email protected]> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to 
default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * correct typo (#888) Signed-off-by: Kyle Sayers <[email protected]> * use default factory, since default does not trigger field validator Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Dipika <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: George <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>


* Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The 
latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> 
Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * 
Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * no cache context Signed-off-by: Kyle Sayers <[email protected]> * support mllamaconfig Signed-off-by: Kyle Sayers <[email protected]> * fix typo Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error 
like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * add docstring Signed-off-by: Kyle Sayers <[email protected]> * make docstring runnable Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment 
out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email 
protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle 
Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This 
reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update accelerate version (#899) Signed-off-by: Kyle Sayers <[email protected]> * [GPTQ] Iterative Parameter Updating (#863) * Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email 
protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 
'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> 
* in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use 
custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> 
Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Small fixes for release (#901) * fix device map * expose one gpu for finetune; update to use a better moodel and show generation for completeness * more fixes * typo fix * dont just run unit tests Signed-off-by: Kyle Sayers <[email protected]> * use smaller portion of dataset (#902) Signed-off-by: Kyle Sayers <[email protected]> * Update example to not fail hessian inversion (#904) * update Signed-off-by: Dipika <[email protected]> * quality --------- Signed-off-by: Dipika <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * bump version (#907) Signed-off-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * add default mappings (#906) Signed-off-by: Kyle Sayers <[email protected]> * [SparseAutoModelForCausalLM Deprecation] Feature change (#881) * src and tests updates * save model if output_dir is provided * save model if provided as a string * typo * save if model was provided as a string or custom output_dir was set * comments * save tokenizer also 
if model passed as a string or custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <[email protected]> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * correct typo (#888) Signed-off-by: Kyle Sayers <[email protected]> * print config for better debugging Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Dipika <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: George <[email protected]>

check for config hidden size

0675345

mgoin approved these changes Oct 11, 2024

mgoin merged commit b3c6d90 into main Oct 11, 2024
6 of 7 checks passed

mgoin deleted the check-hidden_size branch October 11, 2024 22:24

kylesayrs added a commit that referenced this pull request Oct 23, 2024

check for config hidden size (#840)

89fe1df

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs added a commit that referenced this pull request Nov 19, 2024

check for config hidden size (#840)

4d26e75

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs added a commit that referenced this pull request Nov 21, 2024

check for config hidden size (#840)

8cfb8aa

Signed-off-by: Kyle Sayers <[email protected]>
