Describe the bug
When using the preset W8A8 recipe from llm-compressor, the resulting model's config.json fails validation when loaded by HF Transformers (a dev build that includes the Neural Magic PRs adding support for loading llm-compressor models). The problem is that quantization_config.config_groups.group_0.input_activations.observer is set to None. When I diff against the configs of the checkpoints NM has uploaded to the HF Hub, the main difference is that observer is set to "memoryless" there. Manually making that change allows the models to load and evaluate properly.

Pinging @robertgshaw2-neuralmagic at his request 😄

Errors
File "REDACTED/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "REDACTED/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3604, in from_pretrained
config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
File "REDACTED/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 169, in merge_quantization_configs
quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
File "REDACTED/lib/python3.10/site-packages/transformers/quantizers/auto.py", line 99, in from_dict
return target_cls.from_dict(quantization_config_dict)
File "REDACTED/lib/python3.10/site-packages/transformers/utils/quantization_config.py", line 1158, in from_dict
return super().from_dict(config_dict, return_unused_kwargs=return_unused_kwargs, **kwargs)
File "REDACTED/lib/python3.10/site-packages/transformers/utils/quantization_config.py", line 101, in from_dict
config = cls(**config_dict)
File "REDACTED/lib/python3.10/site-packages/transformers/utils/quantization_config.py", line 1114, in __init__
self.quantization_config = QuantizationConfig.parse_obj(
File "REDACTED/lib/python3.10/site-packages/pydantic/main.py", line 1162, in parse_obj
return cls.model_validate(obj)
File "REDACTED/lib/python3.10/site-packages/pydantic/main.py", line 596, in model_validate
return cls.__pydantic_validator__.validate_python(
pydantic_core._pydantic_core.ValidationError: 2 validation errors for QuantizationConfig
config_groups.group_0.QuantizationScheme.input_activations.observer
Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
For further information visit https://errors.pydantic.dev/2.9/v/string_type
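
For illustration, the offending fragment of the generated config.json looks roughly like this — the field path is taken from the validation error above, while the surrounding keys are an assumption, not copied from my_config.json:

```json
{
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "input_activations": {
          "observer": null
        }
      }
    }
  }
}
```

Changing "observer": null to "observer": "memoryless", matching the NM-uploaded checkpoints, is enough to make validation pass.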
Additional context
Attached: my_config.json
Using llm-compressor v0.2.0 tag
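
For anyone hitting the same thing before a fix lands, here is a minimal sketch of the manual workaround described above. The config path is a placeholder, and the field path is assumed from the pydantic error message:

```python
import json

# Placeholder path to the exported model directory (assumption).
config_path = "path/to/compressed-model/config.json"

with open(config_path) as f:
    config = json.load(f)

# Field path taken from the pydantic error; "memoryless" matches the
# observer value in the configs NM has uploaded to the HF Hub.
acts = config["quantization_config"]["config_groups"]["group_0"]["input_activations"]
if acts.get("observer") is None:
    acts["observer"] = "memoryless"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```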