Basically, as the title says: how can I view the integer-valued quantized weights per layer so that I can export them for use outside of OpenVINO? I followed this tutorial: https://docs.openvino.ai/latest/notebooks/302-pytorch-quantization-aware-training-with-output.html
@V0XNIHILI note that knowing the quantized weight values themselves (the INT8 values) is not enough to specify the weights - you also have to know the quantization scale and its zero-point. Note also that this is not exactly the use case we are covering with NNCF right now. The `nncf.torch.quantization.algo.QuantizationController` object should have all the info you need to build a quantized representation. Here's some code to get you started (it would currently only work on the develop branch):
```python
import nncf.torch  # importing nncf.torch enables the NNCF PyTorch backend
import torch
from nncf.torch import create_compressed_model
from nncf import NNCFConfig
from nncf.torch.quantization.algo import QuantizationController


# A toy model with a single weighted layer, just to have something to quantize.
class MyWeightedModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv2d = torch.nn.Conv2d(5, 5, 1)
        torch.nn.init.normal_(self.conv2d.weight)

    def forward(self, x):
        return self.conv2d(x)


config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 5, 5, 5]},
    "compression": {
        "algorithm": "quantization",
        "initializer": {
            "range": {"num_init_samples": 0},
            "batchnorm_adaptation": {"num_bn_adaptation_samples": 0}
        }
    },
    "target_device": "TRIAL"
})

ctrl, model = create_compressed_model(MyWeightedModel(), config)
assert isinstance(ctrl, QuantizationController)

for w_qid, w_qinfo in ctrl.weight_quantizers.items():
    weight = w_qinfo.quantized_module.weight
    quantizer = w_qinfo.quantizer_module_ref
    # Quantization parameters in the format used by torch fake-quantize ops.
    qmin, qmax, scale, zp = quantizer.get_parameters_for_torch_fq()
    scale = scale.reshape(quantizer.scale_shape)
    zp = zp.reshape(quantizer.scale_shape)
    # Integer-valued representation of the floating-point weight tensor.
    qweight = zp + (weight / scale).round().clip(min=qmin, max=qmax)
    print(f"layer: {w_qid.target_node_name}\nscale: {scale}\nzp: {zp}\n"
          f"qmin: {qmin} qmax: {qmax}\nqweight: {qweight}\n\n")
```
If you convert the quantized model into OpenVINO IR, the weights will be quantized, while some operations such as BatchNorm are fused into adjacent layers. You can see the constants in the Netron tool, for example.
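Netron gives a visual view; if you would rather pull the constants out programmatically, the OpenVINO Python API can read the IR and walk its operations. A rough sketch follows, assuming a converted IR at `model.xml` (a hypothetical path) and that `get_data()` is available on `Constant` nodes, as in the ngraph-derived Python bindings; treat it as a starting point rather than a verified recipe.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical path to the converted IR

# Walk the graph and print every constant; after INT8 conversion these
# include the quantized weight tensors of the (possibly fused) layers.
for node in model.get_ops():
    if node.get_type_name() == "Constant":
        data = node.get_data()  # assumption: returns a numpy view of the data
        print(node.get_friendly_name(), data.dtype, data.shape)
```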