YoloV7 Quantization Aware Training

Description

We use TensorRT's PyTorch Quantization toolkit to fine-tune YOLOv7 with quantization-aware training (QAT), starting from the pre-trained weights. We then export the model to ONNX and deploy it with TensorRT. Accuracy and performance are listed in the table below.

| Method | Calibration method | mAP val 0.5 | mAP val 0.5:0.95 | batch-1 FPS (Jetson Orin-X) | batch-16 FPS (Jetson Orin-X) | Weights |
|---|---|---|---|---|---|---|
| PyTorch FP16 | - | 0.6972 | 0.5120 | - | - | yolov7.pt |
| PyTorch PTQ-INT8 | Histogram (MSE) | 0.6957 | 0.5100 | - | - | yolov7_ptq.pt, yolov7_ptq_640.onnx |
| PyTorch QAT-INT8 | Histogram (MSE) | 0.6961 | 0.5111 | - | - | yolov7_qat.pt |
| TensorRT FP16 | - | 0.6973 | 0.5124 | 140 | 168 | yolov7.onnx |
| TensorRT PTQ-INT8 | TensorRT built-in EntropyCalibratorV2 | 0.6317 | 0.4573 | 207 | 264 | - |
| TensorRT QAT-INT8 | Histogram (MSE) | 0.6962 | 0.5113 | 207 | 266 | yolov7_qat_640.onnx |
  • Network input resolution: 3x640x640
  • Note: the trtexec FPS measurements were taken with CUDA Graph enabled.
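The trade-off in the table can be made explicit with a few lines of arithmetic. This Python sketch only reuses the TensorRT rows above to compute the speedup over TensorRT FP16 and the change in mAP@0.5:0.95; the dictionary layout and the `compare` helper are illustrative and not part of the repository.

```python
# Values copied from the TensorRT rows of the table above
# (Jetson Orin-X, 3x640x640 input).
trt_rows = {
    "FP16": {"map50": 0.6973, "map50_95": 0.5124, "fps_b1": 140, "fps_b16": 168},
    "PTQ-INT8": {"map50": 0.6317, "map50_95": 0.4573, "fps_b1": 207, "fps_b16": 264},
    "QAT-INT8": {"map50": 0.6962, "map50_95": 0.5113, "fps_b1": 207, "fps_b16": 266},
}


def compare(variant: str, baseline: str = "FP16") -> dict:
    """Speedup and mAP@0.5:0.95 change of `variant` relative to `baseline`."""
    v, b = trt_rows[variant], trt_rows[baseline]
    return {
        "speedup_b1": v["fps_b1"] / b["fps_b1"],
        "speedup_b16": v["fps_b16"] / b["fps_b16"],
        "delta_map50_95": v["map50_95"] - b["map50_95"],
    }


for name in ("PTQ-INT8", "QAT-INT8"):
    r = compare(name)
    print(f"{name}: x{r['speedup_b1']:.2f} (batch-1), x{r['speedup_b16']:.2f} (batch-16), "
          f"mAP@0.5:0.95 change {r['delta_map50_95']:+.4f}")
```

With these numbers, QAT-INT8 runs about 1.48x (batch-1) and 1.58x (batch-16) faster than TensorRT FP16 while losing 0.0011 mAP@0.5:0.95, whereas TensorRT's built-in PTQ reaches a similar speedup but loses 0.0551.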

How to Run QAT Training

1. Setup

We suggest using the Docker environment:

$ docker pull nvcr.io/nvidia/pytorch:22.09-py3
  1. Clone YOLOv7 and apply the QAT patch (copy the QAT files into it)
# use this YOLOv7 repository as the sample base
$ git clone https://github.com/WongKinYiu/yolov7.git
$ cp -r yolov_deepstream/yolov7_qat/* yolov7/
  2. Install dependencies
$ pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
  3. Download the COCO dataset and the pre-trained model
$ bash scripts/get_coco.sh
$ wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

2. Start QAT training

$ python scripts/qat.py quantize yolov7.pt --ptq=ptq.pt --qat=qat.pt --eval-ptq --eval-origin

The script performs the following steps:

  • Insert Q/DQ nodes to obtain a fake-quantized PyTorch model
    The PyTorch Quantization toolkit can insert Q/DQ nodes automatically, but for YOLOv7 the resulting engine does not reach the same performance as PTQ. In explicit (QAT) mode, TensorRT follows the placement of the Q/DQ nodes to decide the precision of each layer, and some of the automatically inserted Q/DQ nodes cannot be fused with neighboring layers, which causes extra, unnecessary precision conversions. Our script therefore applies a set of rules and restrictions derived for YOLOv7: Q/DQ nodes are analyzed and configured automatically in a rule-based manner so that their placement is optimal for TensorRT and all layers run in INT8 (confirmed with trt-engine-explorer; see scripts/draw-engine.py). For details, see quantization/rules.py. For general guidance on Q/DQ insertion, see Guidance_of_QAT_performance_optimization.

  • PTQ calibration
    After inserting the Q/DQ nodes, we recommend running PTQ calibration first. In our experiments, Histogram (MSE) was the best PTQ calibration method for YOLOv7. Note: if the PTQ result is already satisfactory, you can skip QAT.

  • QAT training
    Starting from the calibrated model, fine-tune it with quantization-aware training. Once the accuracy is satisfactory, save the weights to file.
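The three steps above can be illustrated with a small, stdlib-only Python sketch. It is not the pytorch-quantization implementation: it assumes symmetric per-tensor INT8 with a narrow range of [-127, 127], uses fewer histogram bins than a real calibrator so it runs quickly, and the names `fake_quant`, `ste_grad`, and `histogram_mse_amax` are hypothetical. `fake_quant` is what an inserted Q/DQ pair computes, `histogram_mse_amax` mimics the idea behind Histogram (MSE) calibration (choose the clipping range that minimizes quantization error over the activation histogram), and `ste_grad` is a straight-through estimator that lets gradients flow through the fake-quant op during QAT fine-tuning.

```python
import math
import random


def fake_quant(x: float, amax: float, num_bits: int = 8) -> float:
    """Quantize then dequantize one value, as a Q/DQ node pair does.

    Assumption: symmetric, signed, narrow-range ([-127, 127] for 8 bits).
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = amax / qmax
    q = max(-qmax, min(qmax, round(x / scale)))  # Q: scale, round, clamp
    return q * scale                              # DQ: back to float


def ste_grad(x: float, amax: float, upstream: float) -> float:
    """Straight-through estimator: pass the gradient through inside
    [-amax, amax] and zero it where the input was clipped."""
    return upstream if abs(x) <= amax else 0.0


def histogram_mse_amax(values, num_bins=512, start_bin=64, num_bits=8):
    """Pick the amax that minimizes the count-weighted MSE of fake-quantized
    histogram bin centers (a simplified Histogram (MSE) calibrator)."""
    abs_vals = [abs(v) for v in values]
    top = max(abs_vals)
    if top == 0.0:
        return 0.0
    width = top / num_bins
    counts = [0] * num_bins
    for v in abs_vals:
        counts[min(int(v / width), num_bins - 1)] += 1
    centers = [(i + 0.5) * width for i in range(num_bins)]
    occupied = [(c, n) for c, n in zip(centers, counts) if n]
    best_amax, best_mse = centers[-1], math.inf
    for i in range(start_bin, num_bins):  # candidate clipping ranges
        amax = centers[i]
        mse = sum(n * (fake_quant(c, amax, num_bits) - c) ** 2 for c, n in occupied)
        if mse < best_mse:
            best_amax, best_mse = amax, mse
    return best_amax


# Synthetic "activations": a Gaussian bulk plus two outliers.
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(10_000)] + [25.0, -30.0]
amax = histogram_mse_amax(activations)
print(f"max |x| = {max(abs(a) for a in activations):.2f}, MSE amax = {amax:.2f}")
print(fake_quant(0.1234, amax), ste_grad(40.0, amax, 1.0))
```

In the real workflow these operations run per layer on tensors inside the quantized model; the sketch only shows the arithmetic behind each step.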

3. Export ONNX

$ python scripts/qat.py export qat.pt --size=640 --save=qat.onnx --dynamic

4. Evaluate model accuracy on COCO

$ bash scripts/eval-trt.sh qat.pt

5. Benchmark

$ /usr/src/tensorrt/bin/trtexec --onnx=qat.onnx --int8 --fp16  --workspace=1024000 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640
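The table reports batch-1 and batch-16 throughput, while the command above uses a fixed batch of 4. Because the ONNX file was exported with --dynamic, the same build can be repeated for other batch sizes by changing the shape flags. The Python helper below (the name `trtexec_cmd` is hypothetical) only assembles the flags already shown above, plus trtexec's `--useCudaGraph` option to match the table's CUDA Graph note; the exact command used to produce the table is not given here, so treat this as a sketch.

```python
import shlex

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_cmd(onnx_path: str, batch: int, input_name: str = "images",
                chw: tuple = (3, 640, 640), int8: bool = True,
                cuda_graph: bool = True) -> list:
    """Build a trtexec argument list with a fixed (min = opt = max) input shape."""
    shape = f"{input_name}:{batch}x" + "x".join(str(d) for d in chw)
    cmd = [TRTEXEC, f"--onnx={onnx_path}"]
    if int8:
        cmd.append("--int8")  # allow INT8 kernels; the Q/DQ nodes carry the scales
    cmd += ["--fp16", "--workspace=1024000",  # FP16 for layers that are not INT8
            f"--minShapes={shape}", f"--optShapes={shape}", f"--maxShapes={shape}"]
    if cuda_graph:
        cmd.append("--useCudaGraph")  # the table notes CUDA Graph was enabled
    return cmd


for batch in (1, 16):
    print(shlex.join(trtexec_cmd("qat.onnx", batch)))
```

For batch 16 this prints the benchmark command above with 4 replaced by 16 and --useCudaGraph appended; passing `int8=False` with yolov7.onnx gives the FP16 baseline.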

Quantize YOLOv7-Tiny

$ python scripts/qat.py quantize yolov7-tiny.pt --qat=qat.pt --ptq=ptq.pt --ignore-policy="model\.77\.m\.(.*)|model\.0\.(.*)" --supervision-stride=1 --eval-ptq --eval-origin
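For YOLOv7-Tiny, `--ignore-policy` takes a regular expression that presumably selects layers to exclude from quantization. Which module names it catches can be checked with Python's `re` module. This sketch rests on two assumptions: that scripts/qat.py matches the pattern from the start of each module name (as `re.match` does), and that in YOLOv7-Tiny `model.0` is the first convolution and `model.77.m` holds the detection-head convolutions. The module names in the loop are illustrative, not read from a real model.

```python
import re

# Pattern copied from the command above.
IGNORE_POLICY = r"model\.77\.m\.(.*)|model\.0\.(.*)"


def is_ignored(module_name: str, policy: str = IGNORE_POLICY) -> bool:
    # Assumption: names are matched from the start with re.match; the actual
    # script may use a different matching rule.
    return re.match(policy, module_name) is not None


# Illustrative module names.
for name in ["model.0.conv", "model.1.conv", "model.7.conv", "model.77.m.0", "model.77.m.2"]:
    print(f"{name:14s} -> {'ignored' if is_ignored(name) else 'quantized'}")
```

Because the dots are escaped, `model.7.conv` is not caught by the `model\.77` branch, so only the first layer and the head stay unquantized.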