Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
- Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo
MS COCO
Model | Test Size | APtest | AP50test | AP75test | batch 1 fps | batch 32 average time |
---|---|---|---|---|---|---|
YOLOv7 | 640 | 51.4% | 69.7% | 55.9% | 161 fps | 2.8 ms |
YOLOv7-X | 640 | 53.1% | 71.2% | 57.8% | 114 fps | 4.3 ms |
YOLOv7-W6 | 1280 | 54.9% | 72.6% | 60.1% | 84 fps | 7.6 ms |
YOLOv7-E6 | 1280 | 56.0% | 73.5% | 61.2% | 56 fps | 12.3 ms |
YOLOv7-D6 | 1280 | 56.6% | 74.0% | 61.8% | 44 fps | 15.0 ms |
YOLOv7-E6E | 1280 | 56.8% | 74.4% | 62.1% | 36 fps | 18.7 ms |
Docker environment (recommended)
# create the docker container; you can increase the shared memory size if you have more.
nvidia-docker run --name yolov7 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolov7 --shm-size=64g nvcr.io/nvidia/pytorch:21.08-py3
# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx
# pip install required packages
pip install seaborn thop
# go to code folder
cd /yolov7
yolov7.pt
yolov7x.pt
yolov7-w6.pt
yolov7-e6.pt
yolov7-d6.pt
yolov7-e6e.pt
python test.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
You will get the results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.51206
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.69730
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.55521
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35247
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.55937
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66693
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.38453
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.63765
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.68772
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.53766
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.73549
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.83868
To measure accuracy, download the COCO annotations for pycocotools to ./coco/annotations/instances_val2017.json.
Data preparation
bash scripts/get_coco.sh
- Download MS COCO dataset images (train, val, test) and labels. If you have previously used a different version of YOLO, we strongly recommend that you delete the train2017.cache and val2017.cache files and redownload the labels.
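The cache cleanup recommended above can be scripted. The following is a minimal sketch, not part of the repository: the helper name `remove_label_caches` and the dataset root `coco` are assumptions, while the two cache file names come from the note above.

```python
from pathlib import Path

# File names taken from the note above; the dataset root is an assumption.
STALE_CACHE_NAMES = ("train2017.cache", "val2017.cache")


def remove_label_caches(dataset_root):
    """Delete stale label cache files under dataset_root and return their paths."""
    removed = []
    root = Path(dataset_root)
    for name in STALE_CACHE_NAMES:
        # rglob searches the root and all of its subdirectories.
        for cache_file in sorted(root.rglob(name)):
            cache_file.unlink()
            removed.append(cache_file)
    return removed


if __name__ == "__main__":
    # "coco" is a hypothetical location; point it at your own dataset folder.
    for path in remove_label_caches("coco"):
        print(f"removed {path}")
```

Only files with these two exact names are touched, so images and label text files stay in place.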
Single GPU training
# train p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml
# train p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml
Multiple GPU training
# train p5 models
python -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --workers 8 --device 0,1,2,3 --sync-bn --batch-size 128 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml
# train p6 models
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train_aux.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch-size 128 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml
This section illustrates the changes added to the main code in this fork.
There are four main additions:
- Custom augmentations: This permits the addition of new augmentations not present in the yolov7 pipeline.
- Excluding bounding boxes: This enables the training process to remove specific classes from the images.
- Biasing data using labels: This allows the user to select which classes are going to appear more frequently in the training.
- A Squeeze and Excitation Block has been added, which can be used between the neck and head.
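For readers unfamiliar with the Squeeze and Excitation block mentioned in the last item, the sketch below shows the computation it performs on a single feature map. It is a NumPy illustration of the general technique only, not the module used in this fork: the function name `squeeze_excite`, the weight shapes and the reduction ratio are all assumptions, and biases are omitted.

```python
import numpy as np


def squeeze_excite(features, w_reduce, w_expand):
    """Forward pass of a Squeeze-and-Excitation block on a (C, H, W) feature map.

    w_reduce has shape (C // r, C) and w_expand has shape (C, C // r),
    where r is the channel reduction ratio. Biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))
    # Excitation: bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(w_reduce @ squeezed, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))
    # Scale: reweight every channel of the input by its gate.
    return features * gates[:, None, None]


rng = np.random.default_rng(0)
channels, reduction = 8, 4  # illustrative sizes, not taken from the fork
x = rng.normal(size=(channels, 16, 16))
w1 = rng.normal(size=(channels // reduction, channels))
w2 = rng.normal(size=(channels, channels // reduction))
y = squeeze_excite(x, w1, w2)
print(y.shape)  # (8, 16, 16)
```

The output keeps the input shape, which is what allows such a block to be placed between two existing stages of a network.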
The first three additions can be used in the code as follows:
from vyn_yolov7 import Setting
from train import run_train
# Relative sampling weight of each class. For instance, with two classes, class1 and class2,
# the following makes class2 be picked twice as often as class1:
probabilities = {
    'class1': 1.0,
    'class2': 2.0,
}
options = Setting()
options.shuffle_class = True
options.probabilities = probabilities
# custom_augmentation is a user-defined function (see its description below)
options.custom_augment_fun = custom_augmentation
options.data = 'PATH/TO/YAML_FILE.yaml'
run_train(options)
The custom augmentation is a function that receives the image and bounding boxes as numpy arrays and returns the modified image and bounding boxes.
An illustration of an augmentation function is presented below:
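def custom_augmentation(image, bounding_boxes):
    # Start from copies so the inputs are left untouched.
    augmented_image = image.copy()
    augmented_bounding_boxes = bounding_boxes.copy()
    # ... apply the desired transformation to both arrays here ...
    return augmented_image, augmented_bounding_boxes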
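As one possible concrete version of such a function, the sketch below applies a random brightness change. It assumes the image is an HxWxC uint8 array; the brightness range of 0.7 to 1.3 is an arbitrary choice for illustration. A photometric augmentation was picked because it leaves the boxes unchanged, so nothing needs to be assumed about the bounding box format.

```python
import numpy as np

_rng = np.random.default_rng()


def custom_augmentation(image, bounding_boxes):
    """Randomly scale image brightness; boxes are returned unchanged."""
    # Assumption: image is an HxWxC uint8 array with values in [0, 255].
    factor = _rng.uniform(0.7, 1.3)  # illustrative range, not from the fork
    scaled = image.astype(np.float32) * factor
    augmented_image = np.clip(scaled, 0, 255).astype(np.uint8)
    # A brightness change does not move objects, so the boxes only need a copy.
    augmented_bounding_boxes = bounding_boxes.copy()
    return augmented_image, augmented_bounding_boxes
```

A geometric augmentation (flip, crop, rotation) would also have to transform the box coordinates, which depends on the box format the training pipeline passes in.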
To remove bounding boxes from images, the fields excluded_classes and mapping_classes from the yaml are used. The purpose of this addition is the following: imagine that we have a small dataset of people.
In this dataset every person is labelled, and a few hands, heads and other parts of the human body are labelled as well.
To get an initial model it would be convenient not to use these objects,
but we still want to use those images because they may also contain complete persons.
To solve this issue, the code lets the user select a set of classes to be removed, meaning that those objects are replaced with random noise in the images.
The bounding boxes in training and validation contain the classes that are being represented.
These classes are integers from 0 to nc_complete - 1, in contrast to nc as in the original code.
Here nc is the number of classes that are going to be detected, whereas nc_complete is the total number of classes in the dataset.
nc_complete must be passed in the yaml file together with the other two parameters. An example of this yaml file is:
excluded_classes:
  - 3
mapping_classes:
  0: 0
  1: 1
  2: 2
names:
  - fire_extinguisher_shell
  - forklift
  - ladder
nc: 3
nc_complete: 4
train: PATH/TO/training.txt
val: PATH/TO/validation.txt
This means that the dataset has 4 classes. The first three are normal classes and work in the same way as in the original yolov7 code. The fourth class (index 3) is excluded: its bounding boxes are not used for training, and the image region inside each of those boxes is replaced with noise.
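The effect of excluded_classes and mapping_classes on one training sample can be sketched as follows. This is an illustration under stated assumptions, not the fork's code: the function name `apply_class_exclusion` is hypothetical, and labels are assumed to be rows of [class, x1, y1, x2, y2] in pixel coordinates, which may differ from the format used internally.

```python
import numpy as np


def apply_class_exclusion(image, labels, excluded_classes, mapping_classes, rng=None):
    """Fill excluded objects with noise and remap the remaining labels.

    Assumption: labels is an (N, 5) array of [class, x1, y1, x2, y2] rows
    in pixel coordinates, and image is an HxWxC uint8 array.
    """
    if rng is None:
        rng = np.random.default_rng()
    image = image.copy()
    kept = []
    for cls, x1, y1, x2, y2 in labels:
        cls = int(cls)
        if cls in excluded_classes:
            # Overwrite the object's pixels so the model never sees it.
            x1, y1, x2, y2 = (int(v) for v in (x1, y1, x2, y2))
            region = image[y1:y2, x1:x2]
            image[y1:y2, x1:x2] = rng.integers(0, 256, size=region.shape, dtype=np.uint8)
        else:
            # Keep the box, translating the dataset index to the training index.
            kept.append([mapping_classes[cls], x1, y1, x2, y2])
    return image, np.array(kept, dtype=np.float32).reshape(-1, 5)


# Values mirror the yaml example: class 3 is excluded, classes 0-2 map to themselves.
image = np.zeros((100, 100, 3), dtype=np.uint8)
labels = np.array([[1, 10, 10, 40, 40], [3, 50, 50, 70, 70]], dtype=np.float32)
new_image, new_labels = apply_class_exclusion(
    image, labels, excluded_classes={3}, mapping_classes={0: 0, 1: 1, 2: 2}
)
print(new_labels)  # only the class-1 box remains
```

The image is kept in the dataset, so the remaining class-1 object still contributes to training while the excluded object is hidden.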
Lastly, shuffle_class and probabilities are used to bias the dataset when needed.
They allow the user to pass the relative rate of each class.
For instance, suppose we have 3 classes (barrier, manhole and water_barrier) and we want the class manhole to
be used twice as much as the other two, since this class seems to be harder for the model to learn. Then,
probabilities = {'barrier': 1, 'manhole': 2, 'water_barrier': 1}
Notice that shuffle_class controls whether data is randomly selected by class or the whole dataset is used as is.
If shuffle_class is False (the default behaviour and the only one in the original code), the whole
dataset is used: for instance, if there are 100 images (80 of class barrier and 20 of the rest),
all 100 images are used in each epoch. When shuffle_class is True, training
selects each class with equal probability regardless of the number of images of each class, so 'barrier',
'manhole' and 'water_barrier' are selected with the same probability even though 'barrier' is more common.
If probabilities is provided, the classes are selected following those rates instead of with
equal probability.
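The selection rule described above can be sketched as a two-step draw: pick a class by its relative rate, then pick an image of that class. This is a simplified illustration, not the fork's data loader. The function name `sample_images` and the file names are hypothetical, and each image is assumed to belong to exactly one class.

```python
import random


def sample_images(images_by_class, num_samples, probabilities=None, seed=None):
    """Pick a class by relative weight, then pick one of its images uniformly."""
    rng = random.Random(seed)
    classes = list(images_by_class)
    # Without explicit weights every class is equally likely.
    weights = [1.0 if probabilities is None else probabilities[c] for c in classes]
    picks = []
    for _ in range(num_samples):
        cls = rng.choices(classes, weights=weights, k=1)[0]
        picks.append((cls, rng.choice(images_by_class[cls])))
    return picks


# 80 barrier images and 20 of the rest, as in the example above.
images_by_class = {
    "barrier": [f"barrier_{i}.jpg" for i in range(80)],
    "manhole": [f"manhole_{i}.jpg" for i in range(10)],
    "water_barrier": [f"water_barrier_{i}.jpg" for i in range(10)],
}
probabilities = {"barrier": 1, "manhole": 2, "water_barrier": 1}
picks = sample_images(images_by_class, 10_000, probabilities, seed=0)
share = sum(cls == "manhole" for cls, _ in picks) / len(picks)
print(f"manhole share: {share:.2f}")  # roughly one half
```

With weights 1, 2 and 1, manhole is expected in about 2 out of every 4 draws even though it has only 10 of the 100 images.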
yolov7_training.pt
yolov7x_training.pt
yolov7-w6_training.pt
yolov7-e6_training.pt
yolov7-d6_training.pt
yolov7-e6e_training.pt
Single GPU finetuning for custom dataset
# finetune p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-custom.yaml --weights 'yolov7_training.pt' --name yolov7-custom --hyp data/hyp.scratch.custom.yaml
# finetune p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/custom.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6-custom.yaml --weights 'yolov7-w6_training.pt' --name yolov7-w6-custom --hyp data/hyp.scratch.custom.yaml
On video:
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source yourvideo.mp4
On image:
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg
Pytorch to CoreML (and inference on MacOS/iOS)
Pytorch to ONNX with NMS (and inference)
python export.py --weights yolov7-tiny.pt --grid --end2end --simplify \
--topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
Pytorch to TensorRT with NMS (and inference)
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights ./yolov7-tiny.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16
Pytorch to TensorRT another way
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights yolov7-tiny.pt --grid --include-nms
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16
# Or use trtexec to convert ONNX to TensorRT engine
/usr/src/tensorrt/bin/trtexec --onnx=yolov7-tiny.onnx --saveEngine=yolov7-tiny-nms.trt --fp16
Tested with: Python 3.7.13, Pytorch 1.12.0+cu113
See keypoint.ipynb.
See instance.ipynb.
YOLOv7 for instance segmentation (YOLOR + YOLOv5 + YOLACT)
Model | Test Size | APbox | AP50box | AP75box | APmask | AP50mask | AP75mask |
---|---|---|---|---|---|---|---|
YOLOv7-seg | 640 | 51.4% | 69.4% | 55.8% | 41.5% | 65.5% | 43.7% |
YOLOv7 with decoupled TAL head (YOLOR + YOLOv5 + YOLOv6)
Model | Test Size | APval | AP50val | AP75val |
---|---|---|---|---|
YOLOv7-u6 | 640 | 52.6% | 69.7% | 57.3% |
@inproceedings{wang2023yolov7,
title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2023}
}
@article{wang2023designing,
title={Designing Network Design Strategies Through Gradient Path Analysis},
author={Wang, Chien-Yao and Liao, Hong-Yuan Mark and Yeh, I-Hau},
journal={Journal of Information Science and Engineering},
year={2023}
}
YOLOv7-semantic & YOLOv7-panoptic & YOLOv7-caption
YOLOv7-semantic & YOLOv7-detection & YOLOv7-depth (with NTUT)
YOLOv7-3d-detection & YOLOv7-lidar & YOLOv7-road (with NTUT)
- https://github.com/AlexeyAB/darknet
- https://github.com/WongKinYiu/yolor
- https://github.com/WongKinYiu/PyTorch_YOLOv4
- https://github.com/WongKinYiu/ScaledYOLOv4
- https://github.com/Megvii-BaseDetection/YOLOX
- https://github.com/ultralytics/yolov3
- https://github.com/ultralytics/yolov5
- https://github.com/DingXiaoH/RepVGG
- https://github.com/JUGGHM/OREPA_CVPR2022
- https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose