Rachel Gardner edited this page Sep 16, 2017 · 1 revision

Pretrained YOLO (PASCAL VOC 2012)

If the node is launched with the post_proc:=YOLO argument, a special YOLO post-processing step is applied. In that case, the node publishes the DNN output as a standard Image message in a specific format: a 2D, single-channel "image" of size WxHx1 (encoding 32FC1), where the width W is fixed at 6 and the height H equals the number of detected objects. For example, if the DNN detects 2 objects, the output is a 6x2 image. Each row describes one detected object with the following 6 values:

0  : label (class) of the detected object (e.g. person or dog).
1  : probability of this detection.
2,3: x and y coordinates of the top-left corner of the bounding box, in image coordinates.
4,5: width and height of the bounding box, in image coordinates.

All values are 32-bit floats, including the label. Label indices correspond to the 20 classes of the PASCAL VOC 2012 dataset.
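As an illustration of how a subscriber might read this layout, the Python sketch below decodes the raw message bytes into one record per detected object. It assumes row-major storage with `step` bytes per row (24 bytes for unpadded rows of six 32-bit floats) and honors the endianness flag, mirroring the `height`, `width`, `step`, `is_bigendian` and `data` fields of an Image message. The `Detection` type and the `decode_yolo_image` function are hypothetical names, not part of the node.

```python
import struct
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: int      # PASCAL VOC class index (sent as a float, rounded here)
    prob: float     # probability of the detection
    x: float        # top-left corner, image coordinates
    y: float
    width: float    # bounding box size, image coordinates
    height: float


def decode_yolo_image(data: bytes, height: int, width: int, step: int,
                      is_bigendian: bool = False) -> List[Detection]:
    """Decode a 6xN 32FC1 'image' into one Detection per row.

    Hypothetical helper: assumes each of the `height` rows starts at
    row * step and holds 6 consecutive 32-bit floats.
    """
    if width != 6:
        raise ValueError(f"expected width 6, got {width}")
    fmt = (">" if is_bigendian else "<") + "6f"
    detections = []
    for row in range(height):
        label, prob, x, y, w, h = struct.unpack_from(fmt, data, row * step)
        detections.append(Detection(int(round(label)), prob, x, y, w, h))
    return detections
```

An empty result (height 0) simply means nothing was detected in that frame.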

For example, if the DNN detects a person (label: 14) and a dog (label: 11) in a 320x180 image, the output might look something like this:

Label  Prob  X      Y      Width  Height
14.0   0.5   120.0  80.0   30.0   60.0
11.0   0.4   160.0  115.0  40.0   20.0
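To turn rows like these into named corner boxes, a sketch such as the following could be used. It assumes the usual 0-based alphabetical ordering of the PASCAL VOC class names (in which person is 14), which should be checked against the model actually deployed. `VOC_CLASSES` and `to_corners` are hypothetical names, and clipping to the image bounds is an added choice, not something the node is documented to do.

```python
from typing import Tuple

# Assumed label order: the 20 PASCAL VOC classes, alphabetical, 0-based.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def to_corners(row: Tuple[float, float, float, float, float, float],
               img_w: int, img_h: int
               ) -> Tuple[str, float, Tuple[float, float, float, float]]:
    """Convert one (label, prob, x, y, w, h) row to (name, prob, (x1, y1, x2, y2)).

    Corners are clipped to the image so boxes touching the border stay valid.
    """
    label, prob, x, y, w, h = row
    x1, y1 = max(0.0, x), max(0.0, y)
    x2, y2 = min(float(img_w), x + w), min(float(img_h), y + h)
    return VOC_CLASSES[int(round(label))], prob, (x1, y1, x2, y2)
```

For the person row above, `to_corners((14.0, 0.5, 120.0, 80.0, 30.0, 60.0), 320, 180)` returns `("person", 0.5, (120.0, 80.0, 150.0, 140.0))`.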
