Assume the following data structure:

    |-- data_dir
        |-- images
            |-- video_1
                frame_000000.png
                frame_000001.png
            |-- video_2
                frame_000000.png
                frame_000001.png
            |-- video_3
                frame_000000.png
                frame_000001.png
        |-- annotation
            annotation_file_1.xml
            annotation_file_2.xml
            annotation_file_3.xml
        train_tasks.txt
        test_tasks.txt
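As a quick sanity check of this layout, you can walk the images directory and count the frames per video. The snippet below is only an illustration and assumes the example directory names shown above.

```python
# Illustration only: count frames per video under the assumed layout above.
import os

data_dir = 'data_dir'  # example path, replace with your own
images_root = os.path.join(data_dir, 'images')

for video_name in sorted(os.listdir(images_root)):
    video_dir = os.path.join(images_root, video_name)
    if not os.path.isdir(video_dir):
        continue
    frames = [name for name in os.listdir(video_dir) if name.startswith('frame_')]
    print('{}: {} frames'.format(video_name, len(frames)))
```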
Each annotation file describes a single source of images; both formats are described below.
For annotation we recommend the CVAT utility, so we assume that each annotation file is stored in the corresponding .xml format. The annotation file contains a single independent track for each person in the video, which includes a bounding box description for each frame. General structure of an annotation file:

    |-- root
        |-- track_0
            bounding_box_0
            bounding_box_1
        |-- track_1
            bounding_box_0
            bounding_box_1
A toy example of an annotation file:

    <?xml version="1.0" encoding="utf-8"?>
    <annotations count="1">
        <track id="0" label="person">
            <box frame="0" xtl="1.0" ytl="1.0" xbr="0.0" ybr="0.0" occluded="0">
                <attribute name="action">action_name</attribute>
            </box>
        </track>
    </annotations>
where the fields have the following description:

- `count` - number of tracks
- `id` - unique ID of the track in the file
- `label` - label of the track (the data loader skips all labels except `person`)
- `frame` - unique ID of the frame in the track
- `xtl`, `ytl`, `xbr`, `ybr` - bounding box coordinates of the top-left and bottom-right corners
- `occluded` - marker to highlight heavily occluded bounding boxes (these can be skipped during training)
- `name` - name of the bounding box attribute (the data loader is sensitive to the `action` class only)
- `action_name` - valid name of the action (you can define your own list of actions)
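For reference, the sketch below shows one way to read such an annotation file with Python's standard `xml.etree.ElementTree`. It is only an illustration and does not reproduce the repository's actual data loader.

```python
# Minimal sketch of reading an annotation file in the format described above.
import xml.etree.ElementTree as ET

def read_annotation(xml_path):
    """Return {track_id: [box, ...]} for all 'person' tracks."""
    tracks = {}
    root = ET.parse(xml_path).getroot()
    for track in root.findall('track'):
        if track.get('label') != 'person':
            continue  # the data loader skips all labels except 'person'
        boxes = []
        for box in track.findall('box'):
            action = None
            for attr in box.findall('attribute'):
                if attr.get('name') == 'action':
                    action = attr.text
            boxes.append({
                'frame': int(box.get('frame')),
                'xtl': float(box.get('xtl')),
                'ytl': float(box.get('ytl')),
                'xbr': float(box.get('xbr')),
                'ybr': float(box.get('ybr')),
                'occluded': box.get('occluded') == '1',
                'action': action,
            })
        tracks[int(track.get('id'))] = boxes
    return tracks
```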
Our implementation of the data loader works with independent images stored on the drive. Each image should be named in the format `frame_xxxxxx.png` or `frame_xxxxxx.jpg` (where `xxxxxx` is a unique image number).
NOTE: To extract images from a video you can use `tools/data/dump_frames.py`.
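If you prefer to dump frames yourself, the sketch below shows one way to do it with OpenCV. It is only an illustration, not the bundled `tools/data/dump_frames.py` script, and the example video path is hypothetical.

```python
# Rough OpenCV-based alternative to tools/data/dump_frames.py (illustration only).
# cv2 (opencv-python) is an assumed dependency.
import os
import cv2

def dump_frames(video_path, out_dir):
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    capture = cv2.VideoCapture(video_path)
    frame_id = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        name = 'frame_{:06d}.png'.format(frame_id)
        cv2.imwrite(os.path.join(out_dir, name), frame)
        frame_id += 1
    capture.release()

dump_frames('video_1.mp4', 'data_dir/images/video_1')  # example paths
```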
For more robust control of image sources we have created a separate file where each row represents a single source in the following format: `annotation_file_path.xml image_height,image_width images_directory_path`. We assume that all images from the same source are resized to the `image_height,image_width` size (this is needed to properly decode the annotations).
Example of the `train_tasks.txt` file:

    annotations/annotation_file_1.xml 1920,1080 images/video1
    annotations/annotation_file_2.xml 1920,1080 images/video2
Example of the `test_tasks.txt` file:

    annotations/annotation_file_3.xml 1920,1080 images/video3
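For reference, a minimal sketch of parsing such a tasks file is shown below; the function name and the returned structure are illustrative, not part of the repository's API.

```python
# Illustrative parser for the tasks file format described above:
# annotation_file_path.xml image_height,image_width images_directory_path
def parse_tasks(tasks_path):
    tasks = []
    with open(tasks_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            annotation_path, size, images_dir = line.split()
            height, width = [int(value) for value in size.split(',')]
            tasks.append((annotation_path, (height, width), images_dir))
    return tasks

# parse_tasks('data_dir/train_tasks.txt')
# -> [('annotations/annotation_file_1.xml', (1920, 1080), 'images/video1'), ...]
```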
To generate the final data file (train or test) run the command:

    # -t: path to the file with tasks
    # -o: output directory
    python2 tools/data/prepare_pedestrian_db.py -t <PATH_TO_TASKS> -o <PATH_TO_OUTPUT_DIR>
The output directory structure (an example of the script output can be found in the `./dataset` folder):

    |-- root
        |-- annotation
            |-- video_1
                sample_000000.json
                sample_000001.json
            |-- video_2
                sample_000000.json
                sample_000001.json
        data.txt
        class_map.yml
Generated files:

- `data.txt` - this file should be used as input for the train/eval scripts.
- `class_map.txt` - this file includes the generated mapping from class names onto class IDs.
Note 1: To specify class IDs directly you can set the `-i` key: `-i <PATH_TO_CLASS_MAP>` (see the example in `tools/data/pedestriandb_class_map.yml`). If you specify your own class mapping, the `class_map.txt` file will not be generated.
Note 2: To generate a valid class mapping for testing purposes you should set `-i <PATH_TO_CLASS_MAP>`, where `<PATH_TO_CLASS_MAP>` is either the `class_map.txt` file generated by the script or your own class mapping file. Otherwise the order of class IDs will be different.
Note 3: You can use the prepared toy dataset (the `./dataset` folder) to start your model training. You only need to specify the full path to the images (the `./dataset/images` folder) in the `data.txt` file.
For the generated dataset you should set the correct field values in the appropriate config file:

- `IMAGE_SIZE` - target image size in the format `[height, width, num_channels]`
- `TRAIN_DATA_SIZE` - number of training samples
- `VAL_DATA_SIZE` - number of testing samples
- `MAX_NUM_DETECTIONS_PER_IMAGE` - maximum number of objects on a single image (if an image contains more, only a subset of the objects will be used)
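Purely as an illustration, the corresponding values could look like the sketch below; the numbers are placeholders, and the exact config format depends on the chosen model.

```python
# Placeholder values, for illustration only; adjust them to your dataset and
# to the actual config format of the chosen model.
IMAGE_SIZE = [180, 320, 3]          # [height, width, num_channels]
TRAIN_DATA_SIZE = 10000             # number of samples in the train data.txt
VAL_DATA_SIZE = 2000                # number of samples in the test data.txt
MAX_NUM_DETECTIONS_PER_IMAGE = 20   # if an image has more objects, only a subset is used
```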