
Data preparation

Assume the following data structure:

    |-- data_dir
         |-- images
            |-- video_1
                frame_000000.png
                frame_000001.png
            |-- video_2
                frame_000000.png
                frame_000001.png
            |-- video_3
                frame_000000.png
                frame_000001.png
         |-- annotation
            annotation_file_1.xml
            annotation_file_2.xml
            annotation_file_3.xml
         train_tasks.txt
         test_tasks.txt

Each annotation file (see the Annotation file format section below) describes a single source of images (see the Image file format section below).

Annotation file format

For annotation it is better to use the CVAT utility, so we assume that each annotation file is stored in the corresponding .xml format. In an annotation file there is a single independent track for each person in the video, which consists of a bounding box description for each frame. General structure of an annotation file:

    |-- root
         |-- track_0
              bounding_box_0
              bounding_box_1
         |-- track_1
              bounding_box_0
              bounding_box_1

A toy example of an annotation file:

<?xml version="1.0" encoding="utf-8"?>
<annotations count="1">
    <track id="0" label="person">
        <box frame="0" xtl="1.0" ytl="1.0" xbr="0.0" ybr="0.0" occluded="0">
            <attribute name="action">action_name</attribute>
        </box>
    </track>
</annotations>

where the fields have the following meaning (a minimal parsing sketch follows the list):

  • count - number of tracks
  • id - unique ID of the track in the file
  • label - label of the track (the data loader skips all labels except person)
  • frame - unique ID of the frame in the track
  • xtl, ytl, xbr, ybr - bounding box coordinates of the top-left and bottom-right corners
  • occluded - marker to highlight heavily occluded bounding boxes (they can be skipped during training)
  • name - name of the bounding box attribute (the data loader is sensitive to the action class only)
  • action_name - valid name of an action (you can define your own list of actions)
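
Since the annotation is plain XML, it can be read with the Python standard library. Below is a minimal reading sketch, not the actual data loader; the Box tuple and the function name are illustrative, while the element and attribute names follow the toy example above.

import xml.etree.ElementTree as ET
from collections import namedtuple

Box = namedtuple('Box', ['frame', 'xtl', 'ytl', 'xbr', 'ybr', 'occluded', 'action'])

def read_tracks(annotation_path):
    """Return a dict mapping track ID to a list of Box, keeping person tracks only."""
    tracks = {}
    root = ET.parse(annotation_path).getroot()
    for track in root.findall('track'):
        if track.get('label') != 'person':  # all labels except person are skipped
            continue
        boxes = []
        for box in track.findall('box'):
            action = None
            for attribute in box.findall('attribute'):
                if attribute.get('name') == 'action':  # only the action attribute is used
                    action = attribute.text
            boxes.append(Box(frame=int(box.get('frame')),
                             xtl=float(box.get('xtl')), ytl=float(box.get('ytl')),
                             xbr=float(box.get('xbr')), ybr=float(box.get('ybr')),
                             occluded=box.get('occluded') == '1',
                             action=action))
        tracks[int(track.get('id'))] = boxes
    return tracks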

Image file format

Our implementation of the data loader works with independent images stored on the drive. Each image should be named in the format frame_xxxxxx.png or frame_xxxxxx.jpg (where xxxxxx is a unique image number).

NOTE To extract images from a video you can use tools/data/dump_frames.py
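
If you prefer to extract frames yourself, here is a minimal sketch using OpenCV (assuming the cv2 package is installed). It only mirrors the expected frame_xxxxxx.png naming; it is not the tools/data/dump_frames.py script.

import os
import cv2

def dump_frames(video_path, out_dir):
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    capture = cv2.VideoCapture(video_path)
    frame_id = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # zero-padded six-digit numbering, as expected by the data loader
        name = 'frame_{:06d}.png'.format(frame_id)
        cv2.imwrite(os.path.join(out_dir, name), frame)
        frame_id += 1
    capture.release()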

Tasks file format

For more robust control over image sources we have created a separate tasks file, where each row describes a single source in the following format: annotation_file_path.xml image_height,image_width images_directory_path. We assume that all images from the same source are resized to the image_height,image_width size (this is needed to properly decode the annotations). A parsing sketch follows the examples below.

Example of train_tasks.txt file:

annotations/annotation_file_1.xml 1920,1080 images/video_1
annotations/annotation_file_2.xml 1920,1080 images/video_2

Example of test_tasks.txt file:

annotations/annotation_file_3.xml 1920,1080 images/video_3
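
A minimal sketch of parsing such a tasks file is shown below; the Task tuple and the function name are illustrative, while the row format is the one described above.

from collections import namedtuple

Task = namedtuple('Task', ['annotation_path', 'image_height', 'image_width', 'images_dir'])

def parse_tasks(tasks_path):
    tasks = []
    with open(tasks_path) as tasks_file:
        for line in tasks_file:
            line = line.strip()
            if not line:
                continue
            # <annotation_file_path.xml> <image_height,image_width> <images_directory_path>
            annotation_path, image_size, images_dir = line.split()
            height, width = (int(v) for v in image_size.split(','))
            tasks.append(Task(annotation_path, height, width, images_dir))
    return tasks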

Train/eval data file generation

To generate the final data file (train or test) run the command:

python2 tools/data/prepare_pedestrian_db.py -t <PATH_TO_TASKS> \       # path to the file with tasks
                                            -o <PATH_TO_OUTPUT_DIR>    # output directory
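
For example, with the directory layout from the top of this document (the output path is illustrative):

python2 tools/data/prepare_pedestrian_db.py -t data_dir/train_tasks.txt \
                                            -o data_dir/output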

The output directory structure (an example of the script output can be found in the ./dataset folder):

    |-- root
         |-- annotation
              |-- video_1
                sample_000000.json
sample_000001.json
              |-- video_2
                sample_000000.json
sample_000001.json
         data.txt
         class_map.yml

Generated files:

  • data.txt file should be used as input for the train/eval scripts.
  • class_map.yml file will include the generated mapping from class names onto class IDs.

Note 1 To specify class IDs directly you can set the -i key: -i <PATH_TO_CLASS_MAP> (see the example tools/data/pedestriandb_class_map.yml). If you specify your own class mapping, the class_map.yml file will not be generated.

Note 2 To generate a valid class mapping for testing purposes you should set -i <PATH_TO_CLASS_MAP>, where <PATH_TO_CLASS_MAP> is the class_map.yml file generated by the script or your own class mapping file. Otherwise the order of class IDs will be different.

Note 3 You can use the prepared toy dataset (./dataset folder) to start your model training. You only need to specify the full path to the images (./dataset/images folder) in the data.txt file.

Config specification

For the generated dataset you should set the correct field values in the appropriate config file (a hypothetical fragment is sketched after this list):

  • IMAGE_SIZE - target image size in the format [height, width, num_channels]
  • TRAIN_DATA_SIZE - number of training samples
  • VAL_DATA_SIZE - number of testing samples
  • MAX_NUM_DETECTIONS_PER_IMAGE - maximum number of objects in a single image (if there are more, only a subset of the objects will be used)
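
A hypothetical config fragment is shown below, assuming Python-style assignments and illustrative values; consult the actual config file for the exact syntax and the remaining fields:

IMAGE_SIZE = [1920, 1080, 3]        # [height, width, num_channels], matching the tasks file above
TRAIN_DATA_SIZE = 25000             # number of training samples (illustrative value)
VAL_DATA_SIZE = 5000                # number of testing samples (illustrative value)
MAX_NUM_DETECTIONS_PER_IMAGE = 100  # objects beyond this limit are dropped (illustrative value)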