Prepare datasets for Open Vocabulary Detection

COCO Open Vocabulary

This is consistent to OCD.

The annotations for OVD training instances_train2017_seen_2_oriorder.json, instances_train2017_seen_2_oriorder_cat_info.json and evaluation instances_val2017_all_2_oriorder.json, and annotations for image-level supervision captions_train2017_tags_allcaps_pis.json can be downloaded from here.

The CLIP image features on class-agnostic MAVL proposals for region-based knowledge distillation (RKD) and class-specific proposals for pseudo image-level supervision (PIS) can be downloaded from here. Untar the file coco_props.tar.gz and place in the corresponding location as shown below:

coco/
    train2017/
    val2017/
    annotations/
        captions_train2017.json
        instances_train2017.json 
        instances_val2017.json
        captions_train2017_tags_allcaps_pis.json
    zero-shot/
        instances_train2017_seen_2_oriorder.json
        instances_val2017_all_2_oriorder.json
        instances_train2017_seen_2_oriorder_cat_info.json
MAVL_proposals
    coco_props/
        classagnostic_distilfeats/
            000000581921.pkl
            000000581929.pkl
            ....
        class_specific/
            000000581921.pkl
            000000581929.pkl
            ....

Otherwise, follow the following instructions to generate them from the COCO standard annotations. We follow the code-base of Detic for the dataset preperation.

COCO annotations

Following the work of OVR-CNN, we first create the open-vocabulary COCO split. The converted files should be placed as shown below,

coco/
    zero-shot/
        instances_train2017_seen_2.json
        instances_val2017_all_2.json

These annotations are then pre-processed for easier evaluation, using the following commands:

python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_train2017_seen_2.json
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_val2017_all_2.json

Annotations for Image-level labels from COCO-Captions

The final annotation to prepare is captions_train2017_tags_allcaps_pis.json. For the Image-level supervision, we use the COCO-captions annotations and filter them with the MAVL predictions. First generate the Image-level labels from COCO-captions with the command,

python tools/get_cc_tags.py --cc_ann datasets/coco/annotations/captions_train2017.json 
    --out_path datasets/coco/annotations/captions_train2017_tags_allcaps.json  
    --allcaps --convert_caption --cat_path datasets/coco/annotations/instances_val2017.json

This creates datasets/coco/captions_train2017_tags_allcaps.json.

To ignore the remaining classes from the COCO that are not included in seen(65)+ unseen(17), instances_train2017_seen_2_oriorder_cat_info.json is used by the flag IGNORE_ZERO_CATS. This is created by

python tools/get_lvis_cat_info.py --ann datasets/coco/zero-shot/instances_train2017_seen_2_oriorder.json

Proposals for PIS

With the Image-level labels from COCO-captions captions_train2017_tags_allcaps_pis.json, generate pseudo-proposals for PIS using MAVL with the below command. Download the checkpoints from the external submodule.

python tools/get_ils_labels.py -ckpt <mavl pretrained weights path> -dataset coco -dataset_dir datasets/coco
        -output datasets/MAVL_proposals/coco_props/class_specific

The class-specific pseudo-proposals will be stored as individual pickle files for each image.

Filter the Image-level annotation captions_train2017_tags_allcaps.json from COCO-captions with the pseudo-proposals from MAVL with the command,

python tools/update_cc_pis_annotations.py --ils_path datasets/MAVL_proposals/coco_props/class_specific 
    --cc_ann_path datasets/coco/annotations/captions_train2017_tags_allcaps.json 
    --output_ann_path datasets/coco/annotations/captions_train2017_tags_allcaps_pis.json

This creates datasets/coco/captions_train2017_tags_allcaps_pis.json.

Regions and CLIP embeddings for RKD

Generate the class-agnostic proposals for the images in the training set. The class-agnostic pseudo-proposals and corresponding CLIP images features will be stored as individual pickle files for each image. Note: It does not depend on steps 2 and 3.

python tools/get_rkd_clip_feat.py -ckpt <mavl pretrained weights path> -dataset coco -dataset_dir datasets/coco
        -output datasets/MAVL_proposals/coco_props/classagnostic_distilfeats

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DATASETS.md

DATASETS.md

Prepare datasets for Open Vocabulary Detection

COCO Open Vocabulary

Files

DATASETS.md

Latest commit

History

DATASETS.md

File metadata and controls

Prepare datasets for Open Vocabulary Detection

COCO Open Vocabulary