
WangLanxiao/CrowdCaption_benchmark


What Happens in Crowd Scenes: A New Dataset about Crowd Scenes for Image Captioning

Introduction

This repository contains the code for the crowd scenes captioning task, built on X-modaler.

The original paper can be found here.

Installation

See installation instructions.

Requirements

  • Linux or macOS with Python >= 3.6
  • PyTorch and a torchvision version that matches your PyTorch installation. Install them together from pytorch.org to ensure they match
  • fvcore
  • pytorch_transformers
  • jsonlines
  • pycocotools
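Before installing, one quick way to see which of the listed requirements are missing is to try locating each package by its import name. The sketch below is a hypothetical helper, not part of this repository or X-modaler; it assumes the import names `torch`, `torchvision`, `fvcore`, `pytorch_transformers`, `jsonlines`, and `pycocotools`, and it only checks that the modules can be found, not that the PyTorch and torchvision versions match.

```python
import importlib.util
import sys

# Hypothetical mapping: assumed import names for the requirements listed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "fvcore",
    "pytorch_transformers",
    "jsonlines",
    "pycocotools",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_version_ok():
        print("Python >= 3.6 is required, found", sys.version.split()[0])
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed requirements are importable.")
```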

Getting Started

See Getting Started with X-modaler

CrowdCaption Preparation

1 Introduction: Official introduction.

2 Features: You can download our features (.npy files) here, extracted with Faster R-CNN, Swin Transformer, and HRNet. Place them in

./open_source_dataset/crowdscenes_caption/features

3 Annotations: You can download them here. Place them in

./open_source_dataset/crowdscenes_caption

4 Evaluation: You can download the evaluation code here or use the official COCO caption evaluation code. Place it in

./cococaption

Access code: 6826
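After the downloads, the project root should contain the three directories named in the preparation steps. The sketch below is a hypothetical helper, not part of this repository, that reports which of them are missing and counts the .npy files under the features directory. It assumes nothing about how the feature files are organized inside that folder, so the search is recursive.

```python
from pathlib import Path

# Directories named in the preparation steps, relative to the project root.
EXPECTED_DIRS = [
    "open_source_dataset/crowdscenes_caption",
    "open_source_dataset/crowdscenes_caption/features",
    "cococaption",
]
FEATURE_DIR = "open_source_dataset/crowdscenes_caption/features"


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (base / rel).is_dir()]


def count_feature_files(root="."):
    """Count .npy files anywhere under the features directory (0 if absent).

    Hypothetical check: the internal layout of the features folder is not
    documented, so the search uses rglob to look in subfolders too.
    """
    feat_dir = Path(root) / FEATURE_DIR
    if not feat_dir.is_dir():
        return 0
    return sum(1 for _ in feat_dir.rglob("*.npy"))


if __name__ == "__main__":
    gaps = missing_dirs(".")
    if gaps:
        print("Missing directories:", ", ".join(gaps))
    else:
        print("Layout OK;", count_feature_files("."), ".npy feature files found")
```

Run it from the project root (saved under any name, for example a hypothetical check_layout.py) before starting training.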

Training & Evaluation

The commands below assume that you are in the root directory of this project, have activated your virtual environment (if you use one), and have placed the CrowdCaption dataset in 'open_source_dataset/crowdscenes_caption'. We use 8 GPUs.

# cross-entropy (XE) training
bash train.sh

# reward-based (reinforcement learning) training
bash train_rl.sh

# testing
bash test.sh
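The two training scripts follow the usual two-stage captioning recipe: cross-entropy (XE) training on ground-truth captions, then reward-based fine-tuning. The README does not state which reward the second stage optimizes. Captioning toolkits commonly use self-critical sequence training (SCST), where each sampled caption's reward (often CIDEr) is compared against the reward of the greedily decoded caption. The pure-Python sketch below illustrates only that advantage-and-loss computation under this assumption; it is not code from this repository, and the function names are hypothetical.

```python
from typing import List


def scst_advantages(sample_rewards: List[float], greedy_reward: float) -> List[float]:
    """Self-critical advantage: each sampled caption's reward minus the greedy baseline."""
    return [r - greedy_reward for r in sample_rewards]


def scst_loss(sample_logprobs: List[float],
              sample_rewards: List[float],
              greedy_reward: float) -> float:
    """Policy-gradient loss averaged over samples: -mean(advantage * log p(caption)).

    `sample_logprobs` holds the summed token log-probabilities of each sampled
    caption (assumed input format for this sketch).
    """
    if len(sample_logprobs) != len(sample_rewards):
        raise ValueError("need one log-probability per sampled caption")
    advantages = scst_advantages(sample_rewards, greedy_reward)
    total = sum(a * lp for a, lp in zip(advantages, sample_logprobs))
    return -total / len(sample_logprobs)
```

For example, `scst_loss([-2.0, -4.0], [1.5, 0.5], 1.0)` returns `-0.5`. Minimizing this loss raises the probability of captions that beat the greedy baseline and lowers it for those that fall short, so no separate learned baseline is needed.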

Training and inference on other datasets work the same way, using their respective config files.

Performance and Trained Models

Performance results and trained models will be released soon.

Acknowledgement

Thanks to the X-modaler team for their wonderful open-source project!

Citation

If you find this dataset or code useful in your research, please consider citing:

@article{wang2022happens,
  title={What Happens in Crowd Scenes: A New Dataset about Crowd Scenes for Image Captioning},
  author={Wang, Lanxiao and Li, Hongliang and Hu, Wenzhe and Zhang, Xiaoliang and Qiu, Heqian and Meng, Fanma and Wu, Qingbo},
  journal={IEEE Transactions on Multimedia},
  year={2022},
  publisher={IEEE}
}
