MUGEN Baseline

Project Page | Paper

Install

Please run the following commands to set up the environment.

conda create --name mugen python=3.9.5
conda activate mugen
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 torchtext==0.8 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchmetrics==0.7.3
pip install pytorch_lightning==1.3.3 einops ftfy regex transformers==4.11.3
pip install av==8.0.3
pip install fire soundfile librosa numba unidecode tqdm mpi4py tensorboardX
pip install pycocoevalcap # need to fix the bug mentioned here https://github.com/tylin/coco-caption/pull/35/files
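
After the packages are installed, a quick sanity check (a minimal sketch that assumes only that the mugen environment created above is active) confirms that PyTorch imports and sees the GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"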

Dataset

Please run the following commands to download the dataset.

mkdir -p datasets/coinrun
cd datasets/coinrun
wget http://dl.noahmt.com/creativity/data/MUGEN_release/coinrun.zip
unzip coinrun.zip
cd ../..
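
As a quick check that the archive unpacked where later steps expect it (a sketch only; the exact layout inside coinrun.zip is not documented here), list the dataset directory from the repository root:

ls datasets/coinrun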

For more information, please refer here.

Model

Run the following commands to download the pre-trained checkpoints.

wget https://dl.fbaipublicfiles.com/large_objects/MUGEN_release/checkpoints.zip
unzip checkpoints.zip
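
To verify the extraction (a minimal check; it assumes the archive unpacks into a checkpoints/ directory, which is a guess based on the file name), list the unpacked files:

ls checkpoints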

Please refer here for video-audio-text retrieval details and here for video-audio-text generation details.

Citation

If any part of our paper or code is helpful to your work, please cite:

@article{hayes2022mugen,
  title={MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration},
  author={Hayes, Thomas and Zhang, Songyang and Yin, Xi and Pang, Guan and Sheng, Sasha and Yang, Harry and Ge, Songwei and Hu, Qiyuan and Parikh, Devi},
  journal={arXiv preprint arXiv:2204.08058},
  year={2022}
}