Code for the Findings of EMNLP 2020 paper "Beyond Language: Learning Commonsense from Images for Reasoning"
- Set up a conda environment and install the prerequisite packages
conda create -n vibert python=3.6 # Create a virtual environment
source activate vibert # Activate virtual environment
pip install -r requirements.txt
git clone https://github.com/cocodataset/cocoapi.git # Install pycocotools
cd cocoapi/PythonAPI
python setup.py build_ext install
python -m nltk.downloader all # Install NLTK data
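As an optional sanity check (not part of the original instructions), the snippet below verifies that pycocotools built correctly and that the NLTK data is available:
python - <<'EOF'
from pycocotools.coco import COCO   # fails if the cocoapi build/install did not succeed
import nltk
nltk.data.find('tokenizers/punkt')  # raises LookupError if the NLTK data was not downloaded
print('environment looks OK')
EOF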
- Download the COCO datasets used to train the layout generator
cd layout_generator
./experiments/scripts/fetch_data.sh
cd ..
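If you want to verify the download, a rough check is to list the annotation JSONs and load one with pycocotools. The directory layout below (./layout_generator/data/coco) is an assumption based on the --data_dir used in the training command, not something fetch_data.sh is documented to guarantee:
python - <<'EOF'
import glob
from pycocotools.coco import COCO

# assumption: fetch_data.sh places the COCO annotation JSONs somewhere under layout_generator/data/coco
ann_files = sorted(glob.glob('./layout_generator/data/coco/**/*.json', recursive=True))
print('\n'.join(ann_files) or 'no annotation files found')

instances = [f for f in ann_files if 'instances' in f]
if instances:
    coco = COCO(instances[0])  # parse one instance-annotation file
    print(len(coco.getImgIds()), 'images,', len(coco.getCatIds()), 'categories')
EOF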
- Train the layout generator
cd layout_generator
python -u ./tools/train.py \
    --cuda --batch_size=8 --data_dir=./data/coco --log_dir=./coco \
    --exp_name=coco --attn_emb=True --output_cls_size=83 \
    --finetune_lr=5e-5 --max_input_length=64 --max_output_length=20 \
    --num_workers=1 --accumulation_steps=4 --n_epochs=15 --warmup=0.05
The ViBERT checkpoint will be saved under the directory specified by --log_dir.
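To confirm that a checkpoint was written, you can inspect it with PyTorch. This is a minimal sketch that assumes the checkpoint is a standard torch.save file; the path below is a placeholder for the actual file under your --log_dir:
python - <<'EOF'
import torch

ckpt = torch.load('PATH_TO_VIBERT_CHECKPOINT', map_location='cpu')  # placeholder path
if isinstance(ckpt, dict):
    for key in list(ckpt.keys())[:20]:  # show the first few top-level keys
        print(key)
EOF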
- Download the CommonsenseQA & WinoGrande datasets
cd reasoning
./download_data.sh
cd ..
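The test commands later in this README read the data from ./data/commonsenseQA and ./data/WinoGrande inside the reasoning directory; assuming download_data.sh uses that same layout, you can list what was downloaded:
python - <<'EOF'
import glob

# assumption: download_data.sh writes into reasoning/data/{commonsenseQA,WinoGrande}
for pattern in ['./reasoning/data/commonsenseQA/*', './reasoning/data/WinoGrande/*']:
    print(pattern)
    for path in sorted(glob.glob(pattern)):
        print('  ', path)
EOF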
- Train for commonsense reasoning on CommonsenseQA with BERT+ViBERT
cd reasoning
CUDA_VISIBLE_DEVICES=0,1 python -u train.py --cuda --num_workers=1 --parallel \
    --epochs=5 --model_name=Loire --do_train \
    --task=commonsenseqa \
    --data_dir=$PATH_TO_DATA \
    --model=bert-base --max_seq_length=64 --seed=1 \
    --batch_size=16 --accumulation_steps=1 --log_per_steps=500 \
    --warmup=0.1 --lr=1e-5 --adam_betas='(0.9, 0.98)' --adam_eps=1e-06 \
    --dropout=0.3 --weight_decay=0.01 \
    --feature --norm --proj=f2t \
    --pretrained_bert=$PATH_TO_VIBERT \
    --save_ckpt
- Train for commonsense reasoning on WinoGrande with BERT+ViBERT
cd reasoning
CUDA_VISIBLE_DEVICES=0,1 python -u train.py --cuda --num_workers=1 --parallel \
    --epochs=10 --model_name=Loire --do_train \
    --task=winogrande --train_size=xl \
    --data_dir=$PATH_TO_DATA \
    --model=bert-base --max_seq_length=64 --seed=1 \
    --batch_size=16 --accumulation_steps=1 --log_per_steps=500 \
    --warmup=0.1 --lr=1e-5 --adam_betas='(0.9, 0.98)' --adam_eps=1e-06 \
    --dropout=0.3 --weight_decay=0.01 \
    --feature --norm --proj=f2t \
    --pretrained_bert=$PATH_TO_VIBERT \
    --save_ckpt
To train on different training-set sizes, set --train_size={xs,s,m,l,xl} accordingly (a launcher sketch for all five sizes follows below).
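For example, a small launcher that re-issues the WinoGrande command above once per training-set size could look like this. It is only a sketch: PATH_TO_DATA and PATH_TO_VIBERT are placeholders, and CUDA_VISIBLE_DEVICES should be set in the shell beforehand.
python - <<'EOF'
import subprocess

BASE_CMD = [
    'python', '-u', 'train.py', '--cuda', '--num_workers=1', '--parallel',
    '--epochs=10', '--model_name=Loire', '--do_train', '--task=winogrande',
    '--data_dir=PATH_TO_DATA', '--model=bert-base', '--max_seq_length=64', '--seed=1',
    '--batch_size=16', '--accumulation_steps=1', '--log_per_steps=500',
    '--warmup=0.1', '--lr=1e-5', '--adam_betas=(0.9, 0.98)', '--adam_eps=1e-06',
    '--dropout=0.3', '--weight_decay=0.01', '--feature', '--norm', '--proj=f2t',
    '--pretrained_bert=PATH_TO_VIBERT', '--save_ckpt',
]

# train once per WinoGrande training-set size
for size in ['xs', 's', 'm', 'l', 'xl']:
    subprocess.run(BASE_CMD + [f'--train_size={size}'], check=True)
EOF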
For commonsense reasoning with RoBERTa+ViBERT, set --model=roberta-base or --model=roberta-large, and adjust other parameters such as --max_seq_length and --batch_size accordingly.
The model is trained on a single GPU by default. To use multiple GPUs, set the --parallel flag and adjust --batch_size accordingly.
The commands above add a LayerNorm after the linear projection layer via the --norm flag; to train without LayerNorm, remove the --norm flag. A conceptual sketch of this projection follows below.
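Conceptually, the --proj=f2t projection with --norm amounts to a linear layer followed by LayerNorm. The sketch below is only an illustration; the class name and feature dimensions are assumptions, not the repo's actual code:
import torch.nn as nn

class FeatureToTextProjection(nn.Module):
    # hypothetical illustration of --proj=f2t with optional --norm
    def __init__(self, feat_dim=2048, text_dim=768, use_norm=True):
        super().__init__()
        self.proj = nn.Linear(feat_dim, text_dim)                           # project visual features into the text embedding space
        self.norm = nn.LayerNorm(text_dim) if use_norm else nn.Identity()   # --norm: LayerNorm after the Linear layer

    def forward(self, feats):
        return self.norm(self.proj(feats))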
Checkpoint files can be found in ./logs/commonsenseqa or ./logs/winogrande.
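To pick a checkpoint for the --pretrained argument of the test command below, you can list what was saved, newest first (run from the reasoning directory; only the directories named above are assumed, not a filename pattern):
python - <<'EOF'
import glob, os

# change the directory to ./logs/winogrande for WinoGrande runs
ckpts = sorted(glob.glob('./logs/commonsenseqa/*'), key=os.path.getmtime, reverse=True)
for path in ckpts:
    print(path)
EOF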
- Test on CommonsenseQA with BERT+ViBERT
python -u train.py \
    --cuda --model_name=vibert --do_pred --feature --norm \
    --max_seq_length=64 --model=bert-base --task=commonsenseqa \
    --data_dir=./data/commonsenseQA \
    --pretrained_bert=$YOUR_PATH_SAVED_VIBERT \
    --pred_name=$OUTPUT_FILENAME \
    --pretrained=$THE_CHECKPOINT_TO_BE_EVALUATED
For testing on WinoGrande, set --task=winogrande and --data_dir=./data/WinoGrande. For testing RoBERTa+ViBERT, set --model=roberta-base or --model=roberta-large, matching the model used during training.
The --feature, --norm, --max_seq_length, --model, and --pretrained_bert parameters must be consistent with those used during training.
- Citation
@article{cui2020beyond,
title={Beyond Language: Learning Commonsense from Images for Reasoning},
author={Cui, Wanqing and Lan, Yanyan and Pang, Liang and Guo, Jiafeng and Cheng, Xueqi},
journal={arXiv preprint arXiv:2010.05001},
year={2020}
}