This is the source code of the first-place solution to the Facebook AI Hateful Memes Challenge. In this competition, we extract multiple types of annotations from the hateful-memes dataset and feed them into multi-modal transformers to achieve high accuracy. You can read about the details of our approach in:
- Paper: Enhance Multimodal Transformer With External Label And In-Domain Pretrain
- Slides from the NeurIPS 2020 competition track event
- Docker >= 19.03
- nvidia-container-toolkit
NOTE: Make sure you follow this guide so that Docker can be managed as a non-root user and the shell scripts can run it without sudo.
The original experiments were conducted on a GCP n1-highmem-16 instance initialized with the TensorFlow 2.3 / Keras / CUDA 11.0 GPU GCE image:
- OS: Ubuntu 18.04.5 LTS
- CPU: 16-core Intel CPU
- Memory: 104 GB
- GPU: 4 Nvidia T4
- Disk: 500GB HDD
Most of the data preprocessing and model training can be done with a single T4 GPU; only VL-BERT needs 4 GPUs to reach a large enough batch size when fine-tuning the Faster R-CNN and BERT components together.
NOTE: All models in this project use fp16 acceleration during training. Please use a GPU that supports NVIDIA AMP.
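For reference, mixed-precision (AMP) training in PyTorch typically follows the pattern below. This is a generic sketch, not code from this repository; the model, optimizer, and batch are placeholders:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 2).to(device)          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# GradScaler guards against fp16 gradient underflow; it is a no-op on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)        # placeholder batch
y = torch.randint(0, 2, (8,), device=device)

optimizer.zero_grad()
# autocast runs the forward pass in fp16 where it is numerically safe.
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()   # scale the loss before backprop
scaler.step(optimizer)          # unscale grads, skip the step on inf/nan
scaler.update()                 # adjust the scale factor for the next step
```

On a GPU without AMP support this still runs, but falls back to full fp32 and loses the speed and memory benefits.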
- Preprocess the data and extract additional features. See detailed instructions in data_utils/README.
- Train the modified VL-BERT (2 large and 1 base). See detailed instructions in VL-BERT/README.
- Train UNITER-ITM (1 large and 1 base) and VILLA-ITM (1 large and 1 base). See detailed instructions in UNITER/README.
- Train ERNIE-Vil (1 large and 1 base). See detailed instructions in ERNIE-VIL/README.
- Ensemble by averaging the predictions of all models, then apply a simple rule-based racism detector on top.
`bash run_ensemble.sh`
This script lets you select which models' predictions to include in the ensemble. It writes `ROOT/test_set_ensemble.csv` as the final result and copies all csv files used in the ensemble into the `ROOT/test_set_csvs` folder.
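The averaging step above can be sketched as follows. This is a minimal illustration, not the actual `run_ensemble.sh` logic; the `id`/`proba`/`label` column names (the Hateful Memes submission format) and the file paths are assumptions:

```python
import csv
from collections import defaultdict

def average_predictions(csv_paths, out_path, threshold=0.5):
    """Average per-id probabilities across several model prediction CSVs.

    Assumes each input CSV has 'id' and 'proba' columns; the binary
    'label' is re-derived by thresholding the mean probability.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for path in csv_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                sums[row["id"]] += float(row["proba"])
                counts[row["id"]] += 1
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "proba", "label"])
        writer.writeheader()
        for key in sorted(sums):
            mean = sums[key] / counts[key]
            writer.writerow({"id": key,
                             "proba": f"{mean:.6f}",
                             "label": int(mean >= threshold)})
```

The rule-based racism detector would then override `proba` for the memes it flags; that keyword-matching logic is specific to the repository and is not shown here.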