
Understanding CTX AUG

This repository contains the code for the ACL Findings paper Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models (Kew & Sennrich, 2023).

Our experiments reimplement some of the zero-shot control methods described in Zero-Shot Controlled Generation with Encoder-Decoder Transformers (Hazarika et al., 2021) and Attention Biasing and Context Augmentation for Zero-Shot Control of Encoder-Decoder Transformers for Natural Language Generation (Hazarika et al., 2022).

Setup

We recommend using a clean conda environment to run these scripts.

To set up the working environment, run the following commands.

# if running on cluster, load the relevant modules, e.g.
module load anaconda3/2022.10 gpu gcc/8.5.0 cudnn/10.2.89

# create new clean environment
conda create -n unsup_ctrl python=3.8 -y
conda activate unsup_ctrl && echo "CONDA ENV: $CONDA_DEFAULT_ENV"

pip install -r requirements.txt

# depending on cuda driver, may need to install from whl, e.g.
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

# for finetuning data preprocessing
python -m spacy download en_core_web_sm

# to run the notebook from a server with ipython kernels, run
python -m ipykernel install --user --name=unsup_ctrl
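
Optionally, you can sanity-check the environment before continuing. The following one-liners are our own suggestion (not part of the original setup) and only assume that torch and spaCy were installed as above:

# confirm the CUDA build of torch is visible and the spaCy model loads
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import spacy; spacy.load('en_core_web_sm'); print('spaCy model OK')"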

Resources

To set up the directories for larger files such as data and models:

mkdir resources # or ln -s /path/to/storage/ resources
mkdir resources/data
mkdir resources/models
# pretraining resources
ln -s resources pretraining/resources
# we also need a directory for experiment results:
mkdir results
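
After these steps the working tree should look roughly like the sketch below (data/ and models/ are populated by the later steps):

resources/
├── data/        # fine-tuning datasets (e.g. Topical-Chat KGD splits)
└── models/      # pre-trained and fine-tuned checkpoints
pretraining/resources -> resources    # symlink used by the pre-training scripts
results/         # experiment results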

Data

Experiments in the original paper mostly use the Topical Chat dataset (Gopalakrishnan et al., 2019), which is available at https://github.com/alexa/Topical-Chat.

To download the data for fine-tuning, run:

git clone https://github.com/alexa/Topical-Chat.git data/Topical-Chat
cd data/Topical-Chat/src
pip3 install -r requirements.txt

# NOTE: Building the data requires Reddit credentials. 
# Please create your own Reddit API keys: https://www.reddit.com

# NOTE: To collect the reading sets, note that the ID pointing to one data point has changed (https://github.com/alexa/Topical-Chat/issues/11),
# so you need to replace the ID "t3_2au72q" with "t3_r8dxya" in the following files:
# reading_sets/pre-build/test_freq.json, reading_sets/pre-build/train.json, reading_sets/pre-build/valid_freq.json
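
# One possible way to apply this fix is with sed (our suggestion, not part of the original
# build instructions; paths assume you are still inside data/Topical-Chat/src and use GNU sed):
sed -i 's/t3_2au72q/t3_r8dxya/g' \
    ../reading_sets/pre-build/test_freq.json \
    ../reading_sets/pre-build/train.json \
    ../reading_sets/pre-build/valid_freq.json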

python3 build.py  --reddit_client_id CLIENT_ID --reddit_client_secret CLIENT_SECRET --reddit_user_agent USER_AGENT

This build takes around 1 hour. Once it completes, we can prepare the data for training according to the description provided in Hazarika et al. (2021) with the following:

sbatch jobs/run_data_prep-TopicalChat.sh
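
If you are not working on a slurm cluster, the same job script can likely be run directly with bash (an assumption on our part; this only works if the script does not rely on slurm-specific environment variables):

bash jobs/run_data_prep-TopicalChat.sh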

Experiments

Experiments were run on a slurm cluster.

To run a controlled experiment with mini BART models, use jobs/run_mini_bart.sh, specifying the random seed and the YAML config with BART's denoising arguments. This performs pre-training, fine-tuning, inference and evaluation.

bash jobs/run_mini_bart.sh -s 42 -c exp_configs/SI_bart.yml

To fine-tune, run inference with, and evaluate a publicly available pre-trained model on slurm, use:

bash jobs/run_public.sh -s 23 -m "facebook/bart-base" -d "resources/data/Topical-Chat/KGD"
bash jobs/run_public.sh -s 23 -m "google/t5-small-lm-adapt" -d "resources/data/Topical-Chat/KGD"
bash jobs/run_public.sh -s 23 -m "t5-small" -d "resources/data/Topical-Chat/KGD"

Individual Steps

Pre-training small BART models

See the README in the pretraining/ directory.

Fine-tuning base models

The Python script ./finetune.py is adapted from Hugging Face's run_summarization.py example script and can be used to fine-tune a new model for our experiments.

The bash wrapper script ./finetune.sh provides the training commands used to train our models.

To fine-tune a model on a slurm cluster use jobs/run_finetuning.sh, e.g.:

seed=23
sbatch jobs/run_finetuning.sh \
    -i resources/models/seed_$seed/pt/hf_conv/bart_small-MLM/ \
    -o resources/models/seed_$seed/CD/ft/$model_name/ \
    -s $seed \
    -d resources/data/Topical-Chat/KGD

Inference

To perform inference on a slurm cluster, run:

sbatch jobs/run_generation_exp.sh \
    -m resources/models/ft/bart_base \
    -t resources/data/Topical-Chat/KGD/test_freq.json

For multiple experimental inference runs with BART-mini, it's also possible to parallelise jobs on a single GPU, e.g.:

sbatch jobs/run_generation_exp_parallel.sh \
    -m resources/models/ft/bart_small-MLM \
    -t resources/data/Topical-Chat/KGD/test_freq.json

Note: you can modify the experiment IDs in these scripts to match your needs!

Inference with context augmentation / attention biasing

The script constants.py contains a series of hardcoded experimental configs. To run a new experiment (i.e. all seeded generation runs), you can define a new experiment config in this script, e.g.:

"short_qu_ctxt_aug5": {
    "context_augmentation_examples": "resources/data/Topical-Chat/KGD/contexts/short_questions.txt",
    "context_code_attention_bias_value": 5,
    "max_context_examples": 5,
},

Note: to avoid errors with post-hoc evaluation (not always used), you should also add the name of the experiment and the relevant output filepath suffix to eval.py.

Analysing Results

To double-check which experiments have been completed and have results, use check_experiment_results.py, specifying the dataset ID (TC/CD/DD) and the test set's directory stem, e.g.:

python check_experiment_results.py TC test_freq-bart_small

The results and plots from the paper were generated with summarize_results.ipynb (note: this notebook hasn't been cleaned up).
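
To open the notebook with the unsup_ctrl kernel registered during setup, something like the following should work (assuming jupyter is installed in the environment):

jupyter notebook summarize_results.ipynb  # then select the unsup_ctrl kernel in the notebook UI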

Known Limitations

  • The bias profile used is fixed across all decoding timesteps (not gradual).
  • Commands for generating all the different types of context example files are missing from this documentation.

Citation

@inproceedings{kew-sennrich-2023-uncovering,
    title = "Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models",
    author = "Kew, Tannon  and
      Sennrich, Rico",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.438",
    doi = "10.18653/v1/2023.findings-acl.438",
    pages = "7010--7022",
    abstract = "Some variants of self-supervised denoising objectives for pre-training encoder-decoder language models have been reported to have a negligible impact on downstream performance. Yet the design of these pre-training objectives leads to behavioural differences that can be uncovered with specific manipulations. We reproduce a recently proposed zero-shot control method and find that it is only successful on a subset of models. To understand what causes the difference in its effectiveness, we perform a set of controlled experiments, varying only the pre-training objective, and find unexpected interactions between the pre-training method and downstream controllability of models after fine-tuning. Our results show that different pre-training objectives have consequences that may not be visible in standard downstream evaluation, but which should be taken into account when developing models with controllability in mind.",
}