
Copyright Traps for Large Language Models

This is the accompanying code for the paper Copyright Traps for Large Language Models, published at ICML 2024.

The dataset of traps has been released here!

Generating trap sequences

We generate trap sequences to be injected with the following script:

python src/scripts/gen_traps.py --path-to-model $LLAMA_MODEL --path-to-tokenizer $LLAMA_TOKENIZER -o data/traps.pkl --seq-len 25 -n 500

This example generates 500 sequences of 25 tokens each (26 including the BOS token), with perplexity uniformly distributed over [1,101). The output file will therefore contain 5 sequences with k <= perplexity < k+1 for each integer k between 1 and 100. Note that in the paper we split sequences into perplexity buckets of size 10 (i.e. [1,11), [11,21), etc.).
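The bucketing arithmetic above can be sketched as follows (function name and structure are illustrative, not taken from the repository):

```python
# Sketch: how 500 sequences spread evenly over the perplexity range
# [1, 101) with unit-width buckets. Illustrative only.

def sequences_per_bucket(n_sequences: int, min_ppl: int, max_ppl: int) -> int:
    """Number of generated sequences falling in each unit perplexity bucket."""
    n_buckets = max_ppl - min_ppl  # [1, 101) -> 100 unit buckets
    if n_sequences % n_buckets != 0:
        raise ValueError("sequence count must divide evenly across buckets")
    return n_sequences // n_buckets

print(sequences_per_bucket(500, 1, 101))  # 5 sequences per bucket [k, k+1)
```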

Important arguments:

  • --min-perplexity and --max-perplexity define the perplexity range
  • --num-buckets defines the number of buckets in the output file
  • --temp-min, --temp-max, --temp-step configure the temperature of the LLM when generating sequences. We iterate over a range of temperature values to cover the desired perplexity range (the default temperature would rarely produce sequences with very low or very high perplexity)
  • --jaccard-threshold controls deduplication between generated sequences. We want to avoid cross-memorization between different trap sequences, so we ensure the Jaccard distance between any two sequences is above a certain threshold. This is increasingly important for low-perplexity sequences.
  • --retokenize eliminates tokenization artifacts and ensures that generated sequences maintain the target length after one cycle of decoding to raw text and encoding back. When enabled, we only keep sequences that retain the same length after retokenization.
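The Jaccard-based deduplication can be sketched as a filter over token sets (a minimal illustration; the actual implementation in src/scripts/gen_traps.py may differ):

```python
# Sketch of Jaccard-distance deduplication between trap sequences.
# Names and thresholds here are illustrative.

def jaccard_distance(a: list[int], b: list[int]) -> float:
    """Jaccard distance between the token sets of two sequences."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)

def is_sufficiently_distinct(candidate, accepted, threshold):
    """Keep a candidate only if it is far enough from every accepted sequence."""
    return all(jaccard_distance(candidate, s) > threshold for s in accepted)

accepted = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(is_sufficiently_distinct([1, 2, 3, 9], accepted, 0.5))    # False: overlaps the first
print(is_sufficiently_distinct([10, 11, 12, 13], accepted, 0.5))  # True
```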

Injecting trap sequences

We inject trap sequences generated at the previous step by running the following:

python "src/scripts/inject_traps.py" --path-to-tokenizer "$LLAMA_TOKENIZER_PATH" --path-to-raw-dataset "$INPUT_DATASET_PATH" --path-to-trap-dir "data/traps/" --output-ds-path "data/injected/dataset_with_traps" --output-info-path "data/injected/trap_info.pkl" --n-reps 1 10 100 1000 --seed 1111

The input dataset should be a Hugging Face dataset that can be loaded with the load_from_disk() method. It should contain at least one document per trap sequence provided: we inject each sequence into a single document (repeating it the requested number of times).

The folder specified in --path-to-trap-dir is expected to contain only the output of the previous step (potentially run multiple times with different parameters); this script will iterate over and read all files in that folder.

This produces two outputs: the dataset itself and the metadata. The dataset contains the original data plus the injected sequences. The metadata is a pickled pandas DataFrame (pickled because that makes it easier to store lists) containing all the injection metadata: which trap was injected into which document, and how many times.
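A toy round-trip through the metadata format might look like this (the column names are hypothetical; inspect df.columns on your own output to see the real schema):

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the injection metadata: which trap went into which
# document and how many times. Column names here are hypothetical.
meta = pd.DataFrame({
    "trap_id": [0, 1],
    "doc_ids": [[12], [7, 7, 7]],  # list-valued columns are why the frame is pickled
    "n_reps": [1, 3],
})

path = os.path.join(tempfile.mkdtemp(), "trap_info.pkl")
meta.to_pickle(path)

loaded = pd.read_pickle(path)
print(loaded["n_reps"].tolist())  # [1, 3]
```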

Important arguments:

  • --n-reps: a list of integers defining the number of repetitions when injecting a sequence. The total number of traps must be divisible by the length of this list, so that each repetition count is assigned the same number of traps.
  • --doc-min-tokens: only inject into documents containing at least this number of tokens
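The divisibility requirement for --n-reps amounts to an even partition of traps across repetition counts, which can be sketched as (illustrative only, not the repository's code):

```python
# Sketch: assign each trap a repetition count so that every count in
# --n-reps receives the same number of traps. Illustrative only.

def assign_repetitions(n_traps: int, n_reps: list[int]) -> list[int]:
    """Return one repetition count per trap, equally many traps per count."""
    if n_traps % len(n_reps) != 0:
        raise ValueError("total number of traps must be divisible by len(n_reps)")
    per_group = n_traps // len(n_reps)
    return [r for r in n_reps for _ in range(per_group)]

counts = assign_repetitions(8, [1, 10, 100, 1000])
print(counts)  # [1, 1, 10, 10, 100, 100, 1000, 1000]
```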

Please refer to src/scripts/run_all.sh for the full pipeline.

Membership inference

Note that at the moment we are not yet releasing the exact trap sequences used for training, so the data analysis code is provided for illustrative purposes only.

The code to generate our figures and tables is located in the notebooks/ folder, with one exception: the script for Table 3, which lives in src/scripts.

The key step in our analysis is a Ratio Membership Inference Attack (MIA) on trap sequences. For each sequence it computes the ratio of the target model's (Croissant) perplexity to the reference model's (LLaMA) perplexity. The intuition is that we want to measure the change in perplexity relative to a model that has not seen the sequence, without retraining the full Croissant model. See utils.py for more details on the MIA implementation.
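On precomputed per-token log-likelihoods, the ratio statistic might look like the toy sketch below. The function names are hypothetical and the real implementation lives in utils.py; this only illustrates the arithmetic:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from per-token natural-log likelihoods."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def mia_ratio(target_logprobs: list[float], reference_logprobs: list[float]) -> float:
    """Target-model perplexity divided by reference-model perplexity.

    A low ratio suggests the target model assigns the sequence unusually
    high likelihood relative to the reference, hinting at memorization.
    """
    return perplexity(target_logprobs) / perplexity(reference_logprobs)

# A memorized sequence: the target assigns much higher likelihood
# than the reference, so the ratio falls well below 1.0.
print(mia_ratio([-0.5, -0.4, -0.6], [-2.0, -1.8, -2.2]))
```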

We evaluate our MIA on a balanced dataset of members and non-members. We therefore run the gen_traps.py script one more time with the same hyperparameters and the same number of trap sequences, but do not inject the results into the dataset. The data analysis notebooks expect non-member traps to be located in the same folder, with one file per sequence length, where the sequence length is indicated in the filename. The filename template should then be provided as NON_MEMBERS_PATH_TEMPLATE.

For instance, if your non-member sequences are located at /data/traps/non_members_len_25, /data/traps/non_members_len_50, and /data/traps/non_members_len_100, you should set

NON_MEMBERS_PATH_TEMPLATE="/data/traps/non_members_len_%d"
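The template is expanded with %-formatting, one path per sequence length, roughly as follows (a sketch using the lengths from the example above):

```python
# Expand the non-member path template for each sequence length
# from the example above (25, 50, 100 tokens).
NON_MEMBERS_PATH_TEMPLATE = "/data/traps/non_members_len_%d"

paths = [NON_MEMBERS_PATH_TEMPLATE % seq_len for seq_len in (25, 50, 100)]
print(paths[0])  # /data/traps/non_members_len_25
```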

Citation

Please cite this work as

@inproceedings{meeuscopyright,
  title={Copyright Traps for Large Language Models},
  author={Meeus, Matthieu and Shilov, Igor and Faysse, Manuel and de Montjoye, Yves-Alexandre},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024}
}