We introduce the Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms under the Right to be Forgotten setting. Specifically, we formulate the VLM unlearning task by constructing the Fictitious Facial Identity VQA dataset and apply a two-stage evaluation pipeline designed to precisely control the sources of information and their exposure levels. Because a VLM can be asked questions with the same semantic meaning in many different forms, we also provide robust evaluation metrics, including membership inference attacks and carefully designed adversarial privacy attacks, to assess the performance of unlearning algorithms. Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance, with significant trade-offs between model utility and forget quality. Our findings also highlight the importance of privacy attacks for robust evaluations. We hope FIUBench will drive progress in developing more effective VLM unlearning algorithms.
- [TBD] We will add more unlearning strategies to our benchmark!
- [2024.11.1] We release the paper and the data of our project.
You can download our fictitious dataset at this link. The dataset includes 400 virtual face images from the SFHQ dataset, each corresponding to a fictitious person.
- Clone this repository and navigate to the VLM_Unlearned folder:
```bash
git clone https://github.com/gray311/VLM_Unlearned.git
cd VLM_Unlearned
```
- Install packages:
```bash
conda create -n unlearned python=3.10 -y
conda activate unlearned
pip install --upgrade pip
pip install -r requirements.txt
```
- Install additional packages for training (an optional environment sanity check is sketched below):
```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
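Optionally, you can verify that the core training dependencies imported correctly. This is only a convenience check we sketch here, not part of the official setup:

```python
# Quick environment sanity check (optional). Assumes torch and flash-attn were
# installed by the steps above; adjust if your requirements.txt differs.
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    import flash_attn  # installed via `pip install flash-attn --no-build-isolation`
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn not found; training with flash attention will not work.")
```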
- Download the fictitious dataset (a programmatic alternative is sketched after the commands):
```bash
mkdir dataset
cd dataset
git clone https://huggingface.co/datasets/gray311/FIUBench/
cd FIUBench && mv * ../
```
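If you prefer a programmatic download, the sketch below uses `huggingface_hub` to fetch the dataset and then peeks at the forget split; the exact record structure is not assumed, only the `dataset/overall/forget10.json` path used later in this README:

```python
# Programmatic download of the FIUBench data with huggingface_hub, followed by a
# quick look at the forget split. local_dir mirrors the shell steps above.
import json
from huggingface_hub import snapshot_download

snapshot_download(repo_id="gray311/FIUBench", repo_type="dataset", local_dir="dataset")

with open("dataset/overall/forget10.json") as f:
    records = json.load(f)

# The JSON may be a list or a dict; print a single example either way.
first = records[0] if isinstance(records, list) else next(iter(records.values()))
print("Number of forget examples:", len(records))
print("Example record:", first)
```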
- Finetune VLMs on the fictitious dataset so that they learn fictitious entity-related knowledge (a rough sketch of a single training step is shown below the command):
```bash
bash scripts/finetune.bash
# You can modify config/accelerate.yaml and finetune.yaml according to your expected settings.
```
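For intuition only, here is a minimal sketch of what one LoRA finetuning step on a single (image, question, answer) pair could look like. The backbone model, prompt template, image path, and LoRA hyperparameters are illustrative assumptions, not the settings used by `scripts/finetune.bash` (those live in the YAML configs):

```python
# Minimal single-step LoRA finetuning sketch on one VQA pair. The model name,
# prompt format, image path, and LoRA settings are assumptions; the real training
# configuration is defined in config/finetune.yaml.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # illustrative backbone
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# Attach LoRA adapters to the attention projections of the language model.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

image = Image.open("dataset/images/0.png")  # hypothetical image path
prompt = "USER: <image>\nWhat is this person's occupation? ASSISTANT: She works as a nurse."
inputs = processor(images=image, text=prompt, return_tensors="pt")

# Standard next-token cross-entropy on the whole sequence (the real pipeline may
# mask the question tokens and supervise only the answer).
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
```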
- Evaluate the finetuned model. You can use the file `evaluate_util.py` and modify the configuration in `config/eval.yaml`.
```bash
bash scripts/eval_everything.bash
```
- Run unlearning: finetune the models on the forget set (i.e., `dataset/overall/forget10.json`) so that they forget fictitious entity-related knowledge (a minimal sketch of a common unlearning objective is shown below the command):
```bash
bash scripts/forget_lora.bash
# You can modify config/accelerate.yaml and finetune.yaml according to your expected settings.
```
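As a conceptual illustration, the sketch below shows a gradient-difference style unlearning step (ascend on the forget loss, descend on the retain loss), one common baseline in this setting. It uses a small text-only model and made-up strings purely for brevity; `scripts/forget_lora.bash` operates on the finetuned VLM with LoRA adapters and its own objectives:

```python
# Gradient-difference unlearning sketch: maximize the loss on a forget-set sample
# while keeping the loss on a retain-set sample low. GPT-2 and the texts below are
# placeholders, not real FIUBench records.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget = tok("Q: What is this person's phone number? A: 555-0199.", return_tensors="pt")
retain = tok("Q: What color is the sky? A: Blue.", return_tensors="pt")

forget_loss = model(**forget, labels=forget["input_ids"]).loss
retain_loss = model(**retain, labels=retain["input_ids"]).loss

# Ascend on the forget loss, descend on the retain loss.
loss = -forget_loss + retain_loss
loss.backward()
optimizer.step()
```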
- Compute metrics. You can use the file `evaluate_util.py` and modify the configuration in `config/eval.yaml`. The evaluation results will by default be dumped to `${model_path}/eval_results`; you can also modify the `save_dir` field in `config/eval_everything.yaml`.
```bash
bash scripts/eval_everything.bash
```
The evaluation results on the forget and retain sets will be aggregated into one JSON file named `eval_log_aggregated.json`. Finally, you can run
```bash
bash scripts/aggregate.bash
```
to obtain an aggregated CSV-format result that contains ROUGE-L, Truth Ratio, Probability, KS-Test scores, Exact Match, GPT score, APE, and MIA (see the illustrative metric sketch below).
```bash
python results_collect.py # This step collects the eval_log_aggregated.json files of all unlearned checkpoints.
```
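As a rough illustration of two of the reported metrics, the sketch below computes ROUGE-L between a generated answer and a reference, and a KS statistic between two score distributions. The strings and score lists are placeholders, and the exact formulas used by `aggregate.bash` may differ:

```python
# Illustrative computation of two of the aggregated metrics. The example strings
# and score lists are placeholders; the official values come from aggregate.bash.
from rouge_score import rouge_scorer
from scipy.stats import ks_2samp

# ROUGE-L between a reference answer and a model answer.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "She is working as a nurse at the city hospital."
prediction = "She works as a nurse."
print("ROUGE-L:", scorer.score(reference, prediction)["rougeL"].fmeasure)

# KS-Test between two distributions of per-example scores (placeholder numbers),
# as a forget-quality style comparison.
scores_a = [0.91, 0.87, 0.95, 0.90, 0.88]
scores_b = [0.40, 0.35, 0.52, 0.47, 0.39]
stat, p_value = ks_2samp(scores_a, scores_b)
print("KS statistic:", stat, "p-value:", p_value)
```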
- Compute the accuracy (ACC) metric on MME and POPE (a simplified POPE accuracy sketch follows the commands):
```bash
cd eval
python eval_mme.py  # Please note that you need to modify the script at the end of this file.
python eval_pope.py # Please note that you need to modify the script at the end of this file.
```
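For reference, POPE questions are yes/no object-hallucination probes, so the accuracy computation reduces to matching binary answers. The record format below is an assumption for illustration; `eval/eval_pope.py` is the authoritative implementation:

```python
# Simplified POPE-style accuracy: each record pairs a yes/no label with the model's
# answer. The record format here is an assumption; eval/eval_pope.py is authoritative.
def pope_accuracy(records):
    correct = 0
    for r in records:
        pred = "yes" if "yes" in r["model_answer"].lower() else "no"
        correct += int(pred == r["label"].lower())
    return correct / len(records)

# Placeholder records purely to show the expected shape.
examples = [
    {"label": "yes", "model_answer": "Yes, there is a dog in the image."},
    {"label": "no", "model_answer": "No."},
    {"label": "no", "model_answer": "Yes, I can see a chair."},
]
print(f"ACC = {pope_accuracy(examples):.2f}")
```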
We are highly inspired by: TOFU