FKA-Owl


[Project Page] [Paper] [arXiv]


Introduction:


FKA-Owl leverages the rich world knowledge of large vision-language models (LVLMs) and augments them with forgery-specific knowledge to tackle the domain-shift problem in multimodal fake news detection. We propose two lightweight modules for forgery-specific knowledge augmentation: a cross-modal reasoning module and a visual-artifact localization module, which extract semantic correlations and artifact traces, respectively.

[Figure: Overview of the FKA-Owl architecture]

The proposed FKA-Owl is built upon an off-the-shelf LVLM consisting of an image encoder and a large language model (LLM). Given a manipulated image-text pair, the cross-modal reasoning module (a) first extracts cross-modal semantic embeddings and visual patch features. These visual patch features are then processed by the visual-artifact localization module (b) to encode precise artifact embeddings. Finally, the semantic and artifact embeddings, together with the image features and the human prompt, are fed into the forgery-aware vision-language model (c) for deep manipulation reasoning.
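
For readers who prefer code, here is a minimal PyTorch sketch of this data flow. The module internals, the embedding width, and the concatenation-based fusion are illustrative assumptions for exposition only, not the repository's actual implementation (see ./code for that):

# Illustrative sketch of the (a)-(c) pipeline described above.
# All dimensions and module internals are assumptions, not the real code.
import torch
import torch.nn as nn

D = 1024  # assumed shared embedding width

class CrossModalReasoning(nn.Module):
    # (a) extracts cross-modal semantic embeddings from image/text features
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D, num_heads=8, batch_first=True)

    def forward(self, img_patches, text_tokens):
        sem, _ = self.attn(text_tokens, img_patches, img_patches)
        return sem  # semantic correlation embeddings

class ArtifactLocalization(nn.Module):
    # (b) encodes artifact embeddings from the visual patch features
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(D, D), nn.GELU(), nn.Linear(D, D))

    def forward(self, img_patches):
        return self.proj(img_patches)  # per-patch artifact traces

# (c) the semantic and artifact embeddings are combined with the image
# features and the human prompt before being passed to the LLM.
img_patches = torch.randn(1, 256, D)  # from the image encoder (e.g. ImageBind)
text_tokens = torch.randn(1, 32, D)   # encoded caption of the news pair
semantic = CrossModalReasoning()(img_patches, text_tokens)
artifact = ArtifactLocalization()(img_patches)
llm_input = torch.cat([img_patches, semantic, artifact], dim=1)
print(llm_input.shape)  # torch.Size([1, 544, 1024])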

🔧 Dependencies and Installation

# create an environment
conda create -n FKA_Owl python==3.9.0
# activate the environment
conda activate FKA_Owl
# install pytorch using pip
# for example: for Linux with CUDA 11.7
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
# install other dependencies
pip install -r requirements.txt
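
After installing, a quick check confirms that the CUDA build of PyTorch was picked up (version numbers refer to the example pip command above):

# Sanity check: verify the CUDA-enabled PyTorch build installed correctly.
import torch
print(torch.__version__)          # expect 1.13.1+cu117 with the command above
print(torch.cuda.is_available())  # expect True on a machine with CUDA 11.7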

⏬ Prepare Checkpoint

You can download the pre-trained ImageBind model using this link. After downloading, put the file (imagebind_huge.pth) in the [./pretrained_ckpt/imagebind_ckpt/] directory.

To prepare the pre-trained Vicuna model, please follow the instructions provided [here].

We use the pre-trained parameters of PandaGPT to initialize our model. Weights of PandaGPT trained with different strategies are available from the official PandaGPT release. In our experiments, we use Vicuna-7B and openllmplayground/pandagpt_7b_max_len_1024 due to limited computational resources. Please put the downloaded 7B delta-weights file (pytorch_model.pt) in the [./pretrained_ckpt/pandagpt_ckpt/7b/] directory.
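
A small script can verify that the two downloaded checkpoints ended up in the expected locations (the Vicuna directory is omitted here because its layout depends on the conversion instructions above):

# Verify the downloaded checkpoints are where the training code expects them.
import os

checkpoints = [
    "./pretrained_ckpt/imagebind_ckpt/imagebind_huge.pth",
    "./pretrained_ckpt/pandagpt_ckpt/7b/pytorch_model.pt",
]
for path in checkpoints:
    print(path, "OK" if os.path.isfile(path) else "MISSING")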

⏬ Prepare Data

You can download the DGM4 dataset from [this link]. After downloading, put the data in the [./data] directory.

The [./data] directory should look like this:

└── data
    └── DGM4
        ├── manipulation
        │   ├── infoswap
        │   │   ├── ...
        │   │   └── xxxxxx.jpg
        │   ├── simswap
        │   │   ├── ...
        │   │   └── xxxxxx.jpg
        │   ├── StyleCLIP
        │   │   ├── ...
        │   │   └── xxxxxx.jpg
        │   └── HFGI
        │       ├── ...
        │       └── xxxxxx.jpg
        ├── origin
        │   ├── gardian
        │   │   ├── ...
        │   │   └── xxxx
        │   │       ├── ...
        │   │       └── xxxxxx.jpg
        │   ├── usa_today
        │   │   ├── ...
        │   │   └── xxxx
        │   │       ├── ...
        │   │       └── xxxxxx.jpg
        │   ├── washington_post
        │   │   ├── ...
        │   │   └── xxxx
        │   │       ├── ...
        │   │       └── xxxxxx.jpg
        │   └── bbc
        │       ├── ...
        │       └── xxxx
        │           ├── ...
        │           └── xxxxxx.jpg
        └── metadata_split
            ├── bbc
            │   ├── train.json
            │   ├── test.json
            │   └── val.json
            ├── guardian
            │   ├── train.json
            │   ├── test.json
            │   └── val.json
            ├── usa_today
            │   ├── train.json
            │   ├── test.json
            │   └── val.json
            └── washington_post
                ├── train.json
                ├── test.json
                └── val.json

💻 Training FKA-Owl

To train FKA-Owl on the bbc subset of the DGM4 dataset, run the following commands:

cd ./code
bash ./scripts/train_DGM4_bbc.sh

The key arguments of the training script are as follows:

  • --config_path: The path to the config file train_bbc.yaml.
  • --imagebind_ckpt_path: The path to the ImageBind checkpoint.
  • --vicuna_ckpt_path: The directory containing the pre-trained Vicuna checkpoint.
  • --max_tgt_len: The maximum sequence length of training instances.
  • --save_path: The directory where the trained delta weights are saved; it is created automatically.
  • --log_path: The directory where the logs are saved; it is created automatically.

Note that the number of training epochs can be set via the epochs field in ./code/config/openllama_peft.yaml, and the learning rate via ./code/dsconfig/openllama_peft_stage_1.json.
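
To double-check these two knobs before launching a run, you can print them from the config files. The field names below ("epochs", and the learning rate inside the DeepSpeed optimizer block) are assumptions about the file contents; requires pyyaml:

# Inspect the epoch count and learning rate used by the training script.
import json
import yaml  # pip install pyyaml

with open("./code/config/openllama_peft.yaml") as f:
    cfg = yaml.safe_load(f)
print("epochs:", cfg.get("epochs"))  # assumed field name, per the note above

with open("./code/dsconfig/openllama_peft_stage_1.json") as f:
    ds = json.load(f)
# In DeepSpeed configs the learning rate usually sits at optimizer.params.lr.
print("lr:", ds.get("optimizer", {}).get("params", {}).get("lr"))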

💻 Testing FKA-Owl

To test FKA-Owl on the washington_post subset of the DGM4 dataset, run the following commands:

cd ./code
bash test.sh

🤗 Acknowledgements

We borrow some code and the pre-trained weights from PandaGPT. Thanks for their wonderful work!

Citation:

If you find FKA-Owl useful in your research or applications, please cite it using the following BibTeX:

@inproceedings{liu2024fka,
    title={FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs},
    author={Liu, Xuannan and Li, Peipei and Huang, Huaibo and Li, Zekun and Cui, Xing and Liang, Jiahao and Qin, Lixiong and Deng, Weihong and He, Zhaofeng},
    booktitle={ACM MM},
    year={2024}
}
