Hate-LLaMA: An Instruction-tuned Audio-Visual Language Model for Hate Content Detection

Hate speech detection in online videos is an important but challenging problem, especially with the rise of video-sharing platforms. Existing solutions rely primarily on unimodal models focused on text or image inputs, with less emphasis on multimodal models that analyze both the visual and audio aspects of videos. We present Hate-LLaMA, an instruction-tuned audio-visual language model finetuned on HateMM, a labeled hate speech video dataset. Hate-LLaMA is a finetuned version of Video-LLaMA: it accepts video input and classifies hate speech by analyzing both visual frames and audio in a multimodal fashion. Hate-LLaMA detects hate content with an accuracy of 71%.

Another major challenge in video hate speech detection is the scarcity of labeled video datasets. To address this, we also propose a benchmark dataset of around 300 videos, consisting of 33% hate and 67% non-hate content.

Examples

[Example output: non-hate classification]

Prerequisites

Environment Setup

conda env create -f environment.yml
conda activate hatellama
pip install -r requirements.txt

Checkpoints and dataset

Download the checkpoints below and move them to the /ckpt folder.

  • Download meta-llama/Llama-2-7b-chat-hf from Hugging Face (see the sketch after this list).

  • Download the checkpoints for the finetuned audio and video branches of Hate-LLaMA and the ImageBind encoder from here.

  • To download our curated benchmark, click here

  • For the HateMM dataset, please refer to the HateMM repository.
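
For reference, the LLaMA-2 base weights can be fetched programmatically with huggingface_hub. The snippet below is a minimal sketch; the ckpt/Llama-2-7b-chat-hf target directory is an assumption about the expected layout, and access to the gated repository must be set up beforehand (e.g. via huggingface-cli login).

# Hypothetical download helper; requires `pip install huggingface_hub` and
# authorized access to the gated meta-llama/Llama-2-7b-chat-hf repository.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-2-7b-chat-hf",
    local_dir="ckpt/Llama-2-7b-chat-hf",  # assumed checkpoint location under /ckpt
)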

DEMO

To run the demo:

pip install -r requirements-demo.txt

python3 app.py

Executing the demo requires one GPU (preferably an RTX 8000 or A100).

Finetuning

To adapt the dataset to the instruction-tuning format, use the convert-data.py script (a conceptual sketch of this conversion is shown below).
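
The snippet below illustrates, at a high level, what such a conversion involves: mapping each labeled video to an instruction-style question/answer record. The annotation file name, its columns, and the output JSON schema are assumptions for illustration and may differ from what convert-data.py actually produces.

# Hypothetical sketch of converting labeled videos to instruction-tuning records.
# The CSV columns (video_path, label) and the output schema are assumptions.
import csv
import json

def convert(annotations_csv, out_json):
    records = []
    with open(annotations_csv, newline="") as f:
        for row in csv.DictReader(f):
            label = "hate" if row["label"].strip() == "1" else "non-hate"
            records.append({
                "video": row["video_path"],
                "QA": [{
                    "q": "Does this video contain hate speech? Answer hate or non-hate.",
                    "a": label,
                }],
            })
    with open(out_json, "w") as f:
        json.dump(records, f, indent=2)

if __name__ == "__main__":
    convert("hatemm_annotations.csv", "hatemm_instructions.json")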

For pretrained Video-LLaMA checkpoints, please refer to the Video-LLaMA repository.

To finetune the audio and video branches using these pretrained checkpoints, configure the checkpoint paths and hyperparameters inside audiobranch_stage2_finetune.yaml and visionbranch_stage2_finetune.yaml, then run:

conda activate hatellama

# Finetune the Vision-language branch
torchrun --nproc_per_node=4 train.py --cfg-path  ./train_configs/visionbranch_stage2_finetune.yaml

# Finetune the Audio-language branch
torchrun --nproc_per_node=4 train.py --cfg-path  ./train_configs/audiobranch_stage2_finetune.yaml

Finetuning was performed on 4 RTX 8000 GPUs (hence --nproc_per_node=4).

Inference

To evaluate the model's performance on the test sets:

python inference.py --gpu-id=0 --cfg-path="eval_configs/video_llama_eval_withaudio_stage3.yaml" --ckpt_root="output/"

To compute accuracy and F1 score:

unzip Results.npz
python compute_metrics.py whole_results.npy
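
Conceptually, the metric computation boils down to the following minimal sketch. The layout of whole_results.npy, assumed here to be an (N, 2) array of ground-truth and predicted labels, is an assumption; the actual compute_metrics.py may read a different structure.

# Hypothetical sketch of the accuracy/F1 computation over the saved results.
import sys
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Assumed layout: column 0 holds ground-truth labels, column 1 holds predictions.
results = np.load(sys.argv[1], allow_pickle=True)
y_true = results[:, 0].astype(int)
y_pred = results[:, 1].astype(int)

print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"F1 score: {f1_score(y_true, y_pred):.4f}")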

Additional Information

We also provide the code used to crawl the BitChute platform and curate the benchmark in /benchmark.
The updated code to run the HateMM baseline for our benchmark is provided in /baseline.

Acknowledgements

We are grateful to the following open-source repositories that helped us build this project:

  1. Video-LLaMA
  2. HateMM
