Current LLMs are trained with RLHF to reduce explicit bias in their outputs. But do they also address implicit bias?
In our EMNLP 2024 (Findings) paper, we identify implicit biases in multi-agent LLM interactions and propose strategies to mitigate them.
LLM-based multi-agent frameworks make it possible to simulate realistic human interactions, which lets us examine implicit biases “in action”. To do so, we create a “Scenarios Dataset” of scenarios in which implicit biases are likely to emerge during task assignment in societal contexts, and we propose a bias score evaluation metric tailored to this task setting.
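As a concrete illustration of what a bias score over task assignments could look like, here is a minimal sketch. This is not the exact metric from the paper: the definition below (share of stereotype-aligned assignments, rescaled to [-1, 1]) and the `Assignment` fields are illustrative assumptions.

```python
# Illustrative sketch of a bias score over task assignments.
# NOTE: this is NOT the exact metric from the paper; the definition below
# (share of stereotype-aligned assignments, rescaled to [-1, 1]) is an assumption.
from dataclasses import dataclass
from typing import List


@dataclass
class Assignment:
    task: str                 # e.g. "cooking", "car repair"
    assigned_gender: str      # gender of the persona the agents assigned the task to
    stereotyped_gender: str   # gender stereotypically associated with the task


def bias_score(assignments: List[Assignment]) -> float:
    """Return a score in [-1, 1]: 1 = every assignment follows the stereotype,
    -1 = every assignment goes against it, 0 = balanced."""
    if not assignments:
        return 0.0
    aligned = sum(a.assigned_gender == a.stereotyped_gender for a in assignments)
    return 2 * aligned / len(assignments) - 1


# Example: 3 of 4 assignments follow the stereotype -> score = 0.5
demo = [
    Assignment("cooking", "female", "female"),
    Assignment("car repair", "male", "male"),
    Assignment("childcare", "female", "female"),
    Assignment("plumbing", "female", "male"),
]
print(bias_score(demo))  # 0.5
```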
We find that biases increase after multi-agent interaction. To mitigate this, we employ two widely used strategies, supervised fine-tuning and self-reflection, both of which effectively reduce biases in our setting (a sketch of the self-reflection step is included below). For more information, read our paper:
Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions
By Angana Borah and Rada Mihalcea
- LLMs exhibit implicit biases even when trained with human preference alignment methods such as RLHF.
- Larger models tend to produce more biased outputs.
- Biases increase after multi-agent LLM interactions.
- Multi-agent LLM interactions exhibit emergent social group behaviors, mirroring psychological theories such as Stereotype Threat Theory and Groupthink.
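To make the self-reflection strategy mentioned above concrete, here is a minimal, hypothetical sketch: the agent first proposes task assignments, then is prompted to check its own answer for implicit bias and revise. The prompts and the `llm` callable are assumptions for illustration, not the prompts used in the paper.

```python
# Hypothetical sketch of one self-reflection round: propose assignments,
# then self-critique for implicit bias and revise. Prompts are assumptions.
from typing import Callable

REFLECT_PROMPT = (
    "Review your task assignments above. Do any of them rely on implicit "
    "stereotypes (e.g., assigning tasks based on gender)? If so, rewrite the "
    "assignments so they are based only on stated skills and availability. "
    "Otherwise, repeat the original assignments."
)


def assign_with_reflection(scenario: str, llm: Callable[[str], str]) -> str:
    """Draft task assignments for a scenario, then revise them after self-reflection."""
    draft = llm(f"Scenario:\n{scenario}\n\nAssign the tasks to the personas.")
    revised = llm(
        f"Scenario:\n{scenario}\n\nYour assignments:\n{draft}\n\n{REFLECT_PROMPT}"
    )
    return revised
```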
The Scenarios, Fine-tune and Test datasets are provided in the Data folder.
The codebase for the multi-agent framework is in the Code folder.
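For orientation before diving into the Code folder, a hypothetical sketch of a single multi-agent discussion round on one scenario might look like the following. This is not the actual implementation; the persona strings and the `llm` callable are assumptions.

```python
# Hypothetical sketch of one multi-agent round on a scenario (not the repo's code):
# each persona-conditioned agent sees the conversation so far and contributes
# to the task-assignment discussion.
from typing import Callable, List


def run_round(scenario: str, personas: List[str], llm: Callable[[str], str]) -> List[str]:
    """Let each agent speak once, in order, conditioned on the dialogue so far."""
    transcript: List[str] = []
    for persona in personas:
        prompt = (
            f"You are {persona}.\n"
            f"Scenario: {scenario}\n"
            "Conversation so far:\n" + "\n".join(transcript) +
            "\nDiscuss and propose who should take which task."
        )
        reply = llm(prompt)
        transcript.append(f"{persona}: {reply}")
    return transcript
```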
@misc{borah2024implicitbiasdetectionmitigation,
  title={Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions},
  author={Angana Borah and Rada Mihalcea},
  year={2024},
  eprint={2410.02584},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.02584},
}