This repository hosts code and datasets relating to Responsible NLP projects from Meta AI.
AdvPromptSet
- AdvPromptSet: a comprehensive and challenging adversarial text prompt set with 197,628 prompts of varying toxicity levels and more than 24 sensitive demographic identity groups and combinations.
fairscore
- From Rebecca Qian, Candace Ross, Jude Fernandes, Eric Smith, Douwe Kiela, Adina Williams. Perturbation Augmentation for Fairer NLP. 2022.
- PANDA, an annotated dataset of 100K demographic perturbations of diverse text, rewritten to change gender, race/ethnicity and age references.
- The perturber, pretrained models, code and other artifacts related to the Perturbation Augmentation for Fairer NLP project will be released shortly.
gender_gap_pipeline
holistic_bias
- From Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, Adina Williams. "I'm sorry to hear that": finding bias in language models with a holistic descriptor dataset. 2022.
- Code to generate a dataset, HolisticBias, consisting of nearly 600 demographic terms in over 450k sentence prompts
- Code to calculate Likelihood Bias, a metric of the amount of bias in a language model, defined on HolisticBias demographic terms
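The two HolisticBias pieces above (prompt generation and Likelihood Bias) can be pictured with a small, self-contained sketch. Everything here is an illustrative assumption rather than this repository's code: the function names `build_prompts`, `mann_whitney_p` and `likelihood_bias` are hypothetical, the template, descriptor and noun lists are tiny stand-ins for the real ones, perplexities would come from the language model under test, and the metric is simplified to our reading of it: the fraction of descriptor pairs whose perplexity distributions differ significantly under a two-sided Mann-Whitney U test (normal approximation, no tie correction), pooled over templates.

```python
import math
from itertools import combinations, product

# Hypothetical toy inputs; the real HolisticBias lists are far larger.
TEMPLATES = ["I love {noun_phrase}.", "I think {noun_phrase} are interesting."]
DESCRIPTORS = {"age": ["young", "old", "middle-aged"]}
NOUNS = ["people", "parents"]


def build_prompts(templates, descriptors, nouns):
    """Fill every template with every descriptor + noun combination.

    Returns (axis, descriptor, prompt) tuples.
    """
    prompts = []
    for axis, terms in descriptors.items():
        for template, descriptor, noun in product(templates, terms, nouns):
            noun_phrase = f"{descriptor} {noun}"
            prompts.append((axis, descriptor, template.format(noun_phrase=noun_phrase)))
    return prompts


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts pairs (a, b) with a from x beating b from y; ties count as half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


def likelihood_bias(perplexities, alpha=0.05):
    """Fraction of descriptor pairs whose perplexity samples differ significantly.

    `perplexities` maps each descriptor to the perplexities the model assigned
    to prompts containing it (pooled over templates in this simplified sketch).
    """
    pairs = list(combinations(sorted(perplexities), 2))
    if not pairs:
        return 0.0
    significant = sum(
        mann_whitney_p(perplexities[a], perplexities[b]) < alpha for a, b in pairs
    )
    return significant / len(pairs)
```

With made-up perplexities in which prompts containing `old` always score higher than those containing `young` or `middle-aged` (whose samples overlap), two of the three descriptor pairs differ significantly, so this sketch reports a Likelihood Bias of 2/3.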
robbie
- ROBBIE: we test 6 bias/toxicity metrics (including 2 novel ones) across 5 model families and 3 bias/toxicity mitigation techniques, and show that using a broad array of metrics enables much better assessment of safety issues in these models and mitigations.
SMART-Filtering
- From Vipul Gupta, Candace Ross, David Pantoja, Rebecca J. Passonneau, Megan Ung, Adina Williams. Improving Model Evaluation using SMART Filtering of Benchmark Datasets. 2024.
- SMART Filtering: a new approach to selecting a high-quality subset of examples from existing benchmark datasets. The methodology applies three filtering steps: 1) removing easy examples, 2) removing data-contaminated examples that are highly likely to have been leaked into the training datasets, and 3) removing similar examples.
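The three filtering steps can be sketched as a simple pipeline. This is a hypothetical illustration, not the SMART-Filtering code: the `Example` record, `is_easy`, `cosine` and `smart_filter` are made-up names, each example is assumed to already carry per-model correctness and confidence scores, a precomputed contamination score and a sentence embedding, and every threshold is an arbitrary placeholder rather than a value from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    embedding: list[float]          # hypothetical precomputed sentence embedding
    model_correct: list[bool]       # one entry per evaluated model (assumed input)
    model_confidence: list[float]   # confidence each model gave its answer (assumed input)
    contamination_score: float      # assumed likelihood the example leaked into training data


def is_easy(ex, conf_threshold=0.8):
    """Step 1 criterion (assumed): every model is correct and confident."""
    return all(ex.model_correct) and all(c >= conf_threshold for c in ex.model_confidence)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def smart_filter(examples, conf_threshold=0.8, contamination_threshold=0.5,
                 similarity_threshold=0.95):
    # Step 1: drop examples that all models already solve confidently.
    kept = [ex for ex in examples if not is_easy(ex, conf_threshold)]
    # Step 2: drop examples that are likely to be data-contaminated.
    kept = [ex for ex in kept if ex.contamination_score < contamination_threshold]
    # Step 3: greedily keep an example only if it is not too similar to one already kept.
    selected = []
    for ex in kept:
        if all(cosine(ex.embedding, s.embedding) < similarity_threshold for s in selected):
            selected.append(ex)
    return selected
```

Because step 3 is greedy here, which member of a near-duplicate group survives depends on input order; a fuller implementation might cluster embeddings instead.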
See CONTRIBUTING.md for how to help out, and see LICENSE for license information.