This repo accompanies the ESEC/FSE 2020 paper - Fairway: A Way to Build Fair ML Software
Folder Structure Details -
- Split_On_Protected_Attribute folder contains code for the pre-processing step
- Multiobjective Optimization folder contains code for FAIR_FLASH
- dataset folder contains all the datasets we used and Unbiased_dataset contains the dataset after pre-processing
- Measure.py contains all the functions for performance measurement
- For a detailed description of previous algorithms, please visit IBM AIF360
- To reproduce the results of the paper, please run RQ1.sh and RQ6.sh
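
To illustrate what splitting on a protected attribute and measuring group-wise performance can look like, here is a minimal sketch in plain Python. It is not the code from Split_On_Protected_Attribute or Measure.py. The function names (split_on_protected, true_positive_rate, equal_opportunity_difference), the record layout, the 1/0 encoding of the protected attribute, and the toy data are all assumptions made for this example. The metric shown is the equal opportunity difference, computed as the recall of the unprivileged group minus the recall of the privileged group.

```python
# Illustrative sketch only: all names and the toy data are hypothetical
# and are not taken from this repo's Measure.py or pre-processing code.


def split_on_protected(rows, attr):
    """Split records into (privileged, unprivileged) by a binary attribute.

    Assumes attr is encoded as 1 for privileged and 0 for unprivileged.
    """
    privileged = [r for r in rows if r[attr] == 1]
    unprivileged = [r for r in rows if r[attr] == 0]
    return privileged, unprivileged


def true_positive_rate(rows, label="label", pred="pred"):
    """Recall: TP / (TP + FN). Returns 0.0 when the group has no positives."""
    positives = [r for r in rows if r[label] == 1]
    if not positives:
        return 0.0
    true_pos = sum(1 for r in positives if r[pred] == 1)
    return true_pos / len(positives)


def equal_opportunity_difference(rows, attr):
    """Recall of the unprivileged group minus recall of the privileged group."""
    privileged, unprivileged = split_on_protected(rows, attr)
    return true_positive_rate(unprivileged) - true_positive_rate(privileged)


# Toy records: protected attribute, ground-truth label, model prediction.
rows = [
    {"sex": 1, "label": 1, "pred": 1},
    {"sex": 1, "label": 1, "pred": 1},
    {"sex": 1, "label": 0, "pred": 0},
    {"sex": 0, "label": 1, "pred": 1},
    {"sex": 0, "label": 1, "pred": 0},
    {"sex": 0, "label": 0, "pred": 0},
]

eod = equal_opportunity_difference(rows, "sex")
print(eod)  # -0.5
```

A value of 0 would mean both groups have the same recall. In this toy data the privileged group has recall 1.0 and the unprivileged group 0.5, so the difference is -0.5.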
Datasets -
1> Adult Income dataset - http://archive.ics.uci.edu/ml/datasets/Adult
2> Bank Marketing - https://archive.ics.uci.edu/ml/datasets/bank+marketing
3> German Credit - https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
4> Default - https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
5> ProPublica Recidivism/COMPAS - https://github.com/propublica/compas-analysis
6> Medical - https://meps.ahrq.gov/mepsweb/ (2 CSV files)
7> Arrhythmia - https://archive.ics.uci.edu/ml/datasets/Arrhythmia
8> Cleveland Heart Disease - https://archive.ics.uci.edu/ml/datasets/heart+Disease
9> Communities and Crime Data Set (Regression) - http://archive.ics.uci.edu/ml/datasets/communities+and+crime
Some datasets are not classification datasets -
1> Execution - https://data.world/markmarkoh/executions-since-1977
2> NYPD (New York City Police Department) - https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page
3> Communities and Crime Data Set - https://archive.ics.uci.edu/ml/datasets/communities+and+crime
4> Health of Populations - https://gitlab.com/labsysmed/dissecting-bias/-/tree/master
5> UFRGS Entrance Exam and GPA Data - https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/O35FW8