In this tutorial you will find all the steps and instructions needed to reproduce the experiments from "Gender Imbalance in Medical Imaging Datasets Produces Biased Classifiers for Computer-aided Diagnosis" by Agostina Larrazabal, Nicolás Nieto, Victoria Peterson, Diego H. Milone, and Enzo Ferrante, Proceedings of the National Academy of Sciences, May 2020.
https://www.pnas.org/content/early/2020/05/19/1919012117
This code is based on the following publicly available implementation of CheXNet using Keras: https://github.com/brucechou1983/CheXNet-Keras
Step 0: If it is your first time coding in Python 3, you will have to install it. We recommend installing the Anaconda Distribution:
You can find straightforward instructions in the following tutorial (up to Step 8):
https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart
We use conda 4.7.12
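You can check which version is installed on your system with:
(base)>> conda --version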
In this repository you will find all the scripts needed to reproduce our experiments.
1- Open a terminal.
2- Set the terminal path to the unzipped GenderBias_CheXNet folder.
3- Run the following command:
(base)>> python batch_download_zips.py
This may take a while.
If you prefer to download the data on your own, you can find all the files here:
https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/37178474737
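In either case, the dataset comes as a set of compressed archives that must be extracted before training. Here is a minimal Python sketch to do it (the images_*.tar.gz naming is an assumption based on how the NIH release is packaged; adapt the pattern to the files you actually downloaded):

import glob
import tarfile

# Extract every downloaded archive into a single "images" folder.
# The images_*.tar.gz pattern is an assumption; adjust it to your files.
for archive in sorted(glob.glob("images_*.tar.gz")):
    print("Extracting", archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(path="images")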
1- Open a Terminal in the repository's path.
2- Run the following command:
(base)>> conda env create --name your_env_name --file requirements.txt
Some packages cannot be installed by conda, so you have to install them with pip inside your environment:
(base)>>source activate your_env_name
(your_env_name)>> pip install pillow==4.2.0
(your_env_name)>> pip install opencv-python==4.1.0.25
(your_env_name)>> pip install imgaug==0.2.9
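To double-check that the pinned versions were picked up by the environment, you can run this quick sanity check (a suggestion only, not part of the original pipeline):
(your_env_name)>> python -c "import PIL, cv2, imgaug; print(PIL.PILLOW_VERSION, cv2.__version__, imgaug.__version__)"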
Check your system CUDA version:
(your_env_name)>> nvcc --version
Update the CUDA version in your environment to match it:
(your_env_name)>> conda install cudatoolkit==your_cuda_version
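Before training, it is worth verifying that the environment can actually see the GPU. Assuming the Keras installation uses the TensorFlow backend (as in the CheXNet-Keras code this repository is based on), you can run:
(your_env_name)>> python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
If a GPU device appears in the output, you are ready to train.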
(base)>> source activate your_env_name
You will see your environment name in the command line:
(your_env_name)>>
First, make sure that in "config_file.ini" the image_source_dir entry contains the path where you downloaded the dataset.
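For reference, the relevant entry should look something like this (the [DEFAULT] section name and the path are illustrative; only the image_source_dir key is taken from the repository's config file):

; config_file.ini (illustrative excerpt)
[DEFAULT]
image_source_dir = /path/to/your/downloaded/images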
Run the training script with the following command:
(your_env_name)>> python3 training.py
When the training process finishes, you will find an "/output" folder containing the trained weights of the network.
Now that your model is trained, it is time to generate predictions on unseen data.
Run the testing script with the following command:
(your_env_name)>> python3 testing.py
When the testing is over, you will find the network predictions in the "/output" folder.
As an example, for fold 0, trained with only male images and tested on the female set, you will find:
y_pred_run_0_train_0%_female_images_test_female.csv
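If you want to compute per-pathology AUC values from these prediction files yourself, here is a minimal sketch using pandas and scikit-learn. The ground-truth file name (y_true_...) and the column layout are assumptions; adapt them to the files in your "/output" folder:

import pandas as pd
from sklearn.metrics import roc_auc_score

# File names below are illustrative; adjust them to your own "/output" folder.
y_pred = pd.read_csv("output/y_pred_run_0_train_0%_female_images_test_female.csv")
y_true = pd.read_csv("output/y_true_run_0_train_0%_female_images_test_female.csv")  # assumed ground-truth file

# One AUC per pathology column present in both files.
for label in y_true.columns:
    if label in y_pred.columns:
        print(label, roc_auc_score(y_true[label], y_pred[label]))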
In this section we include results for our analysis using three different CNN architectures and two datasets of X-ray images.
Experimental results for a DenseNet classifier trained with images from the NIH dataset and the CheXpert dataset. The boxplots aggregate the results for 20 folds, training with male (blue) and female (orange) patients. Both models are evaluated on male-only and female-only test folds. A consistent decrease in area under the receiver operating characteristic curve (AUC) is observed when using male patients for training and female for testing (and vice versa). Statistical significance according to the Mann–Whitney U test is denoted by **** (p ≤ 0.00001), *** (0.00001 < p ≤ 0.0001), ** (0.0001 < p ≤ 0.001), * (0.001 < p ≤ 0.01) and ns (p > 0.01).
Experimental results for a ResNet classifier trained with images from the NIH dataset and the CheXpert dataset. The boxplots aggregate the results for 20 folds, training with male (blue) and female (orange) patients. Both models are evaluated on male-only and female-only test folds. A consistent decrease in area under the receiver operating characteristic curve (AUC) is observed when using male patients for training and female for testing (and vice versa). Statistical significance according to the Mann–Whitney U test is denoted by **** (p ≤ 0.00001), *** (0.00001 < p ≤ 0.0001), ** (0.0001 < p ≤ 0.001), * (0.001 < p ≤ 0.01) and ns (p > 0.01).
Experimental results for an InceptionV3 classifier trained with images from the NIH dataset and the CheXpert dataset. The boxplots aggregate the results for 20 folds, training with male (blue) and female (orange) patients. Both models are evaluated on male-only and female-only test folds. A consistent decrease in area under the receiver operating characteristic curve (AUC) is observed when using male patients for training and female for testing (and vice versa). Statistical significance according to the Mann–Whitney U test is denoted by **** (p ≤ 0.00001), *** (0.00001 < p ≤ 0.0001), ** (0.0001 < p ≤ 0.001), * (0.001 < p ≤ 0.01) and ns (p > 0.01).
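The significance markers in these captions come from the Mann–Whitney U test. If you want to reproduce that comparison on your own per-fold AUC values, here is a minimal sketch with scipy (the AUC arrays below are placeholders, not results from the paper):

from scipy.stats import mannwhitneyu

# Placeholder values: replace with the 20 per-fold AUCs of each model.
auc_train_male = [0.81, 0.82, 0.80, 0.83]
auc_train_female = [0.78, 0.77, 0.79, 0.76]

# Two-sided Mann-Whitney U test, as used for the figure annotations.
stat, p_value = mannwhitneyu(auc_train_male, auc_train_female, alternative="two-sided")
print("U = %.2f, p = %.5f" % (stat, p_value))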