Data and code for our analysis of DermaMNIST (MedMNIST), HAM10000, and Fitzpatrick17k datasets

kakumarabhishek/Corrected-Skin-Image-Datasets

Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets

This repository contains the code accompanying our paper titled "Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets".

Repository Structure

The repository is structured as follows:

  • DermaMNIST/: Parent directory for DermaMNIST analysis and benchmarking experiments.
    • DermaMNIST_Analysis/: Directory containing the code for all our analysis of the DermaMNIST dataset, including the preparation of DermaMNIST-C and DermaMNIST-E datasets.
    • DermaMNIST_Training/: Directory containing the code for reproducing our benchmark experiment results, including the new medmnist_corrected.
    • HAM10000_DuplicateConfirmation/: Directory containing the code and results for detecting new duplicate image pairs in the HAM10000 dataset.
  • Fitzpatrick17k/: Parent directory for Fitzpatrick17k analysis and benchmarking experiments.
    • Fitzpatrick17k_Analysis/: Directory containing the code for our cleaning pipeline for the Fitzpatrick17k dataset, including the preparation of the Fitzpatrick17k-C dataset.
    • Fitzpatrick17k_Training/: Directory containing the code for reproducing our benchmark experiment results.
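As a quick sanity check after cloning, the layout above can be verified with a short script. This is only a minimal sketch: the directory names are copied from the list above, while the function name `missing_directories` and the idea of checking a local clone are illustrative assumptions, not part of the repository.

```python
from pathlib import Path
import sys

# Directory names copied from the structure listed above. The helper below is
# hypothetical and is not shipped with the repository.
EXPECTED_DIRS = [
    "DermaMNIST/DermaMNIST_Analysis",
    "DermaMNIST/DermaMNIST_Training",
    "DermaMNIST/HAM10000_DuplicateConfirmation",
    "Fitzpatrick17k/Fitzpatrick17k_Analysis",
    "Fitzpatrick17k/Fitzpatrick17k_Training",
]


def missing_directories(repo_root):
    """Return the expected sub-directories that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Path to the local clone; defaults to the current directory.
    clone_path = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_directories(clone_path)
    if missing:
        print("Missing directories:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected directories are present.")
```

Running the script with the path to a local clone as its only argument prints any directories from the list that could not be found.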

Metadata Files

The metadata files for the datasets released with this work are listed below:

These files are also available on this project's Zenodo repository.

Online Resource

Additional visualizations and links to the new datasets (DermaMNIST-C, DermaMNIST-E, and Fitzpatrick17k-C) are available on the project website.

Zenodo Repository

The datasets released with this work (DermaMNIST-C, DermaMNIST-E, and Fitzpatrick17k-C) are available on Zenodo.

License and Citation

The code in this repository is licensed under the Apache License 2.0.

If you use our newly proposed datasets or our analyses, please cite our paper and our Zenodo repository. The corresponding BibTeX entries are:

@article{abhishek2024investigating,
  title = {Investigating the Quality of {DermaMNIST} and {Fitzpatrick17k} Dermatological Image Datasets},
  author = {Abhishek, Kumar and Jain, Aditi and Hamarneh, Ghassan},
  journal = {arXiv preprint arXiv:2401.14497},
  doi = {10.48550/ARXIV.2401.14497},
  url = {https://arxiv.org/abs/2401.14497},
  year = {2024}
}

@dataset{abhishek_2024_11101337,
  title = {{Investigating the Quality of {DermaMNIST} and {Fitzpatrick17k} Dermatological Image Datasets}},
  month = May,
  year = 2024,
  author = {Abhishek, Kumar and Jain, Aditi and Hamarneh, Ghassan},
  language = {en},
  publisher = {Zenodo},
  doi = {10.5281/ZENODO.11101337},
  url = {https://zenodo.org/doi/10.5281/zenodo.11101337},
}

Dataset Acknowledgements

We thank the authors of the original papers, DermaMNIST (ISBI 2021, Nat Sci Data 2023), HAM10000 (Nat Sci Data 2018), and Fitzpatrick17k (CVPR ISIC 2021), for making their datasets publicly available. We ask users of our datasets to also cite the original datasets in their work. The corresponding BibTeX entries for the original datasets are:

DermaMNIST:
@inproceedings{yang2021medmnist,
  title = {{MedMNIST} Classification Decathlon: A Lightweight {AutoML} Benchmark for Medical Image Analysis},
  author = {Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
  booktitle = {2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)},
  pages = {191--195},
  year = {2021},
  organization = {IEEE},
  doi = {10.1109/ISBI48211.2021.9434062}
}

@article{yang2023medmnist,
  title = {{MedMNIST} v2 - A large-scale lightweight benchmark for {2D} and {3D} biomedical image classification},
  author = {Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing},
  journal = {Scientific Data},
  volume = {10},
  number = {1},
  pages = {41},
  year = {2023},
  publisher = {Nature Publishing Group UK London},
  doi = {10.1038/s41597-022-01721-8}
}

HAM10000:
@article{tschandl2018ham10000,
  title = {The {HAM10000} dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions},
  author = {Tschandl, Philipp and Rosendahl, Cliff and Kittler, Harald},
  journal = {Scientific Data},
  volume = {5},
  number = {1},
  pages = {1--9},
  year = {2018},
  publisher = {Nature Publishing Group},
  doi = {10.1038/sdata.2018.161}
}

Fitzpatrick17k:
@inproceedings{groh2021evaluating,
  title = {Evaluating Deep Neural Networks Trained on Clinical Images in Dermatology with the {Fitzpatrick 17k} Dataset},
  author = {Groh, Matthew and Harris, Caleb and Soenksen, Luis and Lau, Felix and Han, Rachel and Kim, Aerin and Koochek, Arash and Badri, Omar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  pages = {1820--1828},
  year = {2021},
  doi = {10.1109/CVPRW53098.2021.00201}
}