This is a list of histopathology datasets made public for classification, segmentation, regression and/or registration tasks.
I am happy if you want to help me update and/or improve this document. I think it helps to have an overview of all the datasets available in the field.
I hope this list will help some of you.
Please find in the table below some link and information about histopathology dataset that are publicly available.
Dataset name | Organs | Staining | Link | Size | Data | Task | WSI/Patch | Other (Magnification, Scanner) | year |
---|---|---|---|---|---|---|---|---|---|
ACDC-LungHP [1a], [1b] | Lung | H&E | data, paper | Train: 150, Test: 50 | images + xml | seg + classi | wsi | 2019 | |
ACROBAT 2022 [66] | Breast | Multiple (IHC, H&E) | data, paper | Train: 750 train; Valid: 100; Test: 300 | images (1 H&E match to 1-4 IHC) + landmarks | registration | wsi | 40x - Hamamatsu | 2022 |
Adipocyte [94] | skin | H&E | data, paper | 200 patches | images+mask | cell detection | patch (120x150) | 40x | 2017 |
ADP [2] | multiple | multiple (most H&E) | data, github, paper | Train: 14.134, Valid: 1767, Test: 1767 (100 wsi) | images + 57 hierarchical HTTs (histological tissue type) | multi-label (3) classification (hierarchy) | patch (1088x1088) | 40x - Huron TissueScope LE1.2 WSI | 2019 |
AGGC | prostate | H&E | data, paper | Subset 1: train 105, test 45; Subset2: train 37 ,test 16; Subset3: train 144, test 67 | images + binary masks | seg + gleason grading | wsi | 20x - Subset1 and Subset2: Akoya Biosciences Scanner, Subset3: each specimen is scanned by multiple scanners | 2022 |
AML-Cytomorphology_LMU [67] | Blood | Wright's stain | data, paper | 18.365 images from 200 patients | classi | patch (cells) | 100x - M8 digital microscope/scanner | 2019 | |
ANHIR [3] | multiple (Lung, Kidney, Colon, Gastric, Breast) | multiple | data, paper | 50+ sets | image + landmarks | registration | patch (15k x 15k to 50k x 50k) | 40x, 20x, 10x, different scanner | 2019 |
ARCH [4] | multiple | multiple | data, paper | 4270 | images + caption | learn representation from text + image | patch | multiple | 2020 |
BACH - ICIA2018 [5] | Breast | H&E | data, paper | 400 | images (4 classes: normal 100, benign: 100, in situ carcinoma: 100, invasive carcinoma: 100) + 20 unlabeled + 10 labeled WSI (10 patients) | classi + seg | Patch (classi, 2048x1536) + WSI (seg) | Leica SCN400 | 2018 |
BCNB [6] | Breast | H&E | data, paper | 1058 (train 0.6, valid 0.2, test 0.2) | images + roi annotated + patient record | binary or multiple classi | wsi | 2021 | |
BCSS [7] | Breast | H&E | data, paper | 151 wsi, 20.000 patch | patch + segmentation mask | semantic seg | patch | (TCGA) | 2019 |
Bone-Marrow-Cytomorphology [68] | Marrow | May-Grünwald-Giemsa/Pappenheim | data, paper | 171.375 cells from 945 patients | images + label | classi (21) | patch (250x250 - single cell) | 40x | 2021 |
BRACS [62] | Breast | H&E | data, paper | 547 wsi, 4539 ROIs, 189 Patients | images + label (6 subtypes tumor + normal) | classi (7) | wsi + patch | 40x - Aperio AT2 | 2021 |
BreakHis [8] | Breast | H&E | data, paper | 7.909 (2480 benign, 5429 malignant) | images + binary label + tumor type (8) (multiple magnifications: 40x, 100x, 200x, 400x) | classi | Patch (700x460) | 40x, 100x, 200x, 400x | 2016 |
BRCA-M2C [95] | breast | H&E | data, paper | train: 80, valid: 10, test: 30 patches | images+point annotation | multi-class cell detection | patch (around 500x500) | 20x | 2021 |
BreCaHAD [9] | Breast | H&E | data paper | 162 | images + centroid with label | classi (6: mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, non-tubule) | patch (1360x1024) | 40x - Zeiss | 2019 |
CAMEL [63] | Colon | data, paper | 177 wsi (156 with adenoma) | image + label (binary) | classi | patch (1280x1280) | 2019 | ||
CAMELYON16 [10] | Lymph node | H&E | data, paper | Train: 270 (160 Normal, 110 with metastases); Test: 130 | images + binary masks | classi + seg | WSI | slide level analysis | 2016 |
CAMELYON17 [11] | Lymph node | H&E | data, paper | Train: 500 (100 patients, 5 slides each); Test: 500 | images + binary masks | classi + seg | WSI | patient level analysis | 2017 |
CAMELYON [12] | Breast (Lymph node) | H&E | paper | 1399 wsi | wsi | 2017 | |||
CATCH [88] | Skin (Canine) | H&E | data, paper | 350 wsi, 12.424 polygon annotations (13 classes) | images + contours (JSON) | seg + classi | wsi | 40x Aperio ScanScope CS2 (Leica) | 2022 |
Cellseg [13] | multiple | multiple | data, paper, github | images + limited labeled patches | instance (cell) segmentation | wsi | 2022 | ||
Chaoyang [57] | Colon | H&E | data, github, paper | Train: 111 normal, 842 serrated, 1404 adenocarcinoma, 664 adenoma, Test: 705 normal, 321 serrated, 840 adenocarcinoma, 273 adenoma samples | images + label | classi | patch (512×512) | 2021 | |
CoCaHis [61] | Colon | H&E | data, paper | 82 (19 patients) | images + mask from different annotator | seg | patch | 2021 | |
CoNIC 2022 [14] | Colon | H&E | data, github, paper | 4981 patch with 431.913 nuclei of 6 types | image + instance seg mask + classi mask | seg + classi + reg | patch (256x256) | 20x | 2022 |
CoNSeP - HoVer-Net [15] | Colorectal adenocarcinoma | H&E | data, paper | Train: 27 images, Test: 14 images, 24.319 nuclei | images + nuclei (location + class) | instance seg + classi (7: other, inflammatory, healthy epithelial, dysplastic/malignant epithelial, figroblast, muscle, endothelial) | patch (1000x1000) | 40x (UHCW) | 2019 |
CPM-15 [16] | brain | H&E | data | 15 (2905 nuclei) | images + nuclei seg + label | seg + classi | patch (400x400, 600x1000) | 20x, 40x (TCGA) | |
CPM-17 [17] | brain | H&E | data, paper | Train: 32, test: 32 (7570 nuclei) | images + nuclei seg + label | seg + classi | patch (500x500 to 600x600) | 20x, 40x (TCGA) | 2019 |
CPTAC-AML | Marrow, Blood | data | 120 images from 88 patients | 40x | 2020 | ||||
CPTAC-BRCA | Breast | data | 642 images from 134 patients | 40x | 2021 | ||||
CPTAC-COAD | Colon | data | 373 images from 106 patients | 40x | 2021 | ||||
CPTAC-OV | Ovary | data | 222 images from 102 patients | 40x | 2021 | ||||
CRAG - MILD-Net [18] | Colon | H&E | data, paper | Train: 173, Valid: 40 | image + segmentation | instance seg | patch (around 1500x1500) | 20x | 2019 |
CRCHisto [19] | Colon | H&E | data, paper | 100 images, 29.756 nuclei (10 wsi, 9 patients) | images + point nuclei class label | seg + classi (epithelial, inflammatory, fibroblast, miscellaneous) | patch (500x500) | 20x - Omnyx VL120 (UHCW) | 2016 |
CRC-TP [20] | CRC | H&E | data, paper | 280k patches (from 20 wsi) | images + tissue phenotypes | classi | patch | 2020 | |
CryoNuSeg [21] | multiple (10: adrenal gland, larynx, lymph nodes, mediastinum, pancreas, pleura, skin, testes, thymus, and thyroid gland) | H&E | data, github, paper | 8000 nuclei from 30 patches (from 30 wsi) | images + segmentation masks + binary labels | nuclei segmentation | patch (512x512) | 40x (from TCGA) | 2021 |
DHMC-Kidney [85] | Renal Cell Carcinoma | H&E | data, paper | 563 wsi | images + label | classi | wsi | 20x - Aperio AT2 | 2021 |
DHMC-Lung [86] | Lung Adenocarcinoma | H&E | data, paper | 143 wsi | images + label | classi | wsi | 20x or 40x - Aperio AT2 | 2019 |
DiagSeg [58] | Prostate | H&E | data, paper | >2.6M patches (from 430 scans) 430 fully annotated scans, 4675 scans with binary diagnosis, and 46 scans with diagnosis given independently by a group of 9 histopathologists | classi (256×256) | patch | 5x, 10x, 20x, 40x - Hamamatsu C12000-22 | 2021 | |
DigestPath2019 - signet ring cell [22] | multiple (Gastric, Intestine) | H&E | data, paper | Train: 460, Test: 226 | images + cell bounding boxes | cell detection | patch (avg 2kx2k) | 40x | 2019 |
DigestPath2019 - colonoscopy tissue segment [23] | Colon | H&E | data, paper | Train: 660, Test: 212 | images + lesion annotation | seg + classi (benign vs malignant) | patch (avg 5kx5k) | 20x | 2019 |
DLBCL-morphology [69] | Lymph Node | Multiple (H&E, IHC) | data, paper | 52.194 patches - 246 images from 209 patients | images + ROIs | wsi - patch (240x240) | 40x - Aperio AT2 | 2022 | |
ENDO-AID [] | Endometrial Carcinoma | H&E | data, info | Test: 91 wsi | images + 15 pathologists assessments | grading score | wsi | 0.5um/px - 3DHistech P1000 | 2022 |
Gelasca et al. [26] | Breast | H&E | data | 50 | images (malignant/benignant, 1.895 nuclei) + masks | classi + seg | Patch (896x768; 768x512) | ||
GlaS [24] | Colorectal (Gland) | H&E | data, paper | 165 | Train: 85 (37 benign, 48 malignant); Test: 80 (37 benign, 43 malignant) | classi + seg | Patch (diff sizes - few hundred px) | 20x - Zeiss MIRAX MIDI | 2015 |
Gleason_CNN [25] | Prostate | H&E | data, github, paper | 5 tissue microarrays (200-300 spots) | images + patch and pixel annotation | classi | patch (3100x3100) | 40x - NanoZoomer-XR Digital slide scanner, Hamamatsu | 2018 |
GTEx Portal [77] | Multiple | H&E | data, paper | 948 patients (multiple slides per patients) | images + genes + metadata | ||||
HER2 Contest [60] | Breast | Multiple (H&E, IHC) | data, paper | 172 wsi from 86 patients | image + label (scoring) | classi (4 classes: 0, 1+, 2+, 3+) | wsi | 4x-40x - Hamamatsu NanoZoomer C9600 | 2016 |
HEROHE - ECDP2020 [27a], [27b] | Breast | H&E | data, paper | Train: 359 (positive: 144, negatives: 215), Test: 150 (positive: 60, negative: 90) | images + binary label | classi | wsi | 20x - 3D Histech Pannoramic 1000 | 2020 |
HER2 tumor ROIs [70] | Breast | H&E | data, paper | 273 | images + ROIs + label | classi (binary) | patch (512x512) | 20x - Aperio ScanScope | 2022 |
HunCRC [71] | Colon | H&E | data, github, github, paper | 101,389 patches - 200 wsi (from 200 patients) | images + label | classi (10) | wsi - patch (512x512) | 40x - 3DHistech Pannoramic 1000 | 2022 |
IMP-CRS 2024 [81a],[81b],[81c] | Colorectal | H&E | data, paper | Train 4433 wsi, Test: 900 wsi | images + label | classi (3) | wsi | 40x - Leica GT450 | 2024 |
Janowczyk et al. [28] | Breast | H&E | data, github | 143 | images (12.000 nuclei) + masks | semantic seg | Patch (2000x2000) | 40x | 2015 |
Kather et al. [29] | Colon | H&E | data, github, paper | Train: 100k (86 wsi), Valid: 7180 (25 wsi) | image + label (9 tissue type) | classi | patch (224x224) | 2018 | |
Kather et al. [30] | Colon | H&E | data, data, data, github, paper | seg (tumor detection) + classi (MSI detection) | 2019 | ||||
KIMIA Path24C [65] | multiple | multiple (IHC, H&E, Masson's trichrome) | data, paper | Train: 22.591, Valid: 1.325 from 24 wsi | patch (1000x1000) | 20x - TissueScope LE 1.0. | 2021 | ||
Komura et al. [64] | multiple (32) | H&E | data, paper | 271.700 | images + cancer type | classi | patch (256x256) | 6 magnification (from TCGA) | 2021 |
Kumar [31] | multiple (8) | H&E | data, paper | Train: 16 (13.372 nuclei), test same organ (4.130 nuclei): 8, test diff organ (4.121 nuclei): 6 | images + nuclei seg + label | seg + classi | patch (1000x1000) | 40x (TCGA) | 2017 |
LC25000 [54] | multiple (lung, colon) | H&E | data, paper | 25.000 (5 classes) | images + label | patch (768x768) | classi | 60x | 2019 |
Lizard [32] | Colon | H&E | data, paper | 495.179 nuclei | images + instance seg mask | seg | patch | 20x (DigestPath + CRAG + GlaS + PanNuke + CoNSeP + TCGA) | 2021 |
LubLung [97] | Lung | H&E | data, paper | 23,199 patches (9 classes) | images + labels | classi | patch (87x87) | 2021 | |
LYON19 [33] | Multiple (Breast, Colon, Protate) | IHC | data, paper | Test: 441 ROIs - 171.166 cells | images + corrdinates of cell | cell detection | patch | Pannoramic 250Flash II scanner | 2019 |
MBM [94] | bone | H&E | data, paper | 44 patches | images+mask | cell detection | patch (600x600) | 40x | 2017 |
MHIST [79] | colorectal polyps | H&E | data, paper | 3,152 patches (train: 2,175; test: 977) | images + annotations + annotator agreement | classi (2) | patch (224x224) | 40x - Aperio AT2 | 2021 |
MIDOG 2021 [34] | Breast | H&E | data, paper | 200 wsi: 50 wsi / scanners - 4 scanners | images + roi | detection of mitotic figues | wsi | 2021 | |
MIDOG 2022 [35] | multiple (6 for train 10 for test) | H&E | data | Train: 405 cases, 9501 mitotic annotation | images + seg | seg | Patch | 2022 | |
MIDOG++ [93] | multiple | H&E | data, paper | 503 ROIs + 12k mitotic figures | images + object centers | detection of mitotic figures | ROIs | 2023 | |
MITOS_WSI_CCMCT [89] | Skin (Canine) | H&E | data, paper | 32 wsi | images + mitotic figures (45k)/ hard negatives (28k) | detection of mitotic figues | wsi | 40x Aperio ScanScope CS2 (Leica) | 2019 |
MITOS_WSI_CMC [90] | Breast (Canine) | H&E | data, paper | 21 wsi | images + mitotic figures (14k)/ hard negatives (35k) | detection of mitotic figues | wsi | 40x Aperio ScanScope CS2 (Leica) | 2020 |
MoNuSAC 2020 [36] | multiple (Lung, Prostate, Kidney, Breast) | H&E | data, paper | 31.411 nuclei from 209 images | images + mask | instance seg + classi | patch (81x113 to 1422x2162) | 40x (TCGA) | 2020 |
MoNuSeg [37a], [37b] | multiple (7) | H&E | data, github, paper | Train: 30, Test: 14 | images (Train: 22.000 nuclei, Test: 7000) + masks | instance seg | Patch (1000x1000) | 40x (from TCGA) | 2018 |
Multi-Scanner SCC [92] | Skin (Canine) | H&E | data, paper | 44 samples á 5 scanners (220 wsi) | images + contours (JSON) | registration + segmentation | wsi | 5 scanners | 2023 |
NADT-Prostate [72] | Prostate | Multiple (H&E, IHC) | data, paper | 1401 images from 37 patients | 20x | 2021 | |||
Naylor et al. [38] | Breast | H&E | data, paper | 50 | images (4.022 nuclei, 11 patients) + masks | seg | Patch (512x512) | 40x | 2018 |
NuClick [59] | Lymphocyte | IHC | data, paper | Train: 671, Valid: 200 | images + mask | seg | patch (256x256) | 2020 | |
NuCLS [39] | Breast | H&E | data, paper | 220.000 nuclei from 3.944 roi from 125 patients | roi + bounding bx + classification | nuclear detection + classi + seg | patch | (TCGA) | 2021 |
OCELOT [78] | Multiple (Bladder, Endometrium, Head-and-neck, Kidney, Prostate, Stomach) | H&E | data, paper, website | 304 Whole Slide Images (WSIs) (tr:val:te 6:2:2) | images + cell annotation + tissue annotation | cell and tissue detection (multitask learning) | patch (1024x1024) | (TCGA) | 2023 |
Osteosarcoma-Tumor-Assessment | Bone | H&E | data | 1144 images from 4 | classi (3: non-tumor, viable tumor, necrosis) | patch (1024x1024) | 10x | 2019 | |
Ovarian Bevacizumab Response [73a], [73b] | Ovary | H&E | data, paper, paper | 288 (78 patients) | images + clinical information | classi (treatment effectiveness) | wsi (avg 54342x41048) | 20x - Leica AT2 | 2021 |
PAIP2019 [40] | Liver | H&E | data, paper | Train: 50, Valid: 10, Test: 40 | images + binary mask | cancer seg | wsi | 20x - Aperio AT2 | 2019 |
PAIP2020 [41] | Colon | H&E | data, github | Train: 47, Valid: 31, Test: 40 | images + binary mask | cancer seg | wsi | 40x - Aperio AT2 | 2020 |
PAIP2021 [42] | Multiple (Colon, Prostate, Pancreas) | H&E | data, paper | Train: 150, Valid: 30, Test: 60 | wsi + xml gt | semantic seg | wsi | 20x - Aperio AT2 | 2021 |
PAIP2023 | multiple organ | H&E | data | 2023 | |||||
The PANDA challenge [43] | Prostate | H&E | data, paper | Train: 10.616, Valid: 393, Internal test: 545, External test: 1071 | images + label | classi | wsi | slide level analysis | 2020 |
Pan-tumor T-lymphocyte dataset [91] | Multiple | IHC (CD3) | data, paper | 92 ROIs | images + cell annotations | detection + classification | wsi | 40x NanoZoomer 2.0-HT (Hamamatsu) | 2023 |
SegPath [87] | multiple | H&E | data, paper | 158,687 patches | images + label + mask | semantic seg | patch | 20x - Zeiss MIRAX MIDI | 2023 |
PanNuke [44a], [44b] | multiple (19) | H&E | data, github, paper, paper | 189.744 nuclei (from >20k wsi) | images + nuclei (position + classi: neoplastic, connective, non-neoplastic epithelial, dead, inflammatory) | instance seg + classi | patch | 40x | 2019 |
PatchCamelyon [45a], [45b] | Lymph node | H&E | data, github paper | 327.680 | images + binary label | classi | Patch (96x96) | 10x | 2018 |
PATHVQA [80] | Multiple | Multiple | data, paper, github | 32,799 open-ended questions from 4,998 images | image + question + answer | VQA | patch/image | 2020 | |
Post-NAT-BRCA [74] | Breast | H&E | data, paper | 96 images from 54 patients | images + clinical info + annotation tumor cellularity and cell labels | wsi | 20x - Aperio | 2021 | |
Prostate Fused-MRI-Pathology [83] | Prostate | H&E | data | 114 images from 16 patients | images + tumor Annotations + mpMRI | wsi | 20x - Aperio | 2016 | |
RINGS [96] | prostate | H&E | data , paper | train: 1000 , test: 500 with 18'851 glands | images+mask | gland segmentation and tumor segmentation | patch (1500x1500) | 40x | 2021 |
SegLungTCGA [97] | Lung | H&E | data, paper | 454 images + file mapping info | images | segmentation (9 classes) | segmented (87x87 patches) wsi | (from TCGA) | 2021 |
SegPC-2021 [46a], [46b], [46c], [46d] | Blood | Jenner-Giemsa | data, github, report | 775 images, Train: 298, Valid: 200, Test: 277 | images + nucleus and cytoplasma | plasma cell segmentation | 2021 | ||
SICAPv2 [55] | Prostate | H&E | data, paper | 155 (from 95 patients) | images + global Gleason scores and patch-level Gleason grades | classi | wsi | 40x - Ventana iScan Coreo | 2020 |
SLN-Breast [75] | Breast | H&E | data, paper | 130 wsi from 78 patients | images + binary label | classi (binary - cancer/no cancer) | wsi | 20x - Leica Aperio AT2 | 2021 |
SPIE-AAPM_NCI BreastPathQ [47] | Breast | H&E | data, paper | 2579 patch from 96 wsi (64 patients) | images + score | regression | patches | 20x | 2019 |
TCGA [48] | Multiple | H&E | data, data | > 11k | WSI | ||||
TCGA-TIL-WSI [76] | Multiple (13) | H&E | data, github, paper | 5200 | (from TCGA) | 2019 | |||
TIGER [49] | Breast | H&E | data, paper, github, github | WSIROIS: 195 wsi, WSIBULK: 93, WSITILS: 82 | images + rois + label (7) | detection + segmentation + TILs scoring | wsi | (from TCGA, RUMC, JB) | 2022 |
TissueNet | Uterine cervix | H&E | data, github | 1,016 WSIs; 5,926 patches (1200x1200 px) | images + annotation + metadata + labels | classi (4) | wsi + patches | MIRAX, Aperio, Hamamatsu | 2020 |
TNBC [50] | Breast | H&E | data, data, paper | 50 images, 4022 cells (11 patients) | images + nuclei seg + label | seg + classi | patch (512x512) | 40x - Philips Ultra Fast Scanner (Curie Inst.) | 2019 |
Tolkach Y. et al. [84] | oesophageal adenocarcinomas | H&E | data, paper | UKK1: 34,704 patches from 22 wsi (20 patients); WNS: 121,642 patches from 62 wsi (15 patients); CHA: 32,796 patches from 214 wsi (69 patients); TCGA:178,187 patches from 22 wsi (22 patients) | images + label | classi (11) | patch(256x256) | 40x - Nanozoomer S360 | 2023 |
TUPAC16 [51] | Breast | H&E | data, paper | 500 | images + label | classi (wsi level) | WSI | 40x (from TCGA) | 2019 |
TUPAC16 - aux [52] | Breast - mitoses | H&E | data | 73 | images + locations | seg | patch | 40x (from TCGA) Leica SCN400 | 2019 |
UniToPatho [56] | Colon | H&E | data, paper | 9.536 from 292 wsi | images + label (6 classes) | classi | patch | 20x - Hamamatsu Nanozoomer S210 | 2021 |
UPENN-GBM [82] | glioblastoma | H&E | data,paper | 71 wsi from 34 patients | images + clinical data + mpMRI | WSI | 40x | 2022 | |
VisioMel | Melanoma | H&E | data, code | train: 1342 wsi, test: 600, valid: 1200, 16 WSIs annotated | images + annotation + clinical metadata + label | classi (2) | 2023 | ||
WSSS4LUAD [53] | Lung | H&E | data, paper | 87 (Train: 53, valid: 12, Test: 12) | Train: 10.091 patches, Valid: 40 patches, Test: 80 patches; image level for train, pixel level for test/valid | tissue semantic seg | wsi | (67 GDPH, 20 TCGA) | 2021 |
Follicle countring [98] | Ovary | H&E | data, paper, code | 643 cut slices (from 92 mice) | images + label | Object Detection (3 classes) | ROIs | Panoramic 250 Flash, Slide Scanner (3DHISTECH Ltd. HUNGARY) | 2024 |
[1a] Li, Zhang, et al. "Computer-aided diagnosis of lung carcinoma using deep learning-a pilot study." arXiv preprint arXiv:1803.05471 (2018).
[1b] Li, Zhang, et al. "Deep learning methods for lung cancer segmentation in whole-slide histopathology images—the acdc@ lunghp challenge 2019." IEEE Journal of Biomedical and Health Informatics 25.2 (2020): 429-440.
[2] Hosseini, Mahdi S., et al. "Atlas of digital pathology: A generalized hierarchical histological tissue type-annotated database for deep learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[3] Borovec, Jiří, et al. "ANHIR: automatic non-rigid histological image registration challenge." IEEE transactions on medical imaging 39.10 (2020): 3042-3052.
[4] Gamper, Jevgenij, and Nasir Rajpoot. "Multiple instance captioning: Learning representations from histopathology textbooks and articles." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[5] Aresta, Guilherme, et al. "Bach: Grand challenge on breast cancer histology images." Medical image analysis 56 (2019): 122-139.
[6] Xu, Feng, et al. "Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides." Frontiers in oncology 11 (2021): 759007.
[7] Amgad, Mohamed, et al. "Structured crowdsourcing enables convolutional segmentation of histology images." Bioinformatics 35.18 (2019): 3461-3467.
[8] Spanhol, Fabio A., et al. "A dataset for breast cancer histopathological image classification." Ieee transactions on biomedical engineering 63.7 (2015): 1455-1462.
[9] Aksac, Alper, et al. "BreCaHAD: a dataset for breast cancer histopathological annotation and diagnosis." BMC research notes 12.1 (2019): 1-3.
[10] Bejnordi, Babak Ehteshami, et al. "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer." Jama 318.22 (2017): 2199-2210.
[11] Bandi, Peter, et al. "From detection of individual metastases to classification of lymph node status at the patient level: the camelyon17 challenge." IEEE transactions on medical imaging 38.2 (2018): 550-560.
[12] Litjens, Geert, et al. "1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset." GigaScience 7.6 (2018): giy065.
[13] Kwanyoung Lee, Hyungjo Byun, Hyunjung Shim Proceedings of The Cell Segmentation Challenge in Multi-modality High-Resolution Microscopy Images, PMLR 212:1-11, 2023.
[14] Graham, Simon, et al. "Conic: Colon nuclei identification and counting challenge 2022." arXiv preprint arXiv:2111.14485 (2021).
[15] Graham, Simon, et al. "Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images." Medical Image Analysis 58 (2019): 101563.
[16] Vu, Quoc Dang, et al. "Methods for segmentation and classification of digital microscopy tissue images." Frontiers in bioengineering and biotechnology (2019): 53.
[17] Vu, Quoc Dang, et al. "Methods for segmentation and classification of digital microscopy tissue images." Frontiers in bioengineering and biotechnology (2019): 53.
[18] Graham, Simon, et al. "MILD-Net: Minimal information loss dilated network for gland instance segmentation in colon histology images." Medical image analysis 52 (2019): 199-211.
[19] Sirinukunwattana, Korsuk, et al. "Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images." IEEE transactions on medical imaging 35.5 (2016): 1196-1206.
[20] Javed, Sajid, et al. "Cellular community detection for tissue phenotyping in colorectal cancer histology images." Medical image analysis 63 (2020): 101696.
[21] Mahbod, Amirreza, et al. "CryoNuSeg: A dataset for nuclei instance segmentation of cryosectioned H&E-stained histological images." Computers in biology and medicine 132 (2021): 104349.
[22] Li, Jiahui, et al. "Signet ring cell detection with a semi-supervised learning framework." International conference on information processing in medical imaging. Springer, Cham, 2019.
[23] Li, Jiahui, et al. "Signet ring cell detection with a semi-supervised learning framework." International conference on information processing in medical imaging. Springer, Cham, 2019.
[24] Sirinukunwattana, Korsuk, et al. "Gland segmentation in colon histology images: The glas challenge contest." Medical image analysis 35 (2017): 489-502.
[25] Arvaniti, Eirini, et al. "Automated Gleason grading of prostate cancer tissue microarrays via deep learning." Scientific reports 8.1 (2018): 1-11.
[26]
[27a] Conde-Sousa, Eduardo, et al. "HEROHE Challenge: assessing HER2 status in breast cancer without immunohistochemistry or in situ hybridization." arXiv preprint arXiv:2111.04738 (2021).
[27b] La Barbera, David, et al. "Detection of her2 from haematoxylin-eosin slides through a cascade of deep learning classifiers via multi-instance learning." Journal of Imaging 6.9 (2020): 82.
[28]
[29] Kather, Jakob Nikolas, Halama, Niels, & Marx, Alexander. (2018). 100,000 histological images of human colorectal cancer and healthy tissue (v0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.1214456
[30] Kather, Jakob Nikolas, et al. "Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer." Nature medicine 25.7 (2019): 1054-1056.
[31] Kumar, Neeraj, et al. "A dataset and a technique for generalized nuclear segmentation for computational pathology." IEEE transactions on medical imaging 36.7 (2017): 1550-1560.
[32] Graham, Simon, et al. "Lizard: a large-scale dataset for colonic nuclear instance segmentation and classification." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[33] Swiderska-Chadaj, Zaneta, et al. "Learning to detect lymphocytes in immunohistochemistry with deep learning." Medical image analysis 58 (2019): 101547.
[34] Aubreville, Marc, et al. "Mitosis domain generalization in histopathology images--The MIDOG challenge." arXiv preprint arXiv:2204.03742 (2022).
[35] Aubreville, Marc, et al. "Mitosis domain generalization challenge (2021)." 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). 2022.
[36] Verma, Ruchika, et al. "MoNuSAC2020: A multi-organ nuclei segmentation and classification challenge." IEEE Transactions on Medical Imaging 40.12 (2021): 3413-3423.
[37a] Kumar, Neeraj, et al. "A multi-organ nucleus segmentation challenge." IEEE transactions on medical imaging 39.5 (2019): 1380-1391.
[37b] Kumar, Neeraj, et al. "A dataset and a technique for generalized nuclear segmentation for computational pathology." IEEE transactions on medical imaging 36.7 (2017): 1550-1560.
[38] Naylor, Peter, et al. "Segmentation of nuclei in histopathology images by deep regression of the distance map." IEEE transactions on medical imaging 38.2 (2018): 448-459.
[39] Amgad, Mohamed, et al. "Nucls: A scalable crowdsourcing, deep learning approach and dataset for nucleus classification, localization and segmentation." arXiv preprint arXiv:2102.09099 (2021).
[40] Kim, Yoo Jung, et al. "PAIP 2019: Liver cancer segmentation challenge." Medical Image Analysis 67 (2021): 101854.
[41]
[42] Nateghi, Ramin, and Fattaneh Pourakpour. "Perineural Invasion Detection in Multiple Organ Cancer Based on Deep Convolutional Neural Network." arXiv preprint arXiv:2110.12283 (2021).
[43] Bulten, Wouter, et al. "Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge." Nature medicine 28.1 (2022): 154-163.
[44a] Gamper, Jevgenij, et al. "Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification." European congress on digital pathology. Springer, Cham, 2019.
[44b] Gamper, Jevgenij, et al. "Pannuke dataset extension, insights and baselines." arXiv preprint arXiv:2003.10778 (2020).
[45a] Veeling, Bastiaan S., et al. "Rotation equivariant CNNs for digital pathology." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2018.
[45b] Bejnordi, Babak Ehteshami, et al. "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer." Jama 318.22 (2017): 2199-2210.
[46a] Gupta, Anubha, et al. "Segpc-2021: Segmentation of multiple myeloma plasma cells in microscopic images." IEEE Dataport 1.1 (2021): 1.
[46b] Gupta, Anubha, et al. "GCTI-SN: Geometry-inspired chemical and tissue invariant stain normalization of microscopic medical images." Medical Image Analysis 65 (2020): 101788.
[46c] Gehlot, Shiv, Anubha Gupta, and Ritu Gupta. "Ednfc-net: Convolutional neural network with nested feature concatenation for nuclei-instance segmentation." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
[46d] Gupta, Anubha, et al. "PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma." PloS one 13.12 (2018): e0207908.
[47] Petrick, Nicholas A., et al. "SPIE-AAPM-NCI BreastPathQ Challenge: an image analysis challenge for quantitative tumor cellularity assessment in breast cancer histology images following neoadjuvant treatment." Journal of Medical Imaging 8.3 (2021): 034501.
[48] R. L. Grossman, A. P. Heath, V. Ferretti, H. E. Varmus, D. R. Lowy, W. A. Kibbe, and L. M. Staudt. Toward a shared vision for cancer genomic data. New England Journal of Medicine, 375(12):1109–1112, 2016.
[49] Shephard, Adam, et al. "TIAger: Tumor-Infiltrating Lymphocyte Scoring in Breast Cancer for the TiGER Challenge." arXiv preprint arXiv:2206.11943 (2022).
[50] P. Naylor, M. Laé, F. Reyal and T. Walter, "Segmentation of Nuclei in Histopathology Images by Deep Regression of the Distance Map," in IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 448-459, Feb. 2019, doi: 10.1109/TMI.2018.2865709
[51] Veta, Mitko, et al. "Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge." Medical image analysis 54 (2019): 111-121.
[52] Veta, Mitko, et al. "Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge." Medical image analysis 54 (2019): 111-121.
[53] Han, Chu, et al. "WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma." arXiv preprint arXiv:2204.06455 (2022).
[54] Borkowski, Andrew A., et al. "Lung and colon cancer histopathological image dataset (lc25000)." arXiv preprint arXiv:1912.12142 (2019).
[55] Silva-Rodríguez, Julio, et al. "Going deeper through the Gleason scoring scale: An automatic end-to-end system for histology prostate grading and cribriform pattern detection." Computer Methods and Programs in Biomedicine 195 (2020): 105637.
[56] Barbano, Carlo Alberto, et al. "UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading." 2021 IEEE International Conference on Image Processing (ICIP). IEEE, 2021.
[57] Zhu, Chuang, et al. "Hard Sample Aware Noise Robust Learning for Histopathology Image Classification." IEEE Transactions on Medical Imaging 41.4 (2021): 881-894.
[58] Koziarski, Michał, et al. "DiagSet: a dataset for prostate cancer histopathological image classification." arXiv preprint arXiv:2105.04014 (2021).
[59] Koohbanani, Navid Alemi, et al. "NuClick: a deep learning framework for interactive segmentation of microscopic images." Medical Image Analysis 65 (2020): 101771.
[60] Qaiser, Talha, et al. "Her 2 challenge contest: a detailed assessment of automated her 2 scoring algorithms in whole slide images of breast cancer tissues." Histopathology 72.2 (2018): 227-238.
[61] Sitnik, Dario, et al. "A dataset and a methodology for intraoperative computer-aided diagnosis of a metastatic colon cancer in a liver." Biomedical Signal Processing and Control 66 (2021): 102402.
[62] Brancati, Nadia, et al. "Bracs: A dataset for breast carcinoma subtyping in h&e histology images." arXiv preprint arXiv:2111.04740 (2021).
[63] Xu, Gang, et al. "Camel: A weakly supervised learning framework for histopathology image segmentation." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[64] Komura, Daisuke, et al. "Universal encoding of pan-cancer histology by deep texture representations." Cell Reports 38.9 (2022): 110424.
[65] Shafiei, Sobhan, et al. "Colored Kimia Path24 Dataset: Configurations and Benchmarks with Deep Embeddings." arXiv preprint arXiv:2102.07611 (2021).
[66] Weitz, Philippe, et al. "ACROBAT-Automatic Registration of Breast Cancer Tissue."
[67] Matek, Christian, et al. "Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks." Nature Machine Intelligence 1.11 (2019): 538-544.
[68] Matek, Christian, et al. "Highly accurate differentiation of bone marrow cell morphologies using deep neural networks on a large image data set." Blood, The Journal of the American Society of Hematology 138.20 (2021): 1917-1927.
[69] Vrabac, Damir, et al. "DLBCL-Morph: Morphological features computed using deep learning for an annotated digital DLBCL image set." Scientific Data 8.1 (2021): 1-8.
[70] Farahmand, Saman, et al. "Deep learning trained on hematoxylin and eosin tumor region of Interest predicts HER2 status and trastuzumab treatment response in HER2+ breast cancer." Modern Pathology 35.1 (2022): 44-51.
[71] Pataki, Bálint Ármin, et al. "HunCRC: annotated pathological slides to enhance deep learning applications in colorectal cancer screening." Scientific Data 9.1 (2022): 1-7.
[72] Wilkinson, Scott, et al. "Nascent prostate cancer heterogeneity drives evolution and resistance to intense hormonal therapy." European urology 80.6 (2021): 746-757.
[73a] Wang, Ching-Wei, et al. "Histopathological whole slide image dataset for classification of treatment effectiveness to ovarian cancer." Scientific Data 9.1 (2022): 1-5.
[73b] Wang, Ching-Wei, et al. "Weakly supervised deep learning for prediction of treatment effectiveness on ovarian cancer from histopathology images." Computerized Medical Imaging and Graphics 99 (2022): 102093.
[74] Peikari, Mohammad, et al. "Automatic cellularity assessment from post‐treated breast surgical specimens." Cytometry Part A 91.11 (2017): 1078-1087.
[75] Campanella, Gabriele, et al. "Clinical-grade computational pathology using weakly supervised deep learning on whole slide images." Nature medicine 25.8 (2019): 1301-1309.
[76] Saltz, Joel, et al. "Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images." Cell reports 23.1 (2018): 181-193.
[77] Lonsdale, John, et al. "The genotype-tissue expression (GTEx) project." Nature genetics 45.6 (2013): 580-585.
[78] Ryu, Jeongun, et al. "OCELOT: Overlapped Cell on Tissue Dataset for Histopathology." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
[79] Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour, “A Petri Dish for Histopathology Image Analysis”, International Conference on Artificial Intelligence in Medicine (AIME), 12721:11-24, 2021.
[80] Do, Tuong, et al. "Multiple meta-model quantifying for medical visual question answering." Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V 24. Springer International Publishing, 2021.
[81a] Oliveira, S.P., Neto, P.C., Fraga, J., Montezuma, D., Monteiro, A., Monteiro, J., Ribeiro, L., Gonçalves, S., Pinto, I.M. and Cardoso, J.S., 2021. CAD systems for colorectal cancer from WSI are still not ready for clinical acceptance. Scientific Reports, 11(1), pp.1-15. https://doi.org/10.1038/s41598-021-93746-z
[81b] Neto, P.C., Oliveira, S.P., Montezuma, D., Fraga, J., Monteiro, A., Ribeiro, L., Gonçalves, S., Pinto, I.M. and Cardoso, J.S., 2022. iMIL4PATH: A semi-supervised interpretable approach for colorectal whole-slide images. Cancers, 14(10). https://doi.org/10.3390/cancers14102489
[81c] Neto, P.C., Montezuma, D., Oliveira, S.P., Oliveira, D., Fraga, J., Monteiro, A., Monteiro, J., Ribeiro, L., Gonçalves, S., Reinhard, S., Zlobec ,I. , Pinto, I.M. and Cardoso, J.S., 2024. An interpretable machine learning system for colorectal cancer diagnosis from pathology slides. npj Precision Oncology. https://doi.org/10.1038/s41698-024-00539-4
[82] Bakas, Spyridon, et al. "The University of Pennsylvania glioblastoma (UPenn-GBM) cohort: advanced MRI, clinical, genomics, & radiomics." Scientific data 9.1 (2022): 453.
[83] Madabhushi, A., & Feldman, M. (2016). Fused Radiology-Pathology Prostate Dataset (Prostate Fused-MRI-Pathology) . The Cancer Imaging Archive. doi; 10.7937/k9/TCIA.2016.tlpmr1am
[84] Tolkach, Yuri, et al. "Artificial intelligence for tumour tissue detection and histological regression grading in oesophageal adenocarcinomas: a retrospective algorithm development and validation study." The Lancet Digital Health 5.5 (2023): e265-e275.
[85] Mengdan Zhu, Bing Ren, Ryland Richards, Matthew Suriawinata, Naofumi Tomita, Saeed Hassanpour, "Development and Evaluation of a Deep Neural Network for Histologic Classification of Renal Cell Carcinoma on Biopsy and Surgical Resection Slides", Scientific Reports;11:7080 (2021).
[86] Jason Wei, Laura Tafe, Yevgeniy Linnik, Louis Vaickus, Naofumi Tomita, Saeed Hassanpour, "Pathologist-level Classification of Histologic Patterns on Resected Lung Adenocarcinoma Slides with Deep Neural Networks", Scientific Reports;9:3358 (2019).
[87] Daisuke Komura, Takumi Onoyama, Koki Shinbo, Hiroto Odaka, Minako Hayakawa, Mieko Ochi, Ranny Rahaningrum Herdiantoputri, Haruya Endo, Hiroto Katoh, Tohru Ikeda, Tetsuo Ushiku, Shumpei Ishikawa, Restaining-based annotation for cancer histology segmentation to overcome annotation-related limitations among pathologists, Patterns, Volume 4, Issue 2, 2023, 100688, https://doi.org/10.1016/j.patter.2023.100688.
[88] Wilm, Frauke, Fragoso, Marco, Marzahl, Christian, Qiu, Jingna, Puget, Chloé, Diehl, Laura, Bertram, Christof A., Klopfleisch, Robert, Maier, Andreas, Breininger, Katharina and Aubreville, Marc. "Pan-tumor CAnine cuTaneous Cancer Histology (CATCH) dataset". Sci Data 9, 588 (2022). https://doi.org/10.1038/s41597-022-01692-w
[89] Bertram, Christof A., Aubreville, Marc, Marzahl, Christian, Maier, Andreas and Klopfleisch, Robert. "A large-scale dataset for mitotic figure assessment on whole slide images of canine cutaneous mast cell tumor". Sci Data 6, 274 (2019). https://doi.org/10.1038/s41597-019-0290-4
[90] Aubreville, Marc, Bertram, Christof A., Donovan, Taryn A., Marzahl, Christian, Maier, Andreas and Klopfleisch, Robert. "A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research". Sci Data 7, 417 (2020). https://doi.org/10.1038/s41597-020-00756-z
[91] Wilm, Frauke et al. "Pan-tumor T-lymphocyte detection using deep neural networks: Recommendations for transfer learning in immunohistochemistry". Journal of Pathology Informatics. Vol. 14, 100301 (2023).
[92] Wilm, Frauke, Fragoso, Marco, Bertram, Christof A., Stathonikos, Nikolas, Öttl, Mathias, Qiu, Jingna, Klopfleisch, Robert, Maier, Andreas, Breininger, Katharina and Aubreville, Marc. Proceedings of the German Workshop on Medical Image Processing (BVM), pp 206–211 (2023).
[93] Aubreville Marc, Wilm, Frauke et al. "A comprehensive multi-domain dataset for mitotic figure detection". Sci Data 10, 484 (2023). https://doi.org/10.1038/s41597-023-02327-4
[94] Paul Cohen, Joseph, et al. "Count-ception: Counting by fully convolutional redundant counting." Proceedings of the IEEE International conference on computer vision workshops. 2017. https://doi.org/10.1109/ICCVW.2017.9
[95] Huang, Junjia, et al. "Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
[96] Salvi, Massimo, et al. "A hybrid deep learning approach for gland segmentation in prostate histopathological images." Artificial Intelligence in Medicine 115 (2021): 102076.
[97] Rączkowska, A., Paśnik, I., Kukiełka, M. et al. "Deep learning-based tumor microenvironment segmentation is predictive of tumor mutations and patient survival in non-small-cell lung cancer." BMC Cancer 22, 1001 (2022).
[98] Blot, V. et al. "Efficient Precision Control in Object Detection Models for Enhanced and Reliable Ovarian Follicle Counting". In: Sudre, C.H., Mehta, R., Ouyang, C., Qin, C., Rakic, M., Wells, W.M. (eds) Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. UNSURE 2024. Lecture Notes in Computer Science, vol 15167. Springer, Cham. https://doi.org/10.1007/978-3-031-73158-7_17
- Google Dataset Research
- Kaggle
- Zenodo
- Grand Challenge
- Warwick
- Cancer Imaging Archive
- GDC Data Portal
- Kimia Lab
- Data Mendeley
- Harvard Dataverse
- Driven Data
- Deep Microscopy
Marie (Duc) Stettler