{% hint style="info" %} Data hosted by IDC is ingested from several sources, including The Cancer Imaging Archive (TCIA), Genomics Data Commons (GDC), Clinical Proteomic Tumor Analysis Consortium (CPTAC) and Human Tumor Atlas Network (HTAN).
Please refer to the license and terms of use, which are defined in the `license_url` and `source_doi` columns of the IDC BigQuery `dicom_all` table. You can filter the data by license type in the IDC Portal.
{% endhint %}
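As a rough illustration of how the `license_url` and `source_doi` columns could be inspected, the sketch below only builds a BigQuery SQL string; it does not run it. The table path `bigquery-public-data.idc_current.dicom_all`, the `collection_id` column, and the helper name `build_license_query` are assumptions made for this example and are not stated in the note above.

```python
def build_license_query(
    license_url_substring: str,
    table: str = "bigquery-public-data.idc_current.dicom_all",  # assumed table path
) -> str:
    """Build a SQL string listing collections whose license URL contains a substring."""
    # Escape backslashes and single quotes so the value is safe inside a SQL literal.
    needle = license_url_substring.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT DISTINCT collection_id, license_url, source_doi\n"
        f"FROM `{table}`\n"
        # STRPOS does a plain substring search, so '_' and '%' are not wildcards.
        f"WHERE STRPOS(license_url, '{needle}') > 0\n"
        "ORDER BY collection_id"
    )


# Example: collections whose license URL points at a Creative Commons BY license.
query = build_license_query("creativecommons.org/licenses/by/")
print(query)
```

The printed query can be pasted into the BigQuery console. In application code, query parameters are generally preferable to string interpolation.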
New pathology collections
New analysis results
- Pancreas-CT-SEG
Collections analyzed:
- Pan-Cancer-Nuclei-Seg-DICOM
Collections analyzed:
Revised radiology collections
- Advanced-MRI-Breast-Lesions
- CMB-AML
- CMB-CRC
- CMB-GEC
- CMB-LCA
- CMB-MEL
- CMB-MML
- CMB-PCA
- CPTAC-CCRCC
- CPTAC-LSCC
- CPTAC-UCEC
- NLM-Visible-Human-Project
- RIDER Lung CT
Cancer Moonshot Biobank (CMB) radiology images were updated to fix incorrect values assigned to `PatientID` (see details on the collection pages linked above). The updated images have different DICOM Study/Series/SOPInstanceUIDs.
Revised analysis results
- BAMF-AIMI-Annotations
Collections analyzed:
- ACRIN-NSCLC-FDG-PET
- Anti-PD-1_Lung
- Colorectal-Liver-Metastases
- CPTAC-CCRCC
- Duke-Breast-Cancer-MRI
- HCC-TACE-Seg
- Lung-PET-CT-Dx
- NLST
- NSCLC Radiogenomics
- Prostate-MRI-US-Biopsy
- PROSTATEx
- QIN-BREAST
- QIN LUNG CT
- RIDER Lung PET-CT
- SPIE-AAPM Lung CT Challenge
- TCGA-KICH
- TCGA-KIRC
- TCGA-KIRP
- TCGA-LIHC
- TCGA-LUAD
- TCGA-LUSC
- UPENN-GBM
New clinical metadata tables
- acrin_contralateral_breast_mr_A0
- acrin_contralateral_breast_mr_AB
- acrin_contralateral_breast_mr_F1
- acrin_contralateral_breast_mr_I1
- acrin_contralateral_breast_mr_IA
- acrin_contralateral_breast_mr_IM
- acrin_contralateral_breast_mr_IS
- acrin_contralateral_breast_mr_KS
- acrin_contralateral_breast_mr_MS
- acrin_contralateral_breast_mr_M4
- acrin_contralateral_breast_mr_P8
- acrin_contralateral_breast_mr_PA
- acrin_contralateral_breast_mr_PD
- acrin_contralateral_breast_mr_PE
- acrin_contralateral_breast_mr_PR
- acrin_contralateral_breast_mr_QA
- advanced_mri_breast_lesions_clinical
- upenn_gbm
New radiology collections
New analysis results
- RMS-Mutation-Prediction-Expert-Annotations*
Collections analyzed:
- TotalSegmentator-CT-Segmentations**
Collections analyzed:
Revised radiology collections
(starred collections are revised due to new or revised analysis results)
- Breast-Cancer-Screening-DBT (revisions only to clinical data)
- NLST**
Revised pathology collections
(starred collections are revised due to new or revised analysis results)
- CPTAC-BRCA (fix PatientAges > 090Y)
- CPTAC-COAD (fix PatientAges > 090Y)
- RMS-Mutation-Prediction*
  - Also added missing instance
    - SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.3459553143.523311062.1687086765943.9.0
  - Removed corrupted instances
    - SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2164023716.1899467316.1685791236516.37.0
    - SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.1686038949651.37.0
    - SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.16860389
- TCGA-BLCA (All TCGA revisions are to correct multiple manufacturer values within same series)
- TCGA-BRCA
- TCGA-CHOL
- TCGA-COAD
- TCGA-DLBC (No description page)
- TCGA-ESCA
- TCGA-HNSC
- TCGA-KIRC
- TCGA-KIRP
- TCGA-LIHC
- TCGA-LUAD
- TCGA-LUSC
- TCGA-PAAD
- TCGA-PRAD
- TCGA-READ
- TCGA-SARC
- TCGA-SKCM
- TCGA-STAD
- TCGA-TGCT
- TCGA-THCA
- TCGA-THYM
- TCGA-UCEC
- TCGA-UCS
New clinical metadata tables
- acrin_nsclc_fdg_pet_bamf_lung_pet_ct_segmentation
- anti_pd_1_lung_bamf_lung_ct_segmentation
- anti_pd_1_lung_bamf_lung_fdg_pet_ct_segmentation
- lung_pet_ct_dx_bamf_lung_ct_segmentation
- lung_pet_ct_dx_bamf_lung_fdg_pet_ct_segmentation
- nsclc_radiogenomics_bamf_lung_ct_segmentation
- nsclc_radiogenomics_bamf_lung_fdg_pet_ct_segmentation
- prostatex_bamf_segmentations
- qin_breast_bamf_breast_segmentation
- rider_lung_pet_ct_bamf_lung_ct_segmentation
- rider_lung_pet_ct_bamf_lung_fdg_pet_ct_segmentation
- tcga_kirc_bamf_kidney_segmentation
- tcga_lihc_bamf_liver_ct_segmentation
- tcga_lihc_bamf_liver_mr_segmentation
- tcga_luad_bamf_lung_ct_segmentation
- tcga_luad_bamf_lung_mr_segmentation
- tcga_lusc_bamf_lung_ct_segmentation
- tcga_lusc_bamf_lung_mr_segmentation
Notes
The deprecated columns `tcia_api_collection_id` and `idc_webapp_collection_id` have been removed from the `auxiliary_metadata` table in the `idc_v18` BQ dataset. These columns were duplicates of columns `collection_name` and `collection_id` respectively.
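Queries written against earlier versions may still reference the removed columns. The following is a minimal sketch of one way to rewrite such queries, using only the mapping stated in the note above. The helper name `migrate_query` and the sample query are hypothetical.

```python
import re

# Mapping from the note above: removed column -> the duplicate column that remains.
RENAMED_COLUMNS = {
    "tcia_api_collection_id": "collection_name",
    "idc_webapp_collection_id": "collection_id",
}

# Match whole identifiers only, so longer names containing these strings are untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RENAMED_COLUMNS) + r")\b"
)


def migrate_query(sql: str) -> str:
    """Replace references to the removed columns with their surviving equivalents."""
    return _PATTERN.sub(lambda match: RENAMED_COLUMNS[match.group(1)], sql)


old = "SELECT tcia_api_collection_id, idc_webapp_collection_id FROM auxiliary_metadata"
new = migrate_query(old)
print(new)  # SELECT collection_name, collection_id FROM auxiliary_metadata
```

A plain textual substitution like this does not understand SQL aliases or string literals, so the result should be reviewed before use.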
New radiology collections
New analysis results
-
Collections analyzed:
- Prostate-MRI-US-Biopsy-DICOM-Annotations
Collections analyzed:
Revised radiology collections
New clinical metadata tables
- ea1141_demographics
- ea1141_mri
- ea1141_risk_model
- ea1141_screening
- ea1141_status_12mo
- ea1141_status_6mo
- ea1141_tomosynthesis
- htan_ohsu_demographics
- htan_vanderbilt_demographics
- htan_vanderbilt_diagnosis
- htan_vanderbilt_exposure
- htan_vanderbilt_familyhistory
- htan_vanderbilt_followup
- htan_vanderbilt_moleculartest
- htan_vanderbilt_therapy
- remind_clinical
New radiology collections
New pathology collections
Revised radiology collections
- Breast-MRI-NACT-Pilot (TCIA description: “Repair of DICOM tag (0008,0005) to value "ISO_IR 100" in 79 series”)
- CPTAC-CCRCC (Revised because results from CPTAC-CCRCC-Tumor-Annotations were added)
- CPTAC-UCEC (Revised because results from CPTAC-UCEC-Tumor-Annotations were added)
- CPTAC-PDA (Revised because results from CPTAC-PDA-Tumor-Annotations were added)
New analysis results
New clinical metadata tables
- htan_hms_demographics
- htan_hms_diagnosis
- htan_hms_exposure
- htan_hms_familyhistory
- htan_hms_followup
- htan_hms_moleculartheraphy
- htan_ohsu_demographics
- htan_ohsu_diagnosis
- htan_ohsu_exposure
- htan_ohsu_familyhistory
- htan_ohsu_followup
- htan_ohsu_moleculartheraphy
- htan_wustl_demographics
- htan_wustl_diagnosis
- htan_wustl_exposure
- htan_wustl_familyhistory
- htan_wustl_followup
- htan_wustl_moleculartheraphy
- rms_mutation_prediction_demographics
- rms_mutation_prediction_diagnosis
- rms_mutation_prediction_sample
New radiology collections
- Adrenal-ACC-Ki67-Seg
- CC-Tumor-Heterogeneity
- Colorectal-Liver-Metastases
- NLM-Visible-Human-Project
- Prostate-Anatomical-Edge-Cases
- RIDER Pilot
New pathology collections
- HTAN-VANDERBILT
- ICDC-Glioma (ICDC-Glioma radiology added in a previous version)
Revised radiology collections
- CPTAC-CCRCC (TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)
- CPTAC-CM (TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)
- CPTAC-LSCC (TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)
- CPTAC-LUAD (TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)
- CPTAC-PDA (TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)
- CPTAC-SAR (TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)
- CPTAC-UCEC (TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)
- CT Lymph Nodes (TCIA description: “Added DICOM version of MED_ABD_LYMPH_MASKS.zip segmentations that were previously available”)
- RIDER Lung CT (Revised because QIBA-VolCT-1B analysis results were added)
- NLST (Revised because analysis results from nnU-Net-BPR-Annotations were revised)
- NSCLC-Radiomics (Revised because analysis results from nnU-Net-BPR-Annotations were revised)
Revised pathology collections
- CPTAC-GBM (11 pathology-only patients removed at request of data owner)
- CPTAC-SAR (1 pathology-only patient removed at request of data owner)
New analysis results
- QIBA-VolCT-1B (Analysis of RIDER Lung CT)
Revised analysis results
- nnU-Net-BPR-Annotations (Annotations of NLST and NSCLC-Radiomics radiology)
New clinical metadata tables
- adrenal_acc_ki67_seg_clinical
- cc_tumor_heterogeneity_clinical
- colorectal_liver_metastases_clinical
- duke_breast_cancer_mri_clinical
- nlst_clinical
- nlst_ctab
- nlst_ctabc
- nlst_prsn
- nlst_screen
This release does not introduce any new data, but changes the bucket organization and introduces replication of IDC files in Amazon AWS storage buckets, as described in this section.
New analysis results collection:
New clinical data collections:
New collections:
Updated collections:
Other:
Metadata corresponding to "limited" access collections are removed.
New clinical data collections:
Other clinical data updates:
Limited access collections are removed. Clinical metadata for the COVID-19-NY-SUB and ACRIN 6698/I-SPY2 Breast DWI collections now includes information ingested from data dictionaries associated with these collections. In v11 the string value 'NA' was being changed to null during the ETL process for some columns/collections. This is now fixed in v12 and the value 'NA' is preserved.
This release introduces clinical data ingested for a subset of collections, and now available via a dedicated BigQuery dataset.
New collections:
In this release we introduce a new HTAN program, which currently includes three collections released by the Human Tumor Atlas Network.
New collections:
Updated collections:
CPTAC, TCGA and NLST collections have been reconverted due to a technical issue identified with a subset of images included in v9.
- CPTAC-AML
- CPTAC-BRCA
- CPTAC-CCRCC
- CPTAC-CM
- CPTAC-COAD
- CPTAC-GBM
- CPTAC-HNSCC
- CPTAC-LSCC
- CPTAC-LUAD
- CPTAC-OV
- CPTAC-PDA
- CPTAC-SAR
- CPTAC-UCEC
- Duke-Breast-Cancer-MRI
- NLST
- TCGA-ACC
- TCGA-BLCA
- TCGA-BRCA *
- TCGA-CESC
- TCGA-CHOL
- TCGA-COAD
- TCGA-DLBC
- TCGA-ESCA
- TCGA-GBM
- TCGA-HNSC
- TCGA-KICH
- TCGA-KIRC
- TCGA-KIRP *
- TCGA-LGG
- TCGA-LIHC
- TCGA-LUAD
- TCGA-LUSC
- TCGA-MESO
- TCGA-OV
- TCGA-PAAD
- TCGA-PCPG
- TCGA-PRAD
- TCGA-READ
- TCGA-SARC
- TCGA-SKCM
- TCGA-STAD
- TCGA-TGCT
- TCGA-THCA
- TCGA-THYM
- TCGA-UCEC
- TCGA-UCS
- TCGA-UVM
Note that the TCGA-KIRP and TCGA-BRCA collections (marked with an asterisk in the list above) are currently missing SM high resolution layer files/instances due to a known limitation of Google Healthcare that prevents ingestion of datasets exceeding certain internal limits. Specifically, the following patients/studies are affected:
- TCGA-KIRP: `PatientID` TCGA-5P-A9KA, `StudyInstanceUID` 2.25.191236165605958868867890945341011875563
- TCGA-BRCA: `PatientID` TCGA-OL-A66H, `StudyInstanceUID` 2.25.82800314486527687800038836287574075736

The affected files will be included in IDC when the infrastructure limitation is addressed.
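When assembling a cohort of slide microscopy studies from this release, it may help to flag the two studies listed above so that downstream processing does not assume all resolution layers are present. The sketch below hard-codes the identifiers from the note. The function name `split_by_completeness` and the extra UID `2.25.1` are hypothetical and used only for illustration.

```python
# Studies named in the note above as missing SM high resolution layers.
# Keys are StudyInstanceUIDs; values are (collection, PatientID).
AFFECTED_STUDIES = {
    "2.25.191236165605958868867890945341011875563": ("TCGA-KIRP", "TCGA-5P-A9KA"),
    "2.25.82800314486527687800038836287574075736": ("TCGA-BRCA", "TCGA-OL-A66H"),
}


def split_by_completeness(study_uids):
    """Separate study UIDs into (complete, incomplete) lists, preserving input order."""
    complete, incomplete = [], []
    for uid in study_uids:
        if uid in AFFECTED_STUDIES:
            incomplete.append(uid)
        else:
            complete.append(uid)
    return complete, incomplete


# "2.25.1" is a made-up UID standing in for any unaffected study.
cohort = ["2.25.1", "2.25.82800314486527687800038836287574075736"]
complete, incomplete = split_by_completeness(cohort)
for uid in incomplete:
    collection, patient_id = AFFECTED_STUDIES[uid]
    print(f"{collection} / {patient_id}: high resolution layers missing in {uid}")
```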
Collection access level change:
- Vestibular-Schwannoma-SEG is now available as public access collection
This data release introduces the concept of differential licenses to IDC: some of the collections maintained by IDC contain items covered by different licenses. For example, the radiology component of the TCGA-GBM collection is covered by the TCIA limited access license and is not available in IDC, while the digital pathology component is covered by CC-BY. With this release, we complete the sharing in full of the digital pathology component of the datasets released by the CPTAC and TCGA programs.
New collections:
Updated collections:
The main highlight of this release is the addition of the NLST and TCGA Slide Microscopy imaging data. The new TCGA content consists of TCGA collections that are new to IDC and have only a slide microscopy component, and of the slide microscopy component added to IDC collections that were available earlier and included only the radiology component.
New collections
- TCGA-ACC
- TCGA-CHOL
- TCGA-DLBC (TCGA-DLBC collection does not have a description page)
- TCGA-MESO
- TCGA-PAAD
- TCGA-PCPG
- TCGA-SKCM
- TCGA-TGCT
- TCGA-THYM
- TCGA-UCS
- TCGA-UVM
Updated collections
- NLST
- TCGA-BLCA
- TCGA-BRCA
- TCGA-CESC
- TCGA-COAD
- TCGA-ESCA
- TCGA-KICH
- TCGA-KIRC
- TCGA-KIRP
- TCGA-LIHC
- TCGA-LUAD
- TCGA-LUSC
- TCGA-OV
- TCGA-PRAD
- TCGA-READ
- TCGA-SARC
- TCGA-STAD
- TCGA-THCA
- TCGA-UCEC
The main highlight of this release is the addition of the Slide Microscopy imaging component to the remaining CPTAC collections.
New collections
- APOLLO-5-ESCA
- APOLLO-5-LUAD
- APOLLO-5-PAAD
- APOLLO-5-THYM
- CPTAC-AML
- CPTAC-BRCA
- CPTAC-COAD
- CPTAC-OV
- Pancreatic-CT-CBCT-SEG
- Pediatric-CT-SEG
Updated collections
The following collections became limited access due to the change in policy by TCIA, which is the original source of those collections.
Original collections:
- AAPM-RT-MAC
- ACRIN-DSC-MR-Brain
- ACRIN-FMISO-Brain
- ACRIN-HNSCC-FDG-PET-CT
- Anti-PD-1_MELANOMA
- Brain-Tumor-Progression
- CPTAC-GBM
- CPTAC-HNSCC
- HEAD-NECK-RADIOMICS-HN1
- HNSCC
- HNSCC-3DCT-RT
- Head-Neck Cetuximab
- Head-Neck-PET-CT
- IvyGAP
- LGG-1p19qDeletion
- MRI-DIR
- OPC-Radiomics
- QIN GBM Treatment Response
- QIN-BRAIN-DSC-MRI
- QIN-HEADNECK
- REMBRANDT
- RIDER NEURO MRI
- TCGA-GBM
- TCGA-HNSC
- TCGA-LGG
- Vestibular-Schwannoma-SEG
Analysis results collections:
- DICOM-SEG Conversions for TCGA-LGG and TCGA-GBM Segmentation Datasets
- Outcome Prediction in Patients with Glioblastoma by Using Imaging, Clinical, and Genomic Biomarkers: Focus on the Nonenhancing Component of the Tumor
New collections:
- COVID-19-NY-SBU
- B-mode-and-CEUS-Liver
- APOLLO-5-LSCC
- CMMD
- ACRIN-HNSCC-FDG-PET-CT
- Duke-Breast-Cancer-MRI
New analysis results collections:
- Outcome Prediction in Patients with Glioblastoma by Using Imaging, Clinical, and Genomic Biomarkers: Focus on the Nonenhancing Component of the Tumor (GBM-MR-NER-Outcomes)
- DICOM-SEG Conversions for TCGA-LGG and TCGA-GBM Segmentation Datasets (DICOM-Glioma-SEG)
Updated collections:
- TCGA-GBM
- TCGA-LGG
- QIN-HEADNECK
- Breast-Cancer-Screening-DBT
- NSCLC Radiogenomics
- Pseudo-PHI-DICOM-Data
National Lung Screening Trial (NLST) collection is added. The data included consists of the following components:
1) CT images available as any other imaging collection (via IDC Portal, BigQuery metadata tables, and storage buckets);
2) a subset of clinical data available in the BigQuery tables starting with `nlst_` under the `idc_v4` dataset, as documented in the Collection-specific BigQuery Tables section.
3) One instance is missing from patient/study/series: 126153/1.2.840.113654.2.55.319335498043274792486636919135185299851/1.2.840.113654.2.55.262421043240525317038356381369289737801
4) Three instances are missing from patient/study/series: 215303/1.3.6.1.4.1.14519.5.2.1.7009.9004.337968382369511017896638591276/1.3.6.1.4.1.14519.5.2.1.7009.9004.180224303090109944523368212991
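To see which NLST clinical tables exist, one option is to list the tables in the `idc_v4` dataset whose names start with `nlst_`. The sketch below only constructs the SQL text. The project name `bigquery-public-data` and the helper name `build_prefix_tables_query` are assumptions not stated above, and the available tables should be confirmed against the Collection-specific BigQuery Tables documentation.

```python
def build_prefix_tables_query(
    dataset: str = "bigquery-public-data.idc_v4",  # project name is an assumption
    prefix: str = "nlst_",
) -> str:
    """Build a SQL string listing tables in `dataset` whose names start with `prefix`."""
    if "'" in prefix or "\\" in prefix:
        raise ValueError("prefix must not contain quotes or backslashes")
    return (
        "SELECT table_name\n"
        f"FROM `{dataset}.INFORMATION_SCHEMA.TABLES`\n"
        # STARTS_WITH is used instead of LIKE because '_' is a single-character
        # wildcard in LIKE patterns and would match more than a literal underscore.
        f"WHERE STARTS_WITH(table_name, '{prefix}')\n"
        "ORDER BY table_name"
    )


nlst_tables_query = build_prefix_tables_query()
print(nlst_tables_query)
```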
The following radiology collections were updated to include DICOM Slide Microscopy (SM) images converted from the original vendor-specific representation into dual personality DICOM-TIFF format.
{% hint style="warning" %} The DICOM Slide Microscopy (SM) images included in the collections above in IDC are not available in TCIA. TCIA only includes images in the vendor-specific SVS format! {% endhint %}
Listed below are all of the original and analysis results collections of The Cancer Imaging Archive currently hosted by IDC, with the links to the Digital Object Identifiers (DOIs) of those collections.
New original collections:
- IvyGAP
- QIN LUNG CT
- LungCT-Diagnosis
- HEAD-NECK-RADIOMICS-HN1
- Prostate Fused-MRI-Pathology
- APOLLO
- LGG-1p19qDeletion
- Soft-tissue-Sarcoma
- NSCLC-Radiomics-Genomics
- Brain-Tumor-Progression
- Head-Neck Cetuximab
- CPTAC-GBM
- CPTAC-SAR
- CPTAC-LUAD
- CPTAC-LSCC
- Head-Neck-PET-CT
- C4KC-KiTS
- Breast-MRI-NACT-Pilot
- 4D-Lung
- Mouse-Mammary
- CT Lymph Nodes
- HNSCC
- Breast-Cancer-Screening-DBT
- MRI-DIR
- Lung-PET-CT-Dx
- NSCLC-RADIOMICS-INTEROBSERVER1
- PDMR-BL0293-F563
- CT COLONOGRAPHY
- Phantom FDA
- QIN-PROSTATE-Repeatability
- PROSTATEx
- AAPM-RT-MAC
- ICDC-Glioma
- RIDER Breast MRI
- Anti-PD-1_MELANOMA
- COVID-19-AR
- PROSTATE-MRI
- NaF PROSTATE
- Mouse-Astrocytoma
- ACRIN-DSC-MR-Brain
- ACRIN-NSCLC-FDG-PET
- QIN Breast DCE-MRI
- RIDER NEURO MRI
- MIDRC-RICORD-1A
- MIDRC-RICORD-1C
- REMBRANDT
- NSCLC Radiogenomics
- HNSCC-3DCT-RT
- VICTRE
- CPTAC-CM
- CPTAC-PDA
- CPTAC-UCEC
- CPTAC-CCRCC
- CPTAC-HNSCC
- OPC-Radiomics
- Vestibular-Schwannoma-SEG
- SPIE-AAPM Lung CT Challenge
- Lung Phantom
- Pseudo-PHI-DICOM-Data
- Pancreas-CT
- QIN GBM Treatment Response
- Pelvic-Reference-Data
- Lung-Fused-CT-Pathology
- Anti-PD-1_Lung
- BREAST-DIAGNOSIS
- RIDER Lung PET-CT
- RIDER Lung CT
- PDMR-292921-168-R
- PDMR-833975-119-R
- PDMR-997537-175-T
- LCTSC
- Prostate-3T
- ACRIN-FLT-Breast
- ACRIN-FMISO-Brain
- PDMR-425362-245-T
- Prostate-MRI-US-Biopsy
- MIDRC-RICORD-1B
- DRO-Toolkit
New analysis results collections:
- PROSTATEx Zone Segmentations
- High Resolution Prostate Segmentations for the ProstateX-Challenge
- RIDER Lung CT Segmentation Labels from: Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach
Listed below are all of the original and analysis results collections of The Cancer Imaging Archive currently hosted by IDC, with the links to the Digital Object Identifiers (DOIs) of those collections.
Original collections included:
- TCGA-PRAD
- TCGA-BLCA
- TCGA-UCEC
- TCGA-HNSC
- TCGA-LUSC
- TCGA-KIRP
- TCGA-THCA
- TCGA-SARC
- TCGA-ESCA
- TCGA-CESC
- TCGA-STAD
- TCGA-COAD
- TCGA-KICH
- TCGA-READ
- TCGA-LUAD
- TCGA-LIHC
- TCGA-BRCA
- TCGA-OV
- TCGA-KIRC
- TCGA-LGG
- TCGA-GBM
- ISPY1 (ACRIN 6657)
- QIN-HeadNeck
- LIDC-IDRI
- NSCLC-Radiomics
Analysis collections included:
- Standardized representation of the TCIA LIDC-IDRI annotations using DICOM
- QIN multi-site collection of Lung CT data with Nodule Segmentations (only items corresponding to the LIDC-IDRI original collection are included)
- DICOM SR of clinical data and measurement for breast cancer collections to TCIA (only items corresponding to the ISPY1 original collection are included)