
[SOP Document] Imaging Data #5

Open
jshoughtaling opened this issue Apr 5, 2024 · 2 comments

Comments

@jshoughtaling
Collaborator

No description provided.

@CmCC1435

Items to be considered to include:

  • Standardization, if needed
  • De-identification
  • Tools that should be used, or are recommended, for de-identification

@haritrivedi
Collaborator

Goal: Deidentified DICOM images that are linkable to waveform and EHR data.

Final Deliverable:

  • Deidentified DICOM image file (header and pixels)
  • De-identified DICOM Metadata (minimum: link [i.e., unique patient ID], date of test [shifted], data type [e.g., CT, MRI])
  • Recommended list of tags to place into OMOP tables (see Appendix A)

Phase-1
1.1 SOP - Data Acquisition
1.2 Header deidentification - Tools
1.3 Pixel deidentification - Tools
1.4 OMOP linkage - Standards
1.5 Defacing - Tools
1.6 Local manual review tool - Tools
1.7 Data sharing and intake to Azure

Phase-2
2.1 Image Processing-Analyses Pipeline (Central)
2.2 Cross-data linkage Validation (Central)
2.3 Open Data Sharing (Central => Open Data)

Tools
Images DEID
1.1 Metadata DICOM
We will use the RSNA Clinical Trials Processor (CTP) to de-identify DICOM metadata. CTP supports removal of tags known to contain PHI, replacement of patient identifiers (Patient ID and Accession Number) from a lookup table, date jittering, and replacement of UIDs (we can request a CHORUS UID here).
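
To illustrate what these four header operations do, here is a minimal Python sketch. It is not CTP's configuration (CTP is driven by its own anonymizer script); it assumes the header has already been parsed into a plain dict of DICOM keywords. The PHI keyword set is an illustrative subset, `UID_ROOT` is a placeholder rather than a real CHORUS root, and the hash-based UID mapping is one possible scheme, not necessarily CTP's.

```python
import hashlib
from datetime import date, timedelta

# Illustrative subset of PHI-bearing keywords; a real profile
# (e.g. DICOM PS3.15 Annex E) covers far more attributes.
PHI_KEYWORDS = {"PatientName", "PatientBirthDate", "PatientAddress",
                "ReferringPhysicianName", "InstitutionName"}
DATE_KEYWORDS = {"StudyDate", "SeriesDate", "AcquisitionDate", "ContentDate"}
UID_KEYWORDS = {"StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID"}
UID_ROOT = "1.2.3.4.5"  # placeholder only, NOT an assigned CHORUS UID root


def replace_uid(uid: str, root: str = UID_ROOT) -> str:
    """Map an original UID to a new one under `root`, deterministically.

    The same input always yields the same output, so references between
    studies, series, and instances stay consistent.
    """
    digest = int(hashlib.sha256(uid.encode("ascii")).hexdigest(), 16)
    return f"{root}.{digest % 10**30}"  # stays well under the 64-char UID limit


def shift_date(value: str, days: int) -> str:
    """Jitter a DICOM DA value (YYYYMMDD) by a fixed number of days."""
    d = date(int(value[0:4]), int(value[4:6]), int(value[6:8]))
    return (d + timedelta(days=days)).strftime("%Y%m%d")


def deidentify_header(header, patient_lookup, accession_lookup, day_shift):
    """Return a de-identified copy of `header` (a dict of keyword -> value)."""
    out = {}
    for keyword, value in header.items():
        if keyword in PHI_KEYWORDS:
            continue  # remove tags known to contain PHI
        if keyword == "PatientID":
            value = patient_lookup[value]  # replace from lookup table
        elif keyword == "AccessionNumber":
            value = accession_lookup[value]
        elif keyword in DATE_KEYWORDS:
            value = shift_date(value, day_shift)
        elif keyword in UID_KEYWORDS:
            value = replace_uid(value)
        out[keyword] = value
    return out


# Example with made-up values.
header = {"PatientName": "DOE^JANE", "PatientID": "MRN123", "AccessionNumber": "A987",
          "StudyDate": "20240105", "StudyInstanceUID": "1.2.840.99.1", "Modality": "CT"}
clean = deidentify_header(header, {"MRN123": "CHORUS-0001"}, {"A987": "ACC-0001"}, day_shift=-17)
print(clean["StudyDate"])  # 20231219
```

Keeping the UID mapping deterministic means that re-running the pipeline on the same files produces the same de-identified UIDs.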
We initially plan to remove private tags, as their contents are unknown. CTP can also remove entire DICOM files that meet certain criteria and are likely to contain PHI in the pixel data, including the following:

  1. Secondary Captures
  2. Dose Reports
  3. Other Reports
  4. Scanned documents

After these steps, we expect that more than 99.99% of images with pixel PHI would be removed.
CTP is non-destructive for DICOM files, so files that are originally DICOM-compliant will remain so. By default, CTP also does not alter the image pixel array, and therefore does not change image appearance or compression.
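
The private-tag removal and file-exclusion rules above can be sketched in plain Python as follows. This is not CTP's configuration; it assumes the header has already been parsed into a dict keyed by (group, element) tags, and the series-description keywords are a hypothetical heuristic. It relies on two facts from the DICOM standard: private data elements have odd group numbers, and Secondary Capture Image Storage has SOP Class UID 1.2.840.10008.5.1.4.1.1.7.

```python
# Tags are (group, element) pairs, as in the DICOM standard.
SOP_CLASS_UID = (0x0008, 0x0016)
MODALITY = (0x0008, 0x0060)
SERIES_DESCRIPTION = (0x0008, 0x103E)
SECONDARY_CAPTURE_SOP_CLASS = "1.2.840.10008.5.1.4.1.1.7"

# Heuristic keywords; assumed for illustration, not taken from CTP.
EXCLUDE_DESCRIPTION_WORDS = ("DOSE", "REPORT", "DOCUMENT")


def strip_private_tags(header):
    """Drop private data elements, i.e. those with an odd group number."""
    return {tag: value for tag, value in header.items() if tag[0] % 2 == 0}


def should_exclude(header):
    """Return True for files likely to carry PHI in their pixels."""
    if header.get(SOP_CLASS_UID) == SECONDARY_CAPTURE_SOP_CLASS:
        return True  # secondary captures, e.g. screenshots and scanned pages
    if header.get(MODALITY) in {"SR", "DOC"}:
        return True  # structured reports (incl. dose SRs) and encapsulated documents
    description = str(header.get(SERIES_DESCRIPTION, "")).upper()
    return any(word in description for word in EXCLUDE_DESCRIPTION_WORDS)


ct_image = {SOP_CLASS_UID: "1.2.840.10008.5.1.4.1.1.2", MODALITY: "CT",
            SERIES_DESCRIPTION: "AXIAL 5MM", (0x0009, 0x0010): "VENDOR PRIVATE DATA"}
screen_save = {SOP_CLASS_UID: SECONDARY_CAPTURE_SOP_CLASS, MODALITY: "OT",
               SERIES_DESCRIPTION: "SCREEN SAVE"}
kept = [strip_private_tags(h) for h in (ct_image, screen_save) if not should_exclude(h)]
```

Here only the CT image survives, and its vendor private tag is removed.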
1.2 Metadata Extraction for Inspection and Easy Access
We will provide a script that traverses DICOM files and extracts specified headers into a CSV file. This will allow easy inspection and searching of DICOM tags for identification of lingering PHI. If any is found, offending files can be manually removed or CTP rules can be updated and images can be reprocessed. This table can later function as a mechanism to translate specific tags into OMOP.
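
A minimal sketch of the CSV export and a simple PHI scan is shown below. File traversal and parsing are left out: it assumes each header has already been read into a dict of keywords (for example via `pydicom.dcmread(path, stop_before_pixels=True)` inside an `os.walk` loop). The field list and the two regular expressions are hypothetical and far from a complete PHI screen.

```python
import csv
import io
import re

# Hypothetical field list; in practice this would be driven by Appendix A.
FIELDS = ["AccessionNumber", "StudyInstanceUID", "Modality", "StudyDate", "StudyDescription"]

# Illustrative patterns only; real PHI screening needs a broader rule set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like number
    re.compile(r"\b[A-Z]+\^[A-Z]+\b"),     # DICOM person-name style "LAST^FIRST"
]


def headers_to_csv(headers, fields, stream):
    """Write one CSV row per header dict, keeping only `fields`."""
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    for header in headers:
        writer.writerow({f: header.get(f, "") for f in fields})


def flag_suspicious(headers, fields):
    """Return (row_index, field) pairs whose value matches a PHI-like pattern."""
    hits = []
    for i, header in enumerate(headers):
        for f in fields:
            value = str(header.get(f, ""))
            if any(p.search(value) for p in SUSPICIOUS_PATTERNS):
                hits.append((i, f))
    return hits


rows = [
    {"AccessionNumber": "ACC-0001", "Modality": "CT", "StudyDate": "20231219",
     "StudyDescription": "CT HEAD WO CONTRAST"},
    {"AccessionNumber": "ACC-0002", "Modality": "MR", "StudyDate": "20231220",
     "StudyDescription": "MRI BRAIN DOE^JANE"},
]
buffer = io.StringIO()
headers_to_csv(rows, FIELDS, buffer)
print(flag_suspicious(rows, FIELDS))  # [(1, 'StudyDescription')]
```

A flagged row points a reviewer to the file to remove or to the CTP rule that needs updating.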
1.3 DICOM Metadata Harmonization
In the initial phase, we will not harmonize free-text DICOM metadata such as StudyDescription and SeriesDescription, because these fields are highly heterogeneous. In subsequent phases, we can consider using existing ontologies or third-party vendors for this.
1.4 Pixel De-Identification
Pixel de-identification remains a challenging problem: it is computationally intensive, and no existing solution is 100% accurate. Manual review of all images at each site is also prohibitively time-consuming and costly.
Therefore, we propose that each site de-identify its DICOM files using the tools in 1.1 and 1.2 and then send the images to the CHORUS lead site, where pixel data can be checked and verified.
Several tools/methods are under consideration:
  1. MD.ai – online viewer and annotation platform with a deep learning text-detection model to locate any residual text on images
  2. Enlitic – Encog software for bulk pixel de-identification and examination, and possibly data harmonization
  3. IBIS – bulk pixel de-identification and examination
  4. University of Nebraska – turnkey service for examination and redaction of PHI in pixel data, with guaranteed performance and assumption of liability
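
Whichever option is chosen, the redaction step itself amounts to overwriting the detected text regions in the pixel array. A minimal sketch, assuming the pixels are already decoded into a 2-D list and the bounding boxes come from some text detector or a human reviewer; the `Box` layout and the `redact` function are hypothetical and do not reflect any vendor's API.

```python
from typing import List, Tuple

# (row_start, col_start, row_end, col_end); end indices are exclusive.
Box = Tuple[int, int, int, int]


def redact(pixels: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of a 2-D pixel array with every box overwritten by `fill`.

    Boxes that extend past the image edge are clipped to the image bounds.
    """
    out = [row[:] for row in pixels]
    n_rows = len(out)
    n_cols = len(out[0]) if out else 0
    for r0, c0, r1, c1 in boxes:
        for r in range(max(0, r0), min(n_rows, r1)):
            for c in range(max(0, c0), min(n_cols, c1)):
                out[r][c] = fill
    return out


image = [[100] * 4 for _ in range(3)]
redacted = redact(image, [(0, 0, 1, 2)])  # burned-in text assumed in the top-left corner
print(redacted[0])  # [0, 0, 100, 100]
```

Unlike the header processing in 1.1, this changes the pixel data, so the file would typically need to be re-encoded after redaction.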
1.5 Defacing
Defacing of head and face CT and MRI remains an unsolved problem. All proposed solutions are destructive in that they permanently alter the image, which can have unforeseen effects on AI model training. Furthermore, there is no guarantee that these methods cannot be reverse-engineered at a later date.
Potential solutions:
  1. Do not include any head or face CT or MRI in the initial exam extractions from sites.
  2. Include head and face CT/MRI, but remove any series with slice thickness < 5 mm. Series with a slice thickness of 5 mm or more would not have sufficient fidelity to reconstruct faces.
  3. Require all end users to sign an agreement not to attempt to reconstruct or re-identify faces. This is the method used by most commercial vendors that sell or license data.
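
Option 2 could be applied as a series-level filter on the extracted metadata. A minimal sketch, assuming each series is a dict of DICOM keywords; the term list used to spot head/face series is a hypothetical heuristic (BodyPartExamined can be missing or inconsistent), and excluding head/face series with no SliceThickness is an assumed conservative default.

```python
from typing import Dict

MIN_SLICE_THICKNESS_MM = 5.0  # threshold from option 2
# Hypothetical term list for spotting head/face series.
HEAD_FACE_TERMS = ("HEAD", "BRAIN", "FACE", "ORBIT", "SINUS")


def is_head_or_face(series: Dict[str, str]) -> bool:
    text = f"{series.get('BodyPartExamined', '')} {series.get('SeriesDescription', '')}".upper()
    return any(term in text for term in HEAD_FACE_TERMS)


def keep_series(series: Dict[str, str]) -> bool:
    """Apply option 2: drop thin-slice head/face CT and MR series."""
    if series.get("Modality") not in {"CT", "MR"} or not is_head_or_face(series):
        return True  # rule does not apply
    thickness = series.get("SliceThickness", "")
    if not thickness:
        return False  # unknown thickness: exclude to be conservative
    return float(thickness) >= MIN_SLICE_THICKNESS_MM


series_list = [
    {"Modality": "CT", "BodyPartExamined": "HEAD", "SliceThickness": "0.625"},
    {"Modality": "CT", "BodyPartExamined": "HEAD", "SliceThickness": "5"},
    {"Modality": "MR", "SeriesDescription": "T1 BRAIN"},
    {"Modality": "CT", "BodyPartExamined": "CHEST", "SliceThickness": "1.0"},
]
kept = [s for s in series_list if keep_series(s)]
```

Only the 5 mm head CT and the chest CT are kept; the thin-slice head CT and the brain MR with unknown thickness are dropped.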
1.6 Human Quality Review
a) Whichever tool is selected for pixel de-identification would also include a viewer (unless pixel review is outsourced to the University of Nebraska). If a separate viewer is needed, several cloud-based and on-premises viewers are available.

Standards
1.4 Linkage ID
a) Accession number: image_occurrence_id
b) Procedure number: procedure_occurrence_id
c) Unified Time-shift: harmonized with waveform and OMOP
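
The point of a unified time shift is that one per-patient offset is applied to imaging, waveform, and EHR dates alike, so intervals between events survive de-identification. A minimal sketch, assuming integer IDs are looked up from the original accession number; the lookup tables and the `person_id` and `image_occurrence_date` column names are illustrative assumptions, and only `image_occurrence_id` and `procedure_occurrence_id` come from the list above.

```python
from datetime import date, timedelta
from typing import Dict

# Hypothetical shared tables, built once and reused by the imaging,
# waveform, and OMOP/EHR pipelines.
DAY_SHIFT: Dict[str, int] = {"CHORUS-0001": -17}
ACCESSION_TO_IMAGE_OCCURRENCE: Dict[str, int] = {"A987": 501}
ACCESSION_TO_PROCEDURE_OCCURRENCE: Dict[str, int] = {"A987": 9001}


def shift(person_id: str, d: date) -> date:
    """Apply the patient's single offset; every data type uses the same one."""
    return d + timedelta(days=DAY_SHIFT[person_id])


def image_occurrence_row(person_id: str, accession: str, study_date: date) -> Dict[str, object]:
    return {
        "image_occurrence_id": ACCESSION_TO_IMAGE_OCCURRENCE[accession],
        "procedure_occurrence_id": ACCESSION_TO_PROCEDURE_OCCURRENCE[accession],
        "person_id": person_id,
        "image_occurrence_date": shift(person_id, study_date),
    }


row = image_occurrence_row("CHORUS-0001", "A987", date(2024, 1, 5))
waveform_date = shift("CHORUS-0001", date(2024, 1, 6))  # recorded a day after the scan
print(row["image_occurrence_date"], waveform_date)  # 2023-12-19 2023-12-20
```

Because both dates use the same offset, the waveform still falls one day after the imaging study once shifted.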

Appendix A - List of recommended DICOM metadata headers to merge into OMOP
Study Level:
AccessionNumber
StudyDescription
StudyInstanceUID
Modality
StudyDate
Manufacturer
ManufacturerModelName
StudyTime
MagneticFieldStrength
BodyPartExamined
Radiopharmaceutical
ContrastBolusAgent
ContrastBolusRoute

Series Level:
SeriesDescription
SeriesInstanceUID
SliceThickness
ViewPosition
ImageLaterality
ImagesInAcquisition
TransducerType
TransducerFrequency
SeriesNumber

Image Level: Ignore for now
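
Since image-level fields are ignored for now, per-instance header rows can be collapsed into one row per study and one per series before loading into OMOP. A minimal sketch, using the Appendix A fields spelled with their standard DICOM keywords; taking values from the first instance seen in each study or series is an assumption (fields such as SliceThickness can vary within a series), and `split_levels` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple

STUDY_FIELDS = ["AccessionNumber", "StudyDescription", "StudyInstanceUID", "Modality",
                "StudyDate", "Manufacturer", "ManufacturerModelName", "StudyTime",
                "MagneticFieldStrength", "BodyPartExamined", "Radiopharmaceutical",
                "ContrastBolusAgent", "ContrastBolusRoute"]
SERIES_FIELDS = ["SeriesDescription", "SeriesInstanceUID", "SliceThickness", "ViewPosition",
                 "ImageLaterality", "ImagesInAcquisition", "TransducerType",
                 "TransducerFrequency", "SeriesNumber"]

Row = Dict[str, str]


def split_levels(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Collapse per-instance rows into unique study-level and series-level rows."""
    studies: Dict[str, Row] = {}
    series: Dict[str, Row] = {}
    for row in rows:
        study_uid = row["StudyInstanceUID"]
        # The first instance seen for a study/series supplies its values.
        studies.setdefault(study_uid, {f: row.get(f, "") for f in STUDY_FIELDS})
        series_row = {f: row.get(f, "") for f in SERIES_FIELDS}
        series_row["StudyInstanceUID"] = study_uid  # link back to the study row
        series.setdefault(row["SeriesInstanceUID"], series_row)
    return list(studies.values()), list(series.values())


instances = [
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1", "Modality": "CT", "SliceThickness": "5"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1", "Modality": "CT", "SliceThickness": "5"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.2", "Modality": "CT", "SliceThickness": "1"},
]
studies, series = split_levels(instances)  # one study row, two series rows
```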

Projects
Status: In Progress

5 participants