
Minutes Data Working Group 17 Jun 2021


Agenda

  1. Issues tagged with Data WG Label
    1. Issue 857: Experiments Proposal: Customization, Reproducibility and Extensibility - Feedback Welcomed
  2. Discussions tagged with Data WG Label
    1. Discussion 2110: Covid19 segmentation
    2. Discussion 2079: Error with PersistentDataset in pytorch distributed setting
  3. Recent discussions from MONAI Advisory committee
    1. Discussion 2212: Proposal feedback wanted: CSV files to ingest image and patient data from TCIA
    2. Cardiology-specific needs (relates to call for society involvement)
    3. Pathology working group under way
    4. Real-time video
  4. Development updates relating to Data WG

Notes

  • Discussion 2212: Proposal feedback wanted: CSV files to ingest image and patient data from TCIA

    • Work is underway now on planning how CSV usage will be implemented, and the structure is a high priority
    • The decision was taken to use CSV to make adoption easier now, rather than standardizing the structure on a formal ontology (such as FHIR or DICOM)
    • Question is, how do we map the fields from CSV to MONAI?
      • Currently, a CSV column name is mapped into an object in code which can then be called (e.g., "image" and "image_full" can be retrieved); see the first sketch after this list
    • CSV poses a problem because there is no connection from a column in a CSV file to a real-world data representation; it also lacks any capability for strong typing or data validation
    • Recommend considering a strongly-typed “business mapping file” that goes alongside the CSV file, with a proper definition of the columns in the CSV (see the second sketch after this list)
      • E.g., see this example : https://www.w3.org/2013/csvw/wiki/CSV-LD
      • It should describe the field type, and also the ontology / best reference; e.g., reference the source of data as a DICOM tag, an HL7 field, a SNOMED or LOINC value, etc.
      • This permits data validation; e.g., if the CSV references files that don’t exist, MONAI should fail gracefully when loading images.
      • Data import from CSV should run some form of validation against the declared type (e.g., a URL, a string, a number, a file reference) and perform the corresponding checks (e.g., is it a string? does the file exist? can the URL be retrieved?)
    • Sources of data in a CSV file can be problematic; e.g., a DICOM reference (study UID) with IP address, port number, AE title repeated 75,000 times can be tedious
      • It’s not enough if this is stored in a config file somewhere. For example, how do you specify the different locations of DICOM files (e.g., the DIMSE endpoint IP/port/AE title, or the DICOMweb endpoint)? What happens when the endpoint changes (e.g., the data is replicated to a new site), or when data lives in two different locations (e.g., a current PACS and a VNA for long-term archive, or a radiology PACS and a cardiology PACS, etc.)?
      • A “business mapping file” (described above) perhaps can also provide tips on where to retrieve DICOM from
    • How are multiple records of the same patient “linked” in CSV? E.g., if two images belong to the same "subject" in a CSV file, how would joining and merging work?
      • E.g. check out https://tadpole.grand-challenge.org/Data/ as an example
      • This dataset has 12+ CSV files that are merged and linked somehow; there must be primary / foreign keys described in CSV
      • This can become unwieldy to maintain as the files get more complicated
      • At this stage, CSVs would need to be pre-processed before MONAI loads them; if there are multiple CSVs, they would need to be merged / pre-processed down to a single CSV file that MONAI loads (see the third sketch after this list)
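
As a point of reference for the column-to-object mapping noted above, here is a minimal sketch (not the final proposal) of turning CSV rows into the data dictionaries MONAI's dictionary transforms consume. The file name "subjects.csv" is hypothetical, and it is assumed that the "image" and "image_full" columns hold image file paths.

```python
# Minimal sketch: each CSV row becomes a dict keyed by column name, which
# MONAI's dictionary transforms can then address by key.
import pandas as pd
from monai.data import Dataset
from monai.transforms import Compose, LoadImaged

df = pd.read_csv("subjects.csv")     # hypothetical file name
records = df.to_dict("records")      # one dict per row, e.g. {"image": ..., "image_full": ...}

ds = Dataset(
    data=records,
    transform=Compose([LoadImaged(keys=["image", "image_full"])]),
)
first = ds[0]                        # loads the images referenced by the first row
```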
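
A hedged sketch of the strongly-typed “business mapping file” idea discussed above: a sidecar definition (shown here as an inline Python dict; a JSON or CSV-LD document would serve the same purpose) that types each CSV column and points to its ontology reference, plus a validation pass that reports problems before training starts. All column names, references, and file names are illustrative and not part of MONAI.

```python
# Hypothetical sidecar mapping: column -> declared type and ontology reference.
from pathlib import Path

import pandas as pd

MAPPING = {
    "image":      {"type": "file",   "ref": "DICOM SOP Instance UID (0008,0018)"},
    "age":        {"type": "number", "ref": "LOINC 30525-0 (Age)"},
    "report_url": {"type": "url",    "ref": "HL7 FHIR DiagnosticReport"},
}

def validate(csv_path, mapping):
    """Return a list of human-readable problems instead of failing mid-training."""
    df = pd.read_csv(csv_path)
    problems = []
    for col, spec in mapping.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        for i, value in df[col].items():
            if spec["type"] == "file" and not Path(str(value)).exists():
                problems.append(f"row {i}: file not found: {value}")
            elif spec["type"] == "number" and pd.isna(pd.to_numeric(value, errors="coerce")):
                problems.append(f"row {i}: {col} is not numeric: {value}")
            elif spec["type"] == "url" and not str(value).startswith(("http://", "https://")):
                problems.append(f"row {i}: {col} does not look like a URL: {value}")
    return problems

issues = validate("subjects.csv", MAPPING)  # hypothetical CSV; inspect or fail fast on problems
```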
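
For the multi-CSV case (e.g., TADPOLE-style datasets), a small sketch of pre-processing several CSV files down to the single table MONAI would load, assuming hypothetical file names and a shared "subject_id" key:

```python
import pandas as pd

demographics = pd.read_csv("demographics.csv")  # hypothetical: subject_id, age, sex
scans = pd.read_csv("scans.csv")                # hypothetical: subject_id, visit, image

# Inner join keeps only subjects present in both tables; a left join would keep
# every scan and leave missing demographics as NaN.
merged = scans.merge(demographics, on="subject_id", how="inner")
merged.to_csv("merged_for_monai.csv", index=False)
```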
  • Issue 857: Experiments Proposal: Customization, Reproducibility and Extensibility - Feedback Welcomed

    • The WG should perhaps review this offline and add comments
  • Discussion 2110: Covid19 segmentation

    • This is coming together and will be released in the future; it is waiting on publication and discussion with collaborators
    • One option - AWS open data might be a good way to work on this
    • Format needs to be defined for usage
  • Discussion 2079: Error with PersistentDataset in pytorch distributed setting

    • This issue is fixed, see issue 2086
    • An open question remains: should dataset caching be done for re-use? Internal representations are not exchanged (see the sketch below)
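
For context on the caching question, a brief sketch of re-using a PersistentDataset cache between runs; the data list and cache directory are hypothetical, and the cached files hold MONAI-internal intermediate results, which is why they are intended for local re-use rather than exchange.

```python
from monai.data import PersistentDataset
from monai.transforms import Compose, LoadImaged

data = [{"image": "img_001.nii.gz"}]              # hypothetical records
transform = Compose([LoadImaged(keys="image")])

# Deterministic transform results are cached to cache_dir; a second run pointing
# at the same directory re-uses them instead of re-loading from scratch.
ds = PersistentDataset(data=data, transform=transform, cache_dir="./persist_cache")
sample = ds[0]
```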
  • Other Business

    • The Evaluation WG is reaching out to MICCAI organizers to see whether some samples of training / testing data, along with the types and structures of the data, can be shared
      • The goal is to determine whether any preprocessing of the data is needed
      • Perhaps only a subset of the data, just to see what it looks like (e.g., handling multiple labels)