A Python toolkit for parsing and analyzing metadata from DICOM files.
This project uses the pydicom and fastcore libraries. It borrows ideas (and some code) from the fastai.medical.imaging library (source).
The metadata preprocessing and series selection algorithm are recreated from the paper by Gauriau et al. (reference below), in which a Random Forest classifier is trained to predict the sequence type (e.g. T1, T2, FLAIR, ...) of series of images from brain MRI. Such a tool may be used to select the appropriate series of images for input into a machine learning pipeline.
Reference: Gauriau R, et al. Using DICOM Metadata for Radiological Image Series Categorization: a Feasibility Study on Large Clinical Brain MRI Datasets. Journal of Digital Imaging. 2020 Jan; 33:747–762. (link to paper)
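As a rough illustration of the preprocessing idea, a classifier such as a Random Forest needs each series' metadata as a fixed-length numeric vector. The sketch below is not the paper's or this toolkit's feature set: the function name `encode_series`, the choice of fields, and the `-1.0` sentinel for missing values are all assumptions made for the example. It multi-hot encodes ScanningSequence against the DICOM defined terms and appends a few numeric attributes.

```python
from typing import Any, Dict, List

# DICOM defined terms for ScanningSequence (0018,0020).
SCANNING_SEQUENCES = ["SE", "IR", "GR", "EP", "RM"]
# Illustrative choice of numeric attributes (an assumption, not the paper's list).
NUMERIC_FIELDS = ["RepetitionTime", "EchoTime", "SliceThickness"]


def encode_series(meta: Dict[str, Any]) -> List[float]:
    """Turn one series' metadata into a fixed-length numeric feature vector."""
    # ScanningSequence may hold a single value or several values.
    seq = meta.get("ScanningSequence") or []
    if isinstance(seq, str):
        seq = [seq]
    # Multi-hot encoding of the categorical attribute.
    features = [1.0 if code in seq else 0.0 for code in SCANNING_SEQUENCES]
    # Numeric attributes; missing values become -1.0 (an arbitrary sentinel).
    for name in NUMERIC_FIELDS:
        value = meta.get(name)
        features.append(float(value) if value is not None else -1.0)
    return features


example = {
    "ScanningSequence": "SE",
    "RepetitionTime": 4000,
    "EchoTime": 240,
    "SliceThickness": 0.8,
}
vector = encode_series(example)
print(vector)
```

Vectors of this shape could then be stacked into a matrix and passed to any tabular classifier.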
1. `git clone` the repository
2. `cd` into the repo
3. `pip install .` (include the `-e` flag for an editable install)
Read a DICOM file:
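from pathlib import Path
from pydicom.data import get_testdata_file
# Note: dcmread is not a standard pathlib method; this example assumes the toolkit has been imported so that Path is patched
path = Path(get_testdata_file("MR_truncated.dcm"))
ds = path.dcmread()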
ds.file_meta
(0002, 0000) File Meta Information Group Length UL: 190
(0002, 0001) File Meta Information Version OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: MR Image Storage
(0002, 0003) Media Storage SOP Instance UID UI: 1.3.6.1.4.1.5962.1.1.4.1.1.20040826185059.5457
(0002, 0010) Transfer Syntax UID UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID UI: 1.3.6.1.4.1.5962.2
(0002, 0013) Implementation Version Name SH: 'DCTOOL100'
(0002, 0016) Source Application Entity Title AE: 'CLUNIE1'
Import a select subset of DICOM metadata into a pandas.DataFrame. The subset is defined in dicomtools.core and is based on the metadata used for the series selection algorithm in the paper referenced above.
import pandas as pd
df = pd.DataFrame.from_dicoms([path]).drop('fname', axis=1)
df.T
Attribute | 0 |
---|---|
ImageType | [DERIVED, SECONDARY, OTHER] |
SOPClassUID | MR Image Storage |
PatientID | 4MR1 |
ContrastBolusAgent | |
ScanningSequence | SE |
SequenceVariant | NONE |
ScanOptions | |
MRAcquisitionType | 3D |
SliceThickness | 0.8 |
RepetitionTime | 4000 |
EchoTime | 240 |
EchoTrainLength | None |
StudyInstanceUID | 1.3.6.1.4.1.5962.1.2.4.20040826185059.5457 |
SeriesInstanceUID | 1.3.6.1.4.1.5962.1.3.4.1.20040826185059.5457 |
StudyID | 4MR1 |
SeriesNumber | 1 |
AcquisitionNumber | 0 |
InstanceNumber | 1 |
ImageOrientationPatient | [1.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000] |
PhotometricInterpretation | MONOCHROME2 |
PixelSpacing | [0.3125, 0.3125] |
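The table shows that attributes absent from the file (EchoTrainLength here) appear as None. A minimal sketch of that selection step follows, assuming a dataset-like object with attribute access. The helper `select_metadata` and the shortened `SELECTED` list are hypothetical and are not the implementation in dicomtools.core; a `SimpleNamespace` stands in for a parsed DICOM dataset so the example runs without any DICOM file.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable

# A few of the attribute names from the table above (shortened for the example).
SELECTED = ["PatientID", "ScanningSequence", "RepetitionTime", "EchoTime", "EchoTrainLength"]


def select_metadata(ds: Any, keys: Iterable[str] = SELECTED) -> Dict[str, Any]:
    """Collect the chosen attributes from a dataset-like object, using None when absent."""
    return {key: getattr(ds, key, None) for key in keys}


# Stand-in for a parsed dataset; EchoTrainLength is deliberately missing.
fake_ds = SimpleNamespace(
    PatientID="4MR1",
    ScanningSequence="SE",
    RepetitionTime=4000,
    EchoTime=240,
)
row = select_metadata(fake_ds)
print(row)
```

One such dictionary per file can be passed as a list to `pandas.DataFrame` to build a table like the one above.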
class Finder
Finder(path)
A class for finding DICOM files of a specified sequence type from a specific path.
Finder.predict
Finder.predict()
Obtains predictions from the model specified in model_path.
Finder.find
Finder.find(plane='ax', seq='t1', contrast=True, thresh=0.8, **kwargs)
Returns a pandas.DataFrame with predicted sequences matching the query at the specified threshold.
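To make the query semantics concrete, here is a minimal sketch of filtering predictions by plane, sequence, contrast, and probability threshold. It is not the code behind Finder.find: the record fields (`series`, `plane`, `seq`, `contrast`, `prob`), the helper name `find_matches`, and the use of `>=` for the threshold comparison are all assumptions, and plain dictionaries replace the DataFrame to keep the example dependency-free.

```python
from typing import Any, Dict, List

# Hypothetical prediction records; the field names are illustrative only.
predictions: List[Dict[str, Any]] = [
    {"series": "A", "plane": "ax", "seq": "t1", "contrast": True, "prob": 0.93},
    {"series": "B", "plane": "ax", "seq": "t1", "contrast": True, "prob": 0.61},
    {"series": "C", "plane": "sag", "seq": "t2", "contrast": False, "prob": 0.97},
]


def find_matches(
    records: List[Dict[str, Any]],
    plane: str = "ax",
    seq: str = "t1",
    contrast: bool = True,
    thresh: float = 0.8,
) -> List[Dict[str, Any]]:
    """Keep records that match the query and meet the probability threshold."""
    return [
        r
        for r in records
        if r["plane"] == plane
        and r["seq"] == seq
        and r["contrast"] == contrast
        and r["prob"] >= thresh  # assumption: the threshold is inclusive
    ]


matches = find_matches(predictions)
print([r["series"] for r in matches])
```

With the default query, only series A passes: series B matches the query but falls below the 0.8 threshold, and series C has a different plane and sequence.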