Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom prefix (X-) for arbitrary metadata in .json files #80

Open
TheChymera opened this issue Aug 13, 2024 · 10 comments
Open

Custom prefix (X-) for arbitrary metadata in .json files #80

TheChymera opened this issue Aug 13, 2024 · 10 comments
Labels
consistency Aspect requiring special treatment/logic outside of generic common principles metadata Changes to metadata fields/files.

Comments

@TheChymera
Copy link

TheChymera commented Aug 13, 2024

Name space collisions came up today in the stimulus BEP discussion ( @yarikoptic ). Perhaps a prefix for arbitrary table columns would help with that, i.e. if we introduce a new column name, to make sure that that isn't misinterpreted with respect to preexisting data.

Not sure if this should be done with entities as well (since we don't support arbitrary entities, even though I think we should)...

In any case if it's just for column names we could use underscore as a prefix for arbitrary column names.
If we want to do the same for entities it would need to be somthing uglier like XYZ or similar.

@yarikoptic yarikoptic changed the title Custom prefix for arbitrary metada Custom prefix for arbitrary metadata Aug 13, 2024
@yarikoptic
Copy link
Contributor

yarikoptic commented Aug 13, 2024

I feel like I have proposed to add X- prefix to any metadata field (and possibly column in TSV) for any non-formalized by BIDS field. This is based on how email RFC ??? does it -- you can add arbitrary header field but prefixed with X-. So we have tons of them in the email headers now (e.g. X-Spam-Level, x-microsoft-spam) which allows for safe vendoring.

But it didn't gain traction at that point. I fail to find ATM, but I think I also ran it by @rordenlab in the scope of dcm2niix... might find it later unless someone beats me to it.

Good news: since we formalized machine readable schema, it would be quite simple to establish automated migration as part of BIDS 2.0 - 3.0 migration helper.

Pros:

  • future proofing, smoother handling of BIDS versions: avoids possible collision. May be even name would stay the same but format would change and it would break otherwise perfectly valid BIDS dataset of prior version of BIDS
  • preventing typos: an unknown field could be a typo, to be fixed while composing those .json files
    • validator could then even suggest close hits among those which are known
  • saving brain cycles: seeing a field "Blah" ATM requires people to search for "Blah" in bids-specification to just fail to find and then trying to figure out what it means, googling up to see may be someone describing it etc
  • facilitating formalizing new metadata entries: ATM people would just stick them in without bothering filing an issue, or even searching for one -- thus potentially just growing the multiverse of non-formalized, but useful, metadata fields
    • migration script can also migrate from x- form in many (if not most) cases whenever a specific entry gets accepted into official schema

@yarikoptic yarikoptic moved this to Todo in BIDS 2.0 Aug 31, 2024
@yarikoptic
Copy link
Contributor

Another similar case IIRC it is mime types , see https://en.wikipedia.org/wiki/Media_type#Unregistered_tree . There x- I saw used , but I now see that x. is "advised one" there. But the point is the same - having a prefix.

@yarikoptic yarikoptic added flexibility consistency Aspect requiring special treatment/logic outside of generic common principles and removed flexibility labels Sep 20, 2024
@yarikoptic
Copy link
Contributor

Added consistency label as only with X- prefix there is consistency that metadata field is actually known to BIDS. Without that there is inconsistent mix of metadata fields known and unknown to BIDS so it is already "flexible" like that. We will retain flexibility of being able to add new fields but in a consistent fashion. If field is perspective for some BEP, could be X-BEP??? to simplify catching/migrating whenever BEP is accepted, or just carry target name and explicitly ignored etc.

I see that currently all officially allowed metadata fields have no - or _ in them.
In general, does schema or tooling allows (as proceeds without complains) for - in the metadata field names? (attn @effigies)

@neurolabusc
Copy link

neurolabusc commented Oct 25, 2024

Happy to create a separate pull request, but it might be nice to get initial guidance here. This discussion is timely, as seen by dcm2niix issues 881 and 880. From inception, dcm2niix included many keys not formally defined by BIDS. Before BIDS 1.0, @chrisgorgo suggested unilaterally defining keys for fields that did not exist in BIDS. Over time, many contributors have suggested new keys. However, this has caused some issues, as some BEPS choose different key names (in particular ASL and PET). Now that the BIDS specification has a clear mechanism for defining new keys and a democratically selected steering committee, it seems useful to try to develop a consensus on key names before unilaterally introducing them in one tool.

@effigies noted If anything, it would be good to propose defining all of the current fields dcm2niix emits in BIDS.I would advocate for adopting them more-or-less wholesale.

I would be interested in getting feedback from visionaries on this topic (@CPernet , @Remi-Gau, @xiangruili, and @satra come to mind). My proposal would be:

  1. Get consensus existing dcm2niix key names, decide if all can be considered optional.
  2. Develop a validation dataset that includes DICOM files that exhibit all the proposed keys similar to dcm_qa. This will provide concrete exemplars for other tool builders to support their (optional) adoption of these keys.
  3. Create a PR for the BIDS specification.
Below (click to expand) is a list of the proposed new keys, all of which dcm2niix already generates:

Global Constants

These fields should be the same for all images acquired on a specific scanner.

Field Unit Comments
ConversionSoftware e.g. dcm2niix
ConversionSoftwareVersion e.g. v1.0.20210317

We should have a consensus of whether software and software version are two keys or concatenated as one, e.g. dcm2niix creates two keys, while dicm2nii generates one: "ConversionSoftware": "dicm2nii.m 20240821".

Global Series Information

These fields are present regardless of modality (e.g. MR, CT, PET).

Field Unit Comments
PatientPosition DICOM tag 0020,0032
ProcedureStepDescription DICOM tag 0040,0254
StudyDescription DICOM tag 0008,1030
SeriesDescription DICOM tag 0008,103E
ProtocolName DICOM tag 0018,1030
ImageType list DICOM tag 0008,0008 see dcm2niix issue 881
AcquisitionTime DICOM tag 0008,0032
AcquisitionNumber DICOM tag 0020,0012
ImageComments DICOM tag 0020,4000

Global Private Information

These fields contain personally identifiable information. By default dcm2niix will create anonymized files without these fields. The option -ba n will retain private information.

Field Unit Comments
SeriesInstanceUID DICOM tag 0020,000E
StudyInstanceUID DICOM tag 0020,000D
ReferringPhysicianName DICOM tag 0008,0090
StudyID DICOM tag 0020,0010
PatientName DICOM tag 0010,0010
PatientID DICOM tag 0010,0020
AccessionNumber DICOM tag 0008,0050
PatientBirthDate DICOM tag 0010,0030
PatientSex DICOM tag 0010,0040
PatientAge DICOM tag 0010,1010
PatientSize DICOM tag 0010,1020
PatientWeight kg DICOM tag 0010,1030
AcquisitionDateTime DICOM tag 0008,002A

Modality Fields

These fields are specific to modality (e.g. MR, CT, PET).

Modality Computerized Tomography

Fields specific to CT scans.

Field Unit Comments
ExposureTime s DICOM tag 0018,1150
XRayTubeCurrent mA DICOM tag 0018,1151
XRayExposure mAs DICOM tag 0018,1152

Modality Magnetic Resonance Imaging

Fields specific to MRI scans.

Field Unit Comments
AcquisitionMatrixPE DICOM tag 0018,9231 (aka PhaseEncodingLines)
DerivedVendorReportedEchoSpacing s
EchoNumber Only multi-echo series
EchoTrainLength DICOM tag 0018,0091
EffectiveEchoSpacing s
EstimatedEffectiveEchoSpacing s
EstimatedTotalReadoutTime s
VariableFlipAngleFlag b DICOM tag 0018,1315
ImageOrientationPatientDICOM DICOM tag 0020,0037
ImagingFrequency MHz DICOM tag 0018,0084 or 0018,9098
InPlanePhaseEncodingDirectionDICOM DICOM tag 0018,1312
NumberOfAverages DICOM tag 0018,0083
PercentPhaseFOV DICOM tag 0018,0094
PercentSampling DICOM tag 0018,0093
FrequencyEncodingSteps DICOM tag 0018,9058
PhaseEncodingSteps DICOM tag 0018,0089 or 0018,9231 aka PhaseEncodingStepsInPlane
PhaseEncodingStepsOutOfPlane DICOM tag 0018,9232
PixelBandwidth Hz DICOM tag 0018,0095
RepetitionTimeInversion s
SAR DICOM tag 0018,1316 or 0018,9181 defined by 0018,9179
SliceThickness mm nb
SpacingBetweenSlices mm nb

Manufacturer Fields

Manufacturer General Electric

Field Unit Comments
PulseSequenceName epi or epiRT
InternalPulseSequenceName EPI or EPI2
PhaseEncodingPolarityGE Unflipped or Flipped
NumberOfPointsPerArm DICOM tag 0027,1060
NumberOfArms DICOM tag 0027,1061
NumberOfExcitations DICOM tag 0027,1062
ShimSetting[0] DICOM tag 0043,1002
ShimSetting[1] DICOM tag 0043,1003
ShimSetting[2] DICOM tag 0043,1004
PrescanReuseString DICOM tag 0043,1095
ASLContrastTechnique DICOM tag 0043,10A3
ASLLabelingTechnique DICOM tag 0043,10A4
CompressedSensingFactor DICOM tag 0043,10B7
DeepLearningFactor DICOM tag 0043,10CA

Manufacturer Philips

Field Unit Comments
TriggerDelayTime s DICOM tag 0020,9153 or 0018,1060
PhilipsRWVSlope DICOM tag 0040,9225
PhilipsRWVIntercept DICOM tag 0040,9224
PhilipsRescaleSlope DICOM tag 0028,1053
PhilipsRescaleIntercept DICOM tag 0028,1052
UsePhilipsFloatNotDisplayScaling dcm2niix option -p y or -p n
PartialFourierEnabled DICOM tag 0018,9081, YES
PhaseEncodingStepsNoPartialFourier DICOM tag 0018,9231
WaterFatShift DICOM tag 2001,1022
PhilipsScaleSlope DICOM tag 2005,100E

Manufacturer Siemens Magnetic Resonance Imaging (V*)

Field Unit Comments
Interpolation2D If present, slices interpolated within plane
Interpolation3D If present, image interpolated in all spatial dimensions
BaseResolution integer # of acquisition lines?
ShimSetting DAC for 1st order terms, mA for 2nd order terms The 1st 3 values are the 1st order shims from sGRADSPEC.lOffset{X,Y,Z}. The next 5 are 2nd order shims from sGRADSPEC.alShimCurrent[0-4]. Converting to the first order terms to uT/m may be possible using the "gradient sensitivities". Converting the second order terms to uT/m^2 requires information not stored in the DICOM header, but they are fixed for a given shim hardware configuration.
DiffusionScheme Monopolar or Bipolar
DelayTime s Pause between EPI volumes, where TR is longer than required by TA (sparse imaging)
TxRefAmp V
PhaseResolution f
PhaseOversampling
CoilString May or may not match ReceiveCoilName
FmriExternalInfo
WipMemBlock
AveragesDouble CSA dAveragesDouble, fractions possible, independent of DICOM NumberOfAverages (0018,0083)
ProtocolName Check SeriesDescription - they might be switched around
RefLinesPE # of reference lines in the phase encoding direction for acceleration (GRAPPA)
ConsistencyInfo The more complete software version, e.g. VE11C or VE11E instead of just VE11.
BandwidthPerPixelPhaseEncode Hz DICOM tag 0019,1028
ImageOrientationText DICOM tag 0051,100E

Manufacturer Siemens Magnetic Resonance Imaging (XA)

Field Unit Comments
BandwidthPerPixelPhaseEncode Hz DICOM tag 0021,1153
ScanningSequence DICOM tag 0021,105a

@yarikoptic
Copy link
Contributor

Thank you @neurolabusc ! You are hitting many nails on the head with the desire to formalize all those dcm2niix already populates.
It is indeed worth filing a separate issue or even IMHO better a PR (easier to discuss any particular one in conversation attached to specific line in diff) to propose those even at the level of current BIDS 1.x for most of them. I have also filed a pair of related issues on ConversionSoftware* fields - IMHO they should be made more generic.

NB I have wrapped the listing of individual keys into <detail> construct in your comment to keep discussion on the original issue easier to follow here.

@yarikoptic yarikoptic changed the title Custom prefix for arbitrary metadata Custom prefix (X-) for arbitrary metadata Oct 25, 2024
@yarikoptic yarikoptic changed the title Custom prefix (X-) for arbitrary metadata Custom prefix (X-) for arbitrary metadata in .json files Oct 25, 2024
@yarikoptic yarikoptic added the metadata Changes to metadata fields/files. label Oct 25, 2024
@mharms
Copy link

mharms commented Oct 25, 2024

As I commented here (rordenlab/dcm2niix#880 (comment)), I think it is important that dcm2niix stays responsive to user suggestions for additions to its json metadata, without those individuals needing to formally engage with BIDS.

Also, dcm2niix is its own stand-alone software. I don't see why new fields added to its json should be forced to have some odd, arbitrary prefix just because those fields don't happen to be formally defined in the BIDS spec.

@yarikoptic
Copy link
Contributor

yarikoptic commented Oct 25, 2024

It is just a matter if dcm2niix would like to produce "BIDS .json sidecar files" (so understood by BIDS-aware tools etc) or ".json sidecar files"? If there is a desire to produce "BIDS standardized" sidecar files, sure thing there should be a engagement with the standard.

Not unlike is happening in DICOM itself: manufacturers stick tons of custom metadata in vendor specific sections of DICOMs (I guess everyone of dcm2niix developers know how much fun it is to dig through those), and eventually DICOM standardizes some new metadata properties, and manufacturers (slowly) adopt them to facilitate inter-operation with DICOM receiving hardware/software.

So here is your response with some minor (dcm2niix -> Siemens and json/BIDS -> DICOM) changes and rewordings. Do you still agree with "your own" statement?

I think it is important that Siemens stays responsive to user suggestions for additions to its DICOM metadata, without those individuals needing to formally engage with DICOM.

Also, Siemens is its own stand-alone manufacturer. I don't see why new fields added to its DICOM should be forced to be placed into some odd, arbitrary section just because those fields don't happen to be formally defined in the DICOM.

@mharms
Copy link

mharms commented Oct 25, 2024

I don't see the parallel to Siemens/DICOM. Siemens can (and does) put whatever they want into their own vendor specific DICOM sections.

A more appropriate analogy to what you are proposing as far as I'm concerned is that Siemens should append some common prefix to all the entries in their vendor specific sections, because they aren't formally defined DICOM fields. When you extend the analogy in that fashion, I obviously don't agree with that.

@yarikoptic
Copy link
Contributor

Analogy is exactly that -- here I am proposing to add a "vendor specific" (or arbitrary adhoc) prefix. It then could as well be x-dcm2niix "vendor specific" field (section) which would contain a dictionary of all extra fields dcm2niix adds, or just a flattened list of x- fields dcm2niix decides to add, while at the top level all fields without x- prefix are the ones (and only those) defined in BIDS.

@CPernet
Copy link

CPernet commented Nov 13, 2024

RE: Get consensus on existing dcm2niix key names, and decide if all can be considered optional.

Any non-documented field can be introduced at any time. If you browse openneuro, you will see many MRI data already have many of the fields you listed (because people use dcm2niix), and the validator skipped over them. So what you are asking is if we should add new optional fields in the spec. - but this is spec motivated not software related (even if in practice, many, most? conversions rely on your tool) - because any software specific things can always be added.

I'd personally have Global Constants as recommended.
As for what should be added to the spec, I have no idea why some of the field you propose are even not mandatory tbh (echo for instance).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consistency Aspect requiring special treatment/logic outside of generic common principles metadata Changes to metadata fields/files.
Projects
Status: Todo
Development

No branches or pull requests

5 participants