JSON and BIDS-Prov #146
BTW, this would also be somewhat aligned with the effort of @surchs et al. in https://github.com/OpenNeuroDatasets-JSONLD/: instead of aiming to equip OpenNeuro BIDS datasets with .jsonld files directly, they work on improving the description of phenotypic data within JSON files, so that improved (over original BIDS) .jsonld files could then be "derived" from the metadata in those JSON files. In other words, they are concentrating on describing metadata in a more concise, to-the-point way, relying on
edit 1: IMHO, this issue is one of the most important aspects for facilitating adoption.
@yarikoptic, may I ask you to help me keep to one topic per issue? Otherwise it's hard for me to keep track 🙏
"Justification for Separating Provenance from file JSON" is no longer present in the BIDS-Prov spec at https://bids.neuroimaging.io/bep028. The discussion on how to fit with current BIDS "wasGeneratedBy" is in #148.
If I understand correctly, this is very much related to #147; shall we discuss it over there? neurobagel is awesome, but those tools did require a large amount of effort. For BIDS-Prov, I think we need to balance the complexity of the standard against the complexity of the tooling required to read the files...
Yes, this is something I hope we can aim for in the current version of the BIDS-Prov standard. Note that among all the possible ways to serialize JSON-LD graphs, the spec focuses on two specific ones; see "BIDS-Prov JSON-LD file" and "Alternative representation for file-level provenance JSON-LD". --> #149
See "Examples" in the spec; I think we have a good set of examples, in particular the SPM ones at https://github.com/bids-standard/BEP028_BIDSprov/tree/master/examples/from_parsers/spm (AFNI and FSL will require more work). Note that "BIDS-Prov is [...] limited to the capture of data processing, future considerations including other types of provenance are listed in section 'Future perspectives'", so I think "additional manual annotation" may be out of scope (depending on what this means; sorry if I misunderstood). We'll focus on a DICOM to NIfTI conversion example when we can with @bclenet --> #150. So I'll close this issue, and we can continue discussing the various questions in the dedicated issues. @yarikoptic: if there is a separate point we need to discuss here, let me know and we can open another specific issue.
This might still be a separate issue overall: having a PROV record in .json sidecars/dataset_description.json whenever
BTW, the "reason" for allowing the record within the .json file: many users (on HPC or not) would not appreciate it if every data file still had another .jsonld file next to it, because of inode limits or "slower git checkout". Absorbing all of them into a single file (per dataset or at another level) would complicate locating the PROV record corresponding to a given file and would require tooling. Hence I feel it is very important to allow a concise "high-level" description ("pipeline/workflow-level summary") in the JSON sidecar.
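To make the tooling cost concrete: if all PROV records were absorbed into one dataset-level file, finding the record for a given data file means loading and scanning that whole graph. A minimal sketch, assuming a simplified, hypothetical layout in which each entity has an `AtLocation` path and a `GeneratedBy` reference to an activity `Id` (these field names are illustrative, not quoted from the BIDS-Prov spec, and `find_generating_activity` is a hypothetical helper):

```python
import json
from typing import Optional


def find_generating_activity(graph: dict, rel_path: str) -> Optional[dict]:
    """Return the activity that generated `rel_path`, or None.

    Assumes a hypothetical aggregated layout:
    {"Entities": [{"AtLocation": ..., "GeneratedBy": <activity Id>}, ...],
     "Activities": [{"Id": ..., "Label": ...}, ...]}
    """
    # Index activities by Id so the final lookup is a dict access.
    activities = {a["Id"]: a for a in graph.get("Activities", [])}
    # Linear scan over every entity in the dataset to find this one file.
    for entity in graph.get("Entities", []):
        if entity.get("AtLocation") == rel_path:
            return activities.get(entity.get("GeneratedBy"))
    # No entity describes this file.
    return None


# Hypothetical dataset-level record covering two files.
aggregated = json.loads("""
{
  "Activities": [{"Id": "urn:act-1", "Label": "DICOM to NIfTI conversion"}],
  "Entities": [
    {"AtLocation": "sub-01/anat/sub-01_T1w.nii.gz", "GeneratedBy": "urn:act-1"},
    {"AtLocation": "sub-01/func/sub-01_task-rest_bold.nii.gz", "GeneratedBy": "urn:act-1"}
  ]
}
""")

print(find_generating_activity(aggregated, "sub-01/anat/sub-01_T1w.nii.gz")["Label"])
```

By contrast, a summary stored in the sidecar `sub-01_T1w.json` is located just by swapping the data file's extension, with no extra file and no index to maintain.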
I will reopen this issue since, as I stated above, I think it best describes the specific aspect of allowing a PROV record within a regular .json sidecar file.
Those are nice indeed! But they target BIDS derivative datasets. There, it might indeed be worth making tools that just dump one big .jsonld per subject/session (or at a higher level) and "be done", without fear of exhausting inodes on the cluster or of users needing to "tune" them later. But if we start talking about "raw" BIDS datasets, in my experience, even with automation such as heudiconv, there is sometimes A LOT of curation needed to make them proper, sometimes with tools that might also like to add their own PROV records.
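For raw datasets, each curation tool could append its own concise entry to the existing sidecar instead of writing a new file. A minimal sketch, assuming the sidecar carries a list under a `GeneratedBy` key shaped like the `GeneratedBy` objects BIDS already defines for `dataset_description.json` (`Name`, `Version`, `Description`). Putting that key into per-file sidecars is what this thread proposes, not current BIDS; the helper `append_generated_by`, the tool name `manual-curation`, and all values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_generated_by(sidecar: Path, name: str, version: str, description: str) -> dict:
    """Append one GeneratedBy-style entry to a JSON sidecar, in place.

    Hypothetical helper: assumes per-file sidecars may hold a
    "GeneratedBy" list, as proposed in this thread.
    """
    data = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    # Keep earlier entries; each tool adds one entry for its own step.
    entries = data.setdefault("GeneratedBy", [])
    entries.append({"Name": name, "Version": version, "Description": description})
    sidecar.write_text(json.dumps(data, indent=2) + "\n")
    return data


with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "sub-01_T1w.json"
    sidecar.write_text(json.dumps({"RepetitionTime": 2.3}))
    # Conversion tool records itself, then a later curation step does too.
    # Versions are placeholders, not real release numbers.
    append_generated_by(sidecar, "heudiconv", "X.Y.Z", "DICOM to NIfTI conversion")
    result = append_generated_by(sidecar, "manual-curation", "X.Y.Z", "Fixed PhaseEncodingDirection")
    print([e["Name"] for e in result["GeneratedBy"]])
```

The file count stays the same no matter how many tools touch the data, and the existing sidecar fields are preserved.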
Where is that section? I failed to
The main focus of BIDS-Prov is indeed derived datasets. We'll look with @bclenet at your proposal for an example of DICOM to NIfTI conversion (see #150), but let's see how feasible this is and how much we need to tweak the model for it.
The spec is in the Google Doc available at https://bids.neuroimaging.io/bep028 :) Regarding #146 (comment): to me, the discussion about JSON and wasGeneratedBy is already in #151; can we use that issue instead of the current one (which overlaps with many ideas)?
#151 is about "descriptions". Did you mean [...]? IMHO, those two are largely independent of this one, as they could potentially be solved by direct conversion into or integration with
Update proposal for BIDS Prov (BEP028)
By @yarikoptic in #125 (comment)
`generatedBy` to be specified in the corresponding .json file, potentially with further relaxations such as not demanding `id` (assume it to be unique), and overall have a clear schema which we could encode in our BIDS schema and validate.
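A clear schema for such a sidecar record could be checked with very little code. A minimal, stdlib-only sketch, assuming each entry mirrors the `GeneratedBy` objects of `dataset_description.json` (`Name` required; `Version`, `Description`, `CodeURL`, `Container` optional) and applying the relaxation proposed above: `id` is optional and, when present, only has to be a string. The function name `validate_generated_by` and its messages are hypothetical; a real check would be encoded in the BIDS schema and run by the validator.

```python
from typing import Any

# Optional fields and their expected types (assumed, modelled on
# dataset_description.json's GeneratedBy; "id" optional per the proposal).
OPTIONAL_FIELDS = {
    "Version": str,
    "Description": str,
    "CodeURL": str,
    "Container": dict,
    "id": str,
}


def validate_generated_by(sidecar: dict[str, Any]) -> list[str]:
    """Return a list of error messages; an empty list means valid."""
    errors: list[str] = []
    entries = sidecar.get("GeneratedBy")
    if entries is None:
        return errors  # The field itself is optional in this sketch.
    if not isinstance(entries, list):
        return ["GeneratedBy must be a list"]
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"GeneratedBy[{i}] must be an object")
            continue
        # Name is the only required key; id is deliberately not required.
        if not isinstance(entry.get("Name"), str):
            errors.append(f"GeneratedBy[{i}].Name is required and must be a string")
        for key, expected in OPTIONAL_FIELDS.items():
            if key in entry and not isinstance(entry[key], expected):
                errors.append(f"GeneratedBy[{i}].{key} must be {expected.__name__}")
    return errors


print(validate_generated_by({"GeneratedBy": [{"Name": "heudiconv"}]}))
print(validate_generated_by({"GeneratedBy": [{"Version": "X.Y.Z", "id": 7}]}))
```

An entry without `id` passes, which is the suggested relaxation. Unknown keys are silently ignored here; a real schema would have to decide whether to allow them.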