-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Allow for - (dash) (and/or +) in <label> #1165
Comments
I'm personally okay with |
I'm okay with |
Great point I haven't thought to exercise @oesteban ! -- I would not allow it as well, makes parsing much more difficult etc! (git)hopa:~/proj/bids/bids-specification[derivatives]git
$> git grep '_<label'
src/04-modality-specific-files/04-intracranial-electroencephalography.md:called `electrical_stimulation_<label>`, with labels chosen by the researcher and so might need analysis/tune up (or not) |
ok, after #152 is merged in one way or another we could have a PR extending the list of allowed characters non-ambiguously in a single location ;-) |
Note that the request to include hyphens in key-value labels is currently in the "Suggestions for BIDS 2.0" Google doc: Allowing hyphens in filename key values which suggests, as I had recalled, that this was a settled issue. Including hyphens increases the complexity of parsing the filenames, was the reason I recall. |
@nicholst, will '-' really increase complexity of parsing? Each key-value parsed by '_' and everything after the first '-' is a label. |
@nicholst Oh, that is where BIDS 2.0 "extensions" are discussed -- thank you for the link! I have added the link to the 2.0.0 milestone description. I think it might be worthwhile to move them all under issues over here at some point, so elderly folks like us could easily find everything relevant in the single location (here). I would not say that parsing complexity would increase, but indeed it might require some adjustment. |
Just reciting the conclusion of a long thread about the issue. Like anything, this issue can be re-raised, but I think there is the basic issue of extra clarity from having the hyphen play exactly one role in file/directory names. |
my 1c: This issue by now is probably a few years old (IIRC I have suggested it possibly even during BIDS 1.x initial development/release) . If we acted sooner, it would be no issue but even those years back I have proposed it the main argument against was "we would break the tools" of which there were much fewer. So longer we wait to act, the harder would be to adopt this change. |
I think at this point the procedure would be sending a draft PR with the suggested changes for a more focused discussion, is that correct @franklin-feingold @sappelhoff ? |
Yes, and on top of that I think what this discussion needs are:
Currently this change seems to me as a minor improvement with the potential for a major confusion ... but I am very happy to be convinced otherwise :-) |
Use cases:
Subject or session labels come from long-established project that has
identifiers with dashes.
Related, such a project uses identifiers with underscores (but not dashes),
and the underscores need to be replaced; having dashes as an option for
replacing is helpful.
|
Is UTF8 allowed in BIDS file names, or if not, should it be? If so, there are many dashes to choose from which are not a minus (ASCII decimal 45). For example ‒ – and —. See https://www.w3schools.com/charsets/ref_utf_punctuation.asp |
Using alternatives that look like the proposed addition doesn't help much with readability, makes it harder. Imagine a suffix like I would be fine with the |
I think we should explicitly forbid UTF8 and limit to ASCII to avoid such "tricks" which could potentially cause confusion. |
Agreed about limiting character sets, I also don’t look forward to multiple types of dashes or emojis in filenames. But the limit should be explicitly mentioned.
Regarding the “+” I am neutral.
Verstuurd vanaf mijn iPhone
… Op 2 nov. 2019 om 00:57 heeft Yaroslav Halchenko ***@***.***> het volgende geschreven:
I think we should explicitly forbid UTF8 and limit to ASCII to avoid such "tricks" which could potentially cause confusion.
So let's move forward with "+"?
I doubt any tool was so strict with regexp that it listed only alpha numerics. and sooner we change, less possible negative effect it would have. This issue already dragged long time (I originally suggested in the times of BIDS being a google doc), and I think it is worth to be accepted to 1.x series without awaiting 2.0 which I am not quite sure would come in any foreseeable future
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
ok, it seems that the path of the least resistance would be to add + as allowed character. I will look into proposing a PR here and against bids-validator for that (unless someone would beat me to it which I would really appreciate) |
oh hoh, it had been a year! Few final remarks and then I will try to find time to compose a PR.
Since I and others mentioned a number of use cases, and IMHO it would be easier to strip away than to add later, I will prepare a PR which would introduce allowing both |
woohoo -- I found you my darling issue -- you have been moved and I thought that I just lost my "search foo" skill! With the addition of BIDS URIs (#918) into BIDS specification (aiming for 1.8.0 release) I really want to make a case for this issue to be given yet another consideration:
Now I will really make an effort to file a PR but might need to prep few "preparatory" PRs first. |
I moved this back. Sorry, not sure exactly why it got moved; perhaps it seemed abandoned, or was lumped in with the BIDS-2 proposal linked above because of the Reading through the thread, I think there's basically:
So I think a reasonable proposal is to change the regex for label to |
While I have no objection to The genealogy of this request is also indicative of problems which BIDS could help solve by encouraging better data practices. This |
and thus we don't use periods in the middle of the filenames (although could), but we do use Although I agree that quite often desire to place Another analogy -- BIDS doesn't restrict the length of the |
Are we not talking about the 20% in the 80/20 principle? How many datasets would need such flexibility to specify I tend to agree with Chris' assessment, and, as I mentioned before, once |
I agree with the usefulness of |
No update. Rereading this, I think my assessment remains the same as before: We can do this. Somebody needs to make a concrete proposal in the form of a PR. I have some new questions that I would like to see addressed in a proposal (possibly they came up and I skimmed too briefly):
I would suggest the following answers, but I'm not 100% set:
|
Thanks @effigies , I'm happy to help on the PR |
Are you @effigies to introduce some new semantic behind milti-label - Ie would have some meaning beyond just being a label? |
Hmm. Not intentionally (see 5). Possibly that argues against my position 1. I do think the end result should be logically consistent, however these questions are answered. |
Ambitious hope. I think the lack of minus support is usually the point where people are introduced to metadata separation. I'm still explaining to one of our colleagues why the name of his mouse is not part of the session name. I continue to think this makes BIDS less conducive to good metadata practices. If you really really want to allow something like this, maybe it should be made sufficiently prohibitive so as to give the user pause as to whether it's worth it. We could support some other character. Perhaps the cdot, The prohibitiveness of |
@effigies could you confirm that here #1165 (comment) you are referring to only Another weak idea regarding 3 (and trying to get regexes even uglier) is: should two Overall, I totally agree with @effigies' assessment, but also agree with @yarikoptic on why this should get added semantics (e.g., allow for subject labels). |
Yes, just |
Given the absent concrete action (from me) on this, I am thinking that we should bundle this one into BIDS 2.0. Hence I will announce this closed and let's continue on |
The BIDS maintainers have discussed this issue and I think we've converged on allowing |
FWIW, as #1926 shows, change is "substantial" enough to also allow for |
I thought that we already had an issue filed for this, but may be it was just my comments in the google doc version of the BIDS specification, so filing anew. Please close it referencing original one if one already exists.
ATM
<label>
is as I formalized in yet to be finished PR is allowed to contain only alpha numerics. That makes it really difficult in some cases to place multiple entries (e.g. details for_acq-
or abbreviate details of preprocessing in_desc-
of BIDS derivatives). In ReproIn heuristic for heudiconv, where we do allow for-
we just replace them with an uglyX
as a separator to just stay compatible with BIDS.I have been suggesting to extend the list of allowed characters in
<label>
to include at least-
and possibly others (e.g.,+
) which otherwise should not introduce any backward incompatibility with BIDS 1.0 -- all BIDS 1.0 names would be compatible with new syntax allowing-
in the<labels>
.The original concern (probably at least a year back) which precluded any action was a possible breakage of parsing BIDS filenames for BIDS-supporting software, and was suggested to be included in coming some time in the future BIDS 2.0.
N.B. I have failed to file and issue outlining plans for BIDS 2.0, so I initiated a new milestone (2.0.0) to which I am adding this new issue.
IMHO (and may be we should outline it somewhere) BIDS supporting applications should be coded adhering to Postel's principle which is driving TCP implementations to "be conservative in what you do, be liberal in what you accept from others.". And most likely, in the case of added
-
, they already are since they are more likely to be splitting based on_
and then on-
to separatekey
fromvalue
instead of full rigid regular expressions limiting to the allowed characters (not so obviously) mandated by the BIDS specification. So the only catch here indeed might be splitting on-
without restricting that only a single split should happen. That is where breakage could happen if-
is added. The possibility of breakage in case of extending the list of allowed characters with+
(or even:
which I dislike to suggest due to messing up scp/ssh) unlikely to have negative ramifications.Please share your thoughts/opinions
The text was updated successfully, but these errors were encountered: