Attempting to match BUG! with the following regex causes the named group <bug> to store the value of the numbered group \1.
(?|(?P<bug>xxx)(!)
|(?P<bug>BUG)(!)
)
=> <bug> will contain ! instead of BUG
A second bug can be seen when we change the regex to not redefine <bug>; then, the numbered group \1 gets dropped altogether.
A file with test cases is attached.
Mixing named and numbered groups is probably not something most people do intentionally (although I guess there could be applications for it); it more likely happens by accidentally omitting the non-capturing (?:) boilerplate.
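To illustrate the point about the non-capturing boilerplate, here is a minimal sketch using the standard-library `re` module rather than `regex`. The patterns are simplified, single-branch variants of the one in the report and are assumptions for illustration only; they do not reproduce the branch-reset behaviour.

```python
import re

# Simplified single-branch patterns (illustrative, not the reported regex).
capturing = re.compile(r"(?P<bug>BUG)(!)")        # "!" is captured as a numbered group
non_capturing = re.compile(r"(?P<bug>BUG)(?:!)")  # "!" is matched but gets no number

# Pattern.groups is the number of capturing groups in the pattern.
print(capturing.groups, non_capturing.groups)

# With (?:...) the named group is the only capture.
print(non_capturing.fullmatch("BUG!").groups())
```

With the `(?:...)` form there is no numbered group that could collide with the named one.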
Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).
That's an interesting issue, but is it a bug?
The rule is that groups are numbered consecutively 1, 2, 3, etc., but named groups with the same name have the same group number.
The problem here is that the branch reset is restarting the numbering and it's not skipping over group numbers that have already been used in that branch. The question is whether it should.
To give another example, if the second branch was (BUG)(?<bug>!), then, according to the rule, (BUG) would be group 1 because it's the first group in the branch and (?<bug>!) would also be group 1 because it's a named group that’s already defined as group 1.
I'll need to see how other implementations handle the question.
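As a point of comparison for the numbering rule, here is a small sketch with the standard-library `re` module and a simplified pattern without branch reset (the pattern is an assumption for illustration, not the one from the report). It shows that a named group also takes a slot in the consecutive numbering, so the unnamed group after it becomes group 2.

```python
import re

# Standard-library `re` only; single branch, no branch reset.
pattern = re.compile(r"(?P<bug>BUG)(!)")
match = pattern.fullmatch("BUG!")

by_name = match.group("bug")   # the named group
by_number = match.group(1)     # the same group, addressed by number
second = match.group(2)        # the unnamed group that follows it

print(by_name, by_number, second)
```

This is the outcome consecutive numbering gives outside a branch reset; whether the same should hold inside each branch is the open question in this thread.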
Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).
re: Compiled patterns have a groupindex attribute that maps group names to group numbers, so a group name can be linked to only 1 group number. This must also be true of regex in order to maintain compatibility. Branch reset is not supported.
Perl: Both branches of a branch reset number consecutively, and a group name can be linked to more than 1 group number. This behaviour is incompatible with re, and therefore regex.
PCRE2: Both branches of a branch reset will number consecutively, but a group number can be linked to only 1 group name, so a pattern such as (?|(?<AA>aa)|(?<BB>bb)) will cause an error.
C#: Unnamed groups are numbered first, followed by named groups. This behaviour is different from re, and therefore regex. Branch reset is not supported.
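The `re` constraints in the comparison above can be checked with a short sketch. It uses only the standard-library `re` module; the helper `compile_error` and the test patterns are hypothetical names and inputs introduced here for illustration.

```python
import re

# groupindex maps each group name to exactly one group number.
pattern = re.compile(r"(?P<bug>BUG)(!)")
name_to_number = dict(pattern.groupindex)
print(name_to_number)


def compile_error(source):
    """Return the re.error message for `source`, or None if it compiles."""
    try:
        re.compile(source)
    except re.error as exc:
        return str(exc)
    return None


# re rejects a second definition of the same group name...
duplicate_name_error = compile_error(r"(?P<bug>xxx)|(?P<bug>BUG)")
# ...and does not recognise the branch-reset syntax (?|...) at all.
branch_reset_error = compile_error(r"(?|(xxx)|(BUG))")

print(duplicate_name_error)
print(branch_reset_error)
```

Because `groupindex` is a plain name-to-number mapping, any rule chosen for regex has to keep each name tied to a single number.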
So, what should the rule be? How should group numbers be assigned in the examples below?
Original report by Anonymous.