-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qt-LGPL-exception-1.1: Add new ID for Nokia-Qt-exception-1.1 #627
Conversation
da550aa
to
01e4746
Compare
@goneall, is the test failure because of a space/ |
I agree. And if memory serves me right, (a form of) the exception predates Nokia’s acquisition of Qt as well. Qt got passed on as such: Trolltech ↦ Nokia ↦ Digia↦ Qt Company – which further strengthens the suggestion that a more generic name should be used. There is also the Free Qt Foundation, but that’s not the primary concern here :) |
It turns out there are 2 problems causing the tests to fail. Having a null match in the alt match The second problem is including the trailing space in the matching text. Since the matching software tokenizes and removes white space, it doesn't include the space in the match. To work around this, just don't include any preceding or trailing white space in the match. Replacing |
Is this something we can fix in the parser? We already have some of these in master (e.g. here).
Including a space in the inner text but not in the
with the trailing space outside the |
I could not come up with a good general fix (I'll describe the problem in more detail below). I did find another possible work-around. If you order the match so that the empty match is at the end (e.g. The specific issue is the algorithm used to the the alt matching:
The problem is number 5 above - if an empty choice is matched (and it will always match), the cursor is moved forward by zero tokens (e.g. it doesn't move), this leaves the cursor at its current position and doesn't match the variable text we want it to. The match then fails when it applies the next matching rule. I can not think of a general design that would avoid this situation. |
We've had these in this repository since the exception landed in e355521 (Adding exception: Nokia-Qt-exception-1.1, 2017-02-05, spdx#354), and they've been in license-list-data since the exception landed there in spdx/license-list-data@cec28dc7 (Updated JSON exception files, 2016-04-16). I haven't bothered with an <alt> tag because the matching guidelines have [1]: 5.1.3 Guideline: Quotes Any variation of quotations (single, double, curly, etc.) should be considered equivalent. That doesn't explicitly say that you can add/remove quotes in the general case. And maybe that's intentional. But if we treat the whole unit as a quote (going from quad-quotes to double-quotes) then this change should have no impact on matching. In practice, I think it's unlikely that folks copied the exception with the quad-quotes. And the upstream linked from our XML is pinned to a specific revision and only uses double-quotes. [1]: https://spdx.org/spdx-license-list/matching-guidelines
01e4746
to
a20c078
Compare
And deprecate the old ID, which had two problems [1]: * Qt no longer belongs to Nokia, and they've changed the title upstream to reflect that. * Qt has another exception for GPL3, and the old ID didn't mention the target license. This commit adds the new identifier and deprecates the old one in its favor. I've used Qt-LGPL-exception-1.1 instead of Kai's suggested Qt-exception-LGPL-1.1 to match the {name}-{target}-exception-{version} pattern suggested by the existing i2p-gpl-java-exception. I've set an <alt> in the exception title so it will match the old "Nokia " prefix, the current "The Qt Company " prefix, or no prefix at all. I've used "The Qt Company " prefix in the test file to match the upstream text (the test is a copy/paste of the upstream text with trailing whitespace removed). [1]: https://lists.spdx.org/pipermail/spdx-legal/2018-March/002511.html Subject: New License/Exception Request: Qt-LGPL-exception-1.1 Date: Fri, 23 Mar 2018 15:39:54 +0000 Message-ID: <HE1PR0202MB2571F0B9F0305FA2150A428F98A80@HE1PR0202MB2571.eurprd02.prod.outlook.com>
Rebased onto master to pick up #392 so we can add an |
spdx-tools can't handle this at the moment: On Fri, Mar 23, 2018 at 11:36:30PM +0000, Gary O'Neall wrote [1]: > The second problem is including the trailing space in the matching > text. Since the matching software tokenizes and removes white > space, it doesn't include the space in the match. To work around > this, just don't include any preceding or trailing white space in > the match. And while the extra leading whitespace in the no-prefix case isn't very elegant, it's not a problem either. [1]: spdx#627 (comment)
I did some more research on the regex matching and confirmed that the order of Regex alternation does matter and is consistent between Regex engine implementations. Based on this information, the pattern should be changed from The problem with the current match is that the empty alternation matches first (and always matches) and the rest of the regex pattern isn't even examined. Since the company name (in this case) is not consumed by the regex matching, the rest of the pattern match will fail. There is no need to surround the We should check the order of the other alts where empty alternations are in the match string - the empty alternation should always be last. |
The alternation operator (|) is commutative in the POSIX spec, which has [1]: > If the pattern permits a variable number of matching characters and > thus there is more than one such sequence starting at that point, > the longest such sequence is matched. Testing with a few engines, here are some that match POSIX: $ echo abc | grep -Eo '|a|ab' ab $ python -c 'import regex; print(regex.match("|a|ab", "abc", regex.POSIX).group(0))' ab # Background in https://bitbucket.org/mrabarnett/mrab-regex/issues/150 $ psql -Atc "select substring('abc' from '|a|ab')" ab The docs for PostgreSQL are somewhat complicated, but for two or more branches connected by the | operator they are always greedy [2]. Here are some engines that prefer the left-most branch: $ python -c 'import re; print(re.match("|a|ab", "abc").group(0))' $ python -c 'import regex; print(regex.match("|a|ab", "abc").group(0))' $ node -e 'console.log("abc".match(/|a|ab/)[0])' # [3] $ ruby -e 'print "abc".match(/|a|ab/); print "\n"' # [4] Go's stdlib provides both left-most-branch [5] and longest-match [6] implementations. So the old order was compliant with POSIX EREs (as referenced in schema/ListedLicense.xsd), but this commit sorts the branches for longest-match-first for compatibility with engines that break POSIX and prefer the left-most branch. [1]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_01_02 [2]: https://www.postgresql.org/docs/10/static/functions-matching.html#POSIX-MATCHING-RULES [3]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-or [4]: https://ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Alternation Although these docs don't have much to say on longest-match vs. left-most branch. [5]: https://golang.org/pkg/regexp/#Compile [6]: https://golang.org/pkg/regexp/#CompilePOSIX
Interesting. I did some digging too, and it looks like the POSIX |
I've filed #629 adding left-most-branch compatibility to the regexps which are already in master. |
Discussed on legal call today; agree to adding new ID and deprecating old ID that references Nokia. |
Deprecate libpng in favor of LibPNG-1.0 * add LibPNG-1.0.xml, with notes about deprecation, and line-wrap at about 80 chars * update libpng.xml to deprecated status, with deprecation License List version, and deprecation notes * add test file LibPNG-1.0.txt, leaving old test file Libpng.txt in place per other deprecations (e.g., #627)
And deprecate the old ID, which had two problems:
This commit adds the new identifier and deprecates the old one in its favor.
I've used Qt-LGPL-exception-1.1 instead of Kai's suggested Qt-exception-LGPL-1.1 to match the
{name}-{target}-exception-{version}
pattern suggested by the existing i2p-gpl-java-exception.I've set an
<alt>
in the exception title so it will match the old “Nokia ” prefix, the current “The Qt Company ” prefix, or no prefix at all. I've used “The Qt Company ” prefix in the test file to match the upstream text (the test is a copy/paste of the upstream text with trailing whitespace removed).Builds on #626; review that first. Someone with write access should label this “new license/exception request”.
CC @kkoehne.