chapters/appendix-I: Bump license list to v2.6 (via new script) #10

wking · 2017-07-18T20:33:54Z

Add pull-license-list.py to automatically pull the current license and exception JSON into Markdown tables. Using JavaScript might have been more idiomatic than the Python script, but it's faster for me to write the Python; someone who is more familiar with JavaScript can translate the script and incorporate it into the build tooling later if we decide against carrying the auto-generated tables in this repository.

Two things made the auto-generation more difficult than it needed to be:

Newlines in exception names (spdx/license-list-data#10).
[edit: addressed below] No charset declaration in JSON served from spdx.org:
```
  $ curl -sI https://spdx.org/licenses/licenses.json | grep -i content-type
  Content-Type: application/json
```
it should have been Content-Type: application/json; charset=utf-8. I don't know if there is a repository with the Nginx config used for spdx.org, so I haven't filed an issue for this last one.
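A minimal sketch of how a client could cope with the missing parameter, assuming it reads the charset from the Content-Type header and falls back to UTF-8 when none is declared. The helper name `charset_from_content_type` is hypothetical and is not taken from pull-license-list.py.

```python
from email.message import Message


def charset_from_content_type(value, default='utf-8'):
    """Return the charset declared in a Content-Type header value.

    Falls back to `default` when the header carries no charset parameter,
    as with the bare "application/json" shown above.
    """
    message = Message()
    message['Content-Type'] = value
    # get_content_charset() returns None when no charset parameter is present.
    return message.get_content_charset() or default


# Header as currently served: no charset, so the fallback applies.
fallback = charset_from_content_type('application/json')

# Header with an explicit charset: the declared value is used (lowercased).
declared = charset_from_content_type('application/json; charset=ISO-8859-1')
```

The returned name can be passed straight to `bytes.decode()` on the response body.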

Fixes #6.

goneall · 2017-07-18T21:34:14Z

I don't know of any nginx configs available, but I went ahead and filed a ticket with the Linux Foundation helpdesk - request ID #43220.

The newline removal in exception names works around [1]. The UTF-8 fallback works around the source not being served with a charset:

    $ curl -I https://spdx.org/licenses/licenses.json
    HTTP/1.1 200 OK
    Server: nginx
    Date: Tue, 18 Jul 2017 20:12:58 GMT
    Content-Type: application/json
    Content-Length: 108139
    Connection: keep-alive
    Vary: Accept-Encoding
    Last-Modified: Fri, 06 Jan 2017 21:34:52 GMT
    Expires: Tue, 18 Jul 2017 22:12:58 GMT
    Cache-Control: max-age=7200
    Strict-Transport-Security: max-age=16070400
    Accept-Ranges: bytes

    $ curl -I https://spdx.org/licenses/exceptions.json
    HTTP/1.1 200 OK
    Server: nginx
    Date: Tue, 18 Jul 2017 20:13:02 GMT
    Content-Type: application/json
    Content-Length: 7985
    Connection: keep-alive
    Vary: Accept-Encoding
    Last-Modified: Fri, 06 Jan 2017 21:34:52 GMT
    Expires: Tue, 18 Jul 2017 22:13:02 GMT
    Cache-Control: max-age=7200
    Strict-Transport-Security: max-age=16070400
    Accept-Ranges: bytes

[1]: https://github.com/spdx/license-list-data/issues/10
     Subject: Newline in 389-exception name
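The newline workaround could look roughly like the sketch below. It assumes that collapsing every run of whitespace to a single space is acceptable for a table cell; the helper name `normalize_name` and the sample string are made up for illustration and do not come from the script or from the upstream data.

```python
def normalize_name(name):
    """Collapse runs of whitespace, including newlines, into single spaces."""
    # str.split() with no argument splits on any whitespace and drops
    # leading and trailing whitespace, so joining with ' ' yields one line.
    return ' '.join(name.split())


# Hypothetical name containing a stray newline and indentation.
sample = 'Example Library\n  Exception'
cleaned = normalize_name(sample)
```

A single-line name matters here because a newline inside a Markdown table cell would end the row early.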

Generated with:

    $ ./bin/pull-license-list.py

Then I removed the trailing link labels now that the tables are using inline links. The inline links make the table a bit wider, but they are easier to manage automatically because each table is self-contained (vs. accumulating a set of link labels between all tables).

The date formatting for "6 Jan 2016" is unfortunate. I'd prefer "2016-01-06" or "January 6th, 2016", but the script is just pulling the raw releaseDate from the JSON, so we get whatever the upstream data is using.
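To illustrate why inline links keep each table self-contained, here is a rough sketch of rendering license entries as a Markdown table. The field names (`licenseId`, `name`, `reference`, `isOsiApproved`), the column layout, and the function name `license_table` are assumptions for illustration; the real script and the upstream JSON may differ.

```python
def license_table(licenses):
    """Render license entries as a Markdown table using inline links."""
    lines = [
        '| Full name | Identifier | OSI approved |',
        '| --- | --- | --- |',
    ]
    for entry in sorted(licenses, key=lambda item: item['licenseId'].lower()):
        # Inline link: the URL lives in the cell, so no link labels
        # need to be collected at the end of the document.
        link = '[{}]({})'.format(entry['licenseId'], entry['reference'])
        # Escape pipes so a name cannot break the row apart.
        name = entry['name'].replace('|', '\\|')
        approved = 'Y' if entry.get('isOsiApproved') else ''
        lines.append('| {} | {} | {} |'.format(name, link, approved))
    return '\n'.join(lines)


# Hypothetical sample data; example.org URLs are placeholders.
sample = [
    {'licenseId': 'MIT', 'name': 'MIT License',
     'reference': 'https://example.org/MIT.html', 'isOsiApproved': True},
    {'licenseId': '0BSD', 'name': 'BSD Zero Clause License',
     'reference': 'https://example.org/0BSD.html', 'isOsiApproved': False},
]
table = license_table(sample)
```

With reference-style links, the same function would also have to return a set of labels and the caller would have to merge them across tables, which is the bookkeeping the inline form avoids.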

emsearcy · 2017-11-09T18:10:47Z

Unless I'm misunderstanding something, this SO article makes the case that adding a charset to application/json is pointless because all RFC-compliant clients must ignore the charset on this particular type.

Is it only /licenses/licenses.json that you are requesting this be added for? Doing a quick check on the server shows many ASCII .json files under licenses, and I'm not aware of an nginx or httpd option to dynamically pick the correct charset.

# find licenses -name '*.json' | xargs file | grep UTF-8 | wc -l
297
# find licenses -name '*.json' | xargs file | grep ASCII | wc -l
1460

goneall · 2017-11-09T19:29:38Z

Is it only /licenses/licenses.json that you are requesting this be added for? Doing a quick check on the server shows many ASCII .json files under licenses, and I'm not aware of an nginx or httpd option to dynamically pick the correct charset.

All .json files in the /licenses directory.

All of the JSON files should be encoded using UTF-8. I believe the ASCII text found was actually the word "ASCII" in the data portion of the files.
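Whatever produced the ASCII count, a file made up only of 7-bit bytes is valid UTF-8 as well as valid ASCII, so a tool that reports the narrowest matching encoding will label such a file ASCII even though it decodes cleanly as UTF-8. The sketch below shows that relationship; the function name `classify` is hypothetical and this is not a description of how `file` works internally.

```python
def classify(data):
    """Report the narrowest of ASCII / UTF-8 that can decode `data`."""
    try:
        data.decode('ascii')
        return 'ASCII'
    except UnicodeDecodeError:
        pass
    try:
        data.decode('utf-8')
        return 'UTF-8'
    except UnicodeDecodeError:
        return 'other'


ascii_only = b'{"name": "Example"}'
with_accent = '{"name": "Exampl\u00e9"}'.encode('utf-8')

# The ASCII-only payload is labelled ASCII, yet it is also valid UTF-8.
label_plain = classify(ascii_only)
label_accent = classify(with_accent)
decoded_plain = ascii_only.decode('utf-8')
```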

@wking Based on the SO article referenced above, perhaps we do not need the charset declaration and can just assume it is UTF-8.

BTW - if you find any encoding issues, it is likely due to the source text in the license-list or license-list-XML repositories being incorrect. Everything should be UTF-8, but I have found several cases where the source license text was not in fact UTF-8.

Which adds support for parsing JSON directly from bytes without first decoding to a string [1,2]. This allows interacting with RFC-compliant servers, because RFC 7159 does not define a charset parameter in its application/json registration [3].

Reported by Eric Searcy [4].

[1]: https://docs.python.org/3/whatsnew/3.6.html#json
[2]: https://bugs.python.org/issue17909
[3]: https://tools.ietf.org/html/rfc7159#page-11
[4]: spdx#10 (comment)

wking · 2017-11-09T20:06:49Z

On Thu, Nov 09, 2017 at 06:10:47PM +0000, Eric Searcy wrote:
> Unless I'm misunderstanding something, [this SO article](https://stackoverflow.com/a/26206930) makes the case that adding a charset to application/json is pointless because all RFC-compliant clients must ignore the charset on this particular type.

Fair. It looks like Python has added direct-byte-handling for this in 3.6 [1,2], so I'll change my mind and say it's appropriate to continue to not set charset for application/json.

[1]: https://docs.python.org/3/whatsnew/3.6.html#json
[2]: https://bugs.python.org/issue17909
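A short sketch of the direct-byte handling, assuming Python 3.6 or later. The payload and the `io.BytesIO` stand-in for an HTTP response body are illustrative only; no request is made and the helper name `parse_json_bytes` is hypothetical.

```python
import io
import json


def parse_json_bytes(raw):
    """Parse JSON directly from bytes (Python 3.6+).

    json.loads() detects UTF-8, UTF-16, or UTF-32 itself, so no charset
    from the Content-Type header is needed.
    """
    return json.loads(raw)


# Stand-in for a response body containing a non-ASCII character.
body = '{"name": "Caf\u00e9 License"}'.encode('utf-8')
parsed = parse_json_bytes(body)

# json.load() reads from a binary file-like object the same way.
parsed_from_stream = json.load(io.BytesIO(body))
```

On Python 3.5 and earlier, `json.loads()` raises `TypeError` for bytes input, which is why the script needed an explicit decode step before this change.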

wking · 2017-11-09T20:09:10Z

Updated to use Python 3.6+ with 1e38bd9 → 79e2162.

iamwillbar · 2019-10-24T19:36:50Z

This PR is superseded by #151.


wking mentioned this pull request Jul 18, 2017

Update SPDX License List in Appendix 1 #6

Closed

wking added 2 commits July 18, 2017 15:19

wking force-pushed the automatic-license-list-update branch from aea3ce1 to 1e38bd9 Compare July 18, 2017 22:20

wking mentioned this pull request Aug 3, 2017

format-xml: Add a script for pretty-printing our XML source spdx/license-list-XML#432

Closed

sschuberth requested a review from tsteenbe September 18, 2017 09:19

wking mentioned this pull request Oct 13, 2017

Add appendix describing SPDX Listed License fields #46

Closed

zvr added this to the 2.2 milestone Dec 19, 2017

wking referenced this pull request Apr 11, 2018

Fix location of license list master files

650afda

iamwillbar mentioned this pull request Oct 24, 2019

Automatic license list update 3.7 #151

Merged

kestewart closed this Nov 5, 2019

kestewart pushed a commit that referenced this pull request Nov 26, 2019

Update license list to v3.7

d757bf6

Based off #10 this resolves a minor bug in the license list update script and updates the license list to v3.7.

SantiagoTorres pushed a commit to SantiagoTorres/spdx-spec that referenced this pull request Jan 10, 2020

Update license list to v3.7

817f0c7

Based off spdx#10 this resolves a minor bug in the license list update script and updates the license list to v3.7.
