
chapters/appendix-I: Bump license list to v2.6 (via new script) #10

Closed
wants to merge 3 commits into from


wking
Contributor

@wking wking commented Jul 18, 2017

Add pull-license-list.py to automatically pull the current license and exception JSON into Markdown tables. JavaScript might have been more idiomatic than Python, but Python was faster for me to write; someone more familiar with JavaScript can translate the script and incorporate it into the build tooling later if we decide against carrying the auto-generated tables in this repository.
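A minimal sketch of the JSON-to-Markdown step, assuming the spdx.org JSON keeps its entries under "licenses" (or "exceptions") with "name" and "licenseId" (or "licenseExceptionId") keys, and that each entry's page lives at https://spdx.org/licenses/<id>.html. The function name markdown_table and the two-column layout are hypothetical; this is not the code in pull-license-list.py.

```python
import json

# Assumed page URL pattern for licenses and exceptions on spdx.org.
PAGE_URL = "https://spdx.org/licenses/{}.html"


def markdown_table(entries, id_key):
    """Render SPDX license or exception entries as a Markdown table.

    entries: the list under "licenses" (or "exceptions") in the JSON.
    id_key: "licenseId" (or "licenseExceptionId").
    """
    rows = ["| Full Name | Identifier |", "| --- | --- |"]
    for entry in sorted(entries, key=lambda e: e[id_key].lower()):
        # Escape pipes so a name cannot break out of its table cell.
        name = entry["name"].replace("|", "\\|")
        ident = entry[id_key]
        # Inline link, so the table needs no trailing link labels.
        rows.append("| {} | [{}]({}) |".format(name, ident, PAGE_URL.format(ident)))
    return "\n".join(rows)


if __name__ == "__main__":
    # Sample data standing in for a downloaded licenses.json.
    data = json.loads(
        '{"licenses": [{"licenseId": "MIT", "name": "MIT License"},'
        ' {"licenseId": "0BSD", "name": "BSD Zero Clause License"}]}'
    )
    print(markdown_table(data["licenses"], "licenseId"))
```

Run directly, this prints the 0BSD row before the MIT row, because entries are sorted case-insensitively by identifier.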

Two things made the auto-generation more difficult than it needed to be:

  • Newlines in exception names (spdx/license-list-data#10).

  • [edit: addressed below] No charset declaration in JSON served from spdx.org:

      $ curl -sI https://spdx.org/licenses/licenses.json | grep -i content-type
      Content-Type: application/json
    

    It should have been Content-Type: application/json; charset=utf-8. I don't know whether there is a repository with the Nginx config used for spdx.org, so I haven't filed an issue for this one.
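The newline problem can be worked around by collapsing whitespace in each name before it goes into a table cell. A sketch, with clean_name as a hypothetical helper and an illustrative input string (not the actual exception name):

```python
def clean_name(name):
    """Collapse any run of whitespace, including newlines, into one space.

    str.split() with no argument splits on all whitespace and drops
    leading/trailing runs, so joining with " " normalizes the name.
    """
    return " ".join(name.split())


print(clean_name("Example Server\n  Exception"))  # Example Server Exception
```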

Fixes #6.

@goneall
Member

goneall commented Jul 18, 2017

I don't know of any nginx configs available, but I went ahead and filed a ticket with the Linux Foundation helpdesk (request ID #43220).

The newline removal in exception names works around [1].

The UTF-8 fallback works around the source not being served with a
charset:

  $ curl -I https://spdx.org/licenses/licenses.json
  HTTP/1.1 200 OK
  Server: nginx
  Date: Tue, 18 Jul 2017 20:12:58 GMT
  Content-Type: application/json
  Content-Length: 108139
  Connection: keep-alive
  Vary: Accept-Encoding
  Last-Modified: Fri, 06 Jan 2017 21:34:52 GMT
  Expires: Tue, 18 Jul 2017 22:12:58 GMT
  Cache-Control: max-age=7200
  Strict-Transport-Security: max-age=16070400
  Accept-Ranges: bytes

  $ curl -I https://spdx.org/licenses/exceptions.json
  HTTP/1.1 200 OK
  Server: nginx
  Date: Tue, 18 Jul 2017 20:13:02 GMT
  Content-Type: application/json
  Content-Length: 7985
  Connection: keep-alive
  Vary: Accept-Encoding
  Last-Modified: Fri, 06 Jan 2017 21:34:52 GMT
  Expires: Tue, 18 Jul 2017 22:13:02 GMT
  Cache-Control: max-age=7200
  Strict-Transport-Security: max-age=16070400
  Accept-Ranges: bytes

[1]: https://github.com/spdx/license-list-data/issues/10
     Subject: Newline in 389-exception name
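A minimal sketch of the UTF-8 fallback described above, assuming the Content-Type header value is available as a string (with urllib.request it would come from the response headers). It uses the standard library's email.message.Message to parse the charset parameter; decode_body is a hypothetical name, and the script may do this differently.

```python
from email.message import Message


def decode_body(body, content_type, default="utf-8"):
    """Decode an HTTP body using the Content-Type charset, else UTF-8.

    email.message.Message parses the header's parameters; the charset
    comes back lowercased, or None when the header has no charset.
    """
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset() or default
    return body.decode(charset)


# spdx.org sends "application/json" with no charset, so UTF-8 is used.
print(decode_body("caf\u00e9".encode("utf-8"), "application/json"))  # café
```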

Generated with:

  $ ./bin/pull-license-list.py

Then I removed the trailing link labels now that the tables use
inline links.  The inline links make the tables a bit wider, but they
are easier to manage automatically because each table is
self-contained (vs. accumulating a set of link labels across all
tables).

The date formatting for "6 Jan 2016" is unfortunate.  I'd prefer
"2016-01-06" or "January 6th, 2016", but the script is just pulling
the raw releaseDate from the JSON, so we get whatever the upstream
data is using.
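If ISO 8601 dates were preferred in the tables, a hypothetical helper could reformat the upstream value, assuming releaseDate always uses the "6 Jan 2016" layout (day, abbreviated English month, year):

```python
from datetime import datetime


def iso_release_date(raw):
    """Convert a releaseDate such as "6 Jan 2016" to "2016-01-06".

    Assumes a "<day> <abbreviated month> <year>" layout; %b only matches
    English month abbreviations under the default C (or an English) locale.
    """
    return datetime.strptime(raw, "%d %b %Y").date().isoformat()


print(iso_release_date("6 Jan 2016"))  # 2016-01-06
```

The trade-off is that the tables would no longer mirror the upstream data verbatim, and an upstream format change would make strptime raise ValueError.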
@emsearcy

emsearcy commented Nov 9, 2017

Unless I'm misunderstanding something, this SO article makes the case that adding a charset to application/json is pointless because all RFC-compliant clients must ignore the charset on this particular type.

Is it only /licenses/licenses.json that you are requesting this be added for? Doing a quick check on the server shows many ASCII .json files under licenses, and I'm not aware of an nginx or httpd option to dynamically pick the correct charset.

# find licenses -name '*.json' | xargs file | grep UTF-8 | wc -l
297
# find licenses -name '*.json' | xargs file | grep ASCII | wc -l
1460
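One point worth noting for these counts: UTF-8 encodes U+0000 through U+007F as the same single bytes ASCII uses, so any file the file utility reports as ASCII is also valid UTF-8. The sketch below roughly imitates that ASCII/UTF-8 split (it is not how the file utility works internally); classify is a hypothetical name.

```python
def classify(data):
    """Roughly imitate the ASCII/UTF-8 split reported by the file utility."""
    if all(byte < 0x80 for byte in data):
        return "ASCII"  # 7-bit only: valid ASCII and, identically, valid UTF-8
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8"
    return "UTF-8"


ascii_only = b'{"licenseId": "MIT"}'
print(classify(ascii_only), ascii_only.decode("utf-8"))  # ASCII {"licenseId": "MIT"}
print(classify('{"name": "caf\u00e9"}'.encode("utf-8")))  # UTF-8
```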

@goneall
Member

goneall commented Nov 9, 2017

Is it only /licenses/licenses.json that you are requesting this be added for? Doing a quick check on the server shows many ASCII .json files under licenses, and I'm not aware of an nginx or httpd option to dynamically pick the correct charset.

All .json files in the /licenses directory.

All of the JSON files should be encoded using UTF-8. I believe the ASCII text found was actually the word "ASCII" in the data portion of the files.

@wking Based on the SO article referenced above, perhaps we do not need the charset declaration and can just assume UTF-8.

BTW - if you find any encoding issues, it is likely due to the source text in the license-list or license-list-XML repositories being incorrect. Everything should be UTF-8, but I have found several cases where the source license text was not in fact UTF-8.

Which adds support for parsing JSON directly from bytes without first
decoding to a string [1,2].  This allows interacting with
RFC-compliant servers, because RFC 7159 does not define a charset
parameter in its application/json registration [3].

Reported by Eric Searcy [4].

[1]: https://docs.python.org/3/whatsnew/3.6.html#json
[2]: https://bugs.python.org/issue17909
[3]: https://tools.ietf.org/html/rfc7159#page-11
[4]: spdx#10 (comment)
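With Python 3.6+, the raw response bytes can go straight to json.loads, which detects UTF-8, UTF-16, or UTF-32 from the bytes themselves, so a missing (or ignored) charset parameter no longer matters. A minimal sketch; load_json and fetch are hypothetical names, and the URL is the one discussed in this thread.

```python
import json
import urllib.request

LICENSES_URL = "https://spdx.org/licenses/licenses.json"


def load_json(raw):
    """Parse JSON from bytes (Python 3.6+).

    json.loads detects UTF-8, UTF-16 or UTF-32 from the bytes, so no
    charset parameter is needed.
    """
    return json.loads(raw)


def fetch(url=LICENSES_URL):
    # Pass the undecoded body to json.loads; the Content-Type is not consulted.
    with urllib.request.urlopen(url) as response:
        return load_json(response.read())


print(load_json('{"name": "caf\u00e9"}'.encode("utf-16")))  # {'name': 'café'}
```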
@wking
Contributor Author

wking commented Nov 9, 2017 via email

@wking
Contributor Author

wking commented Nov 9, 2017

Updated to use Python 3.6+ with 1e38bd979e2162.

@iamwillbar
Collaborator

This PR is superseded by #151.

@kestewart kestewart closed this Nov 5, 2019
kestewart pushed a commit that referenced this pull request Nov 26, 2019
kestewart pushed a commit that referenced this pull request Nov 26, 2019
Based off #10 this resolves a minor bug in the license list update
script and updates the license list to v3.7.
SantiagoTorres pushed a commit to SantiagoTorres/spdx-spec that referenced this pull request Jan 10, 2020
SantiagoTorres pushed a commit to SantiagoTorres/spdx-spec that referenced this pull request Jan 10, 2020