Add appendix describing SPDX Listed License fields #46
Comments
Would it be better to name it as |
Having an inline table of licenses is fine (although I'd prefer it be autogenerated, see the implementation in #10). However, there's lots of important information in the license repo that is not going to end up in this spec (e.g. optional and alternate markup). I'd rather have license-list-XML document itself (spdx/license-list-XML#391) and then have this spec link to a pretty landing page for a specific version of license-list-XML. Cross-linking the list threads. |
@wking I tend to separate out the internal fields used by the legal team in the creating and maintenance of the licenses from the external fields used by tools. The latter are properties used in the JSON and RDF formats in the license-list-data repository. I would suggest the latter be standardized and documented since changes would be disruptive to tool implementers. The former can be more flexible and should be described in the XML schema document. |
@robinagandhi Good point on the possible (mis)interpretation. The definition of the property would be that the license is designated as "Free" by the FSF (see https://www.gnu.org/licenses/license-list.html). @jlovejoy @kestewart Thoughts? |
In reviewing https://www.gnu.org/licenses/license-list.html, the FSF specifies "Free and Compatible with GPL", "Free and Incompatible with GPL" and "non-free". Do we need to capture all of these states? If so, do we want a second boolean ("FsfGplCompatible")? We could always implement a second boolean in the future. @jlovejoy @kestewart - what do you think? |
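To make the question above concrete, here is a minimal sketch of how the three FSF states could be projected onto two booleans. The enum and function names are hypothetical, and the key names simply reuse the "isFsfFree" and "FsfGplCompatible" spellings floated in this thread; none of this is a published SPDX field definition.

```python
from enum import Enum


class FsfStatus(Enum):
    """The three states named in the comment above."""
    FREE_GPL_COMPATIBLE = "Free and Compatible with GPL"
    FREE_GPL_INCOMPATIBLE = "Free and Incompatible with GPL"
    NON_FREE = "non-free"


def to_flags(status: FsfStatus) -> dict:
    """Project one FSF state onto two booleans (illustrative key names)."""
    return {
        "isFsfFree": status is not FsfStatus.NON_FREE,
        "FsfGplCompatible": status is FsfStatus.FREE_GPL_COMPATIBLE,
    }
```

Two booleans give four combinations but only three are meaningful here (non-free and GPL-compatible never occurs), which is one argument for starting with a single boolean and adding the second later.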
On Fri, Oct 13, 2017 at 01:28:36PM -0700, goneall wrote:
> @wking I tend to separate out the internal fields used by the legal
> team in the creating and maintenance of the licenses from the
> external fields used by tools.
That's good. On this front, I would prefer if *none* of the FSF/OSI
material was maintained by the legal team. It should instead be
maintained by the FSF and OSI teams. I'm ok with the tech team
caching license ID matches in license-list-XML or downstream in
license-list-data for now, although with automated matching
(spdx/license-list-XML#418) I'd rather move
even that out of license-list-XML.
> The latter are properties used in the JSON and RDF formats in the
> license-list-data repository. I would suggest the latter be
> standardized and documented since changes would be disruptive to
> tool implementers. The former can be more flexible and should be
> described in the XML schema document.
I'm not sure the XML format should be purely internal. If that's the
format we find most convenient to maintain our license information,
wouldn't it also be a convenient format for external authors
(e.g. defining their own extension licenses?). I think the
distinctions are that:
1. license-list-XML is young and still non-canonical, so folks should
expect us to be breaking things ;). license-list-data is more
established, so folks should expect us to be supporting
backwards-compat.
2. license-list-data is (or should be) completely automatically
generated. It's just a convenience for folks who prefer other
formats besides those used in license-list-XML and other upstream
sources.
2 is the reason why I'm fine dumping lots of things into
license-list-data. And it's why I'd rather not have external
information compiled into license-list-XML (human-generated data would
be more likely to be buggy, and machine-generated data would distract
reviewers from the human-generated data they should be watching ;).
As far as documentation goes, I'm in favor of documenting all of our
formats. I'm just not sure a spec appendix is the best place to do
it. The draft XSD in license-list-XML is a start at documenting that,
although having something more human-oriented in license-list-XML
would be nice too. But those are for devs. For list consumers, I'd
like to see [1] versioned (and maybe replaced by something
automatically built from license-list-XML). Then the spec can drop
its inline list appendix [2] and just link to an external license list
version. And the license-list can supply as much metadata on the
compiled view (or views) as it wants, but we wouldn't be adding new
field documentation in this repo.
[1]: https://spdx.org/licenses/
[2]: https://spdx.org/spdx-specification-21-web-version#h.1jlao46
|
Agree in principle, but from a practical point of view we may need to maintain the FSF information in the XML format until the FSF implements some sort of supported API. I just added an issue to the tools to automatically generate the OSI information from their API.
I think we can eventually make the XML external, but it would require more review and will likely slow down the implementation of the internally used XML format. We decided a few months back to keep it internal for the first release of the license XML and revisit the external discussion later. Suggest keeping this a separate decision from documenting the fields in the external license list data.
I don't feel strongly that it needs to be an appendix. I do feel it needs to be documented in a format and location that is considered the source of record. It also needs to be easy to find. |
On Fri, Oct 13, 2017 at 09:36:02PM +0000, goneall wrote:
> Agree in principle, but from a practical point of view we may need
> to maintain the FSF information in the XML format until the FSF
> implements some sort of supported API.
I think this is a great time for them to implement an API ;). If they
don't want to host one, I suggest we implement one on their behalf (in
a new repo, which we can gift to them later). Then we consume our
proxy API until they get their own.
> I think we can eventually make the XML external, but it would
> require more review and will likely slow down the implementation of
> the internally used XML format. We decided a few months back to
> keep it internal for the first release of the license XML and
> revisit the external discussion later. Suggest keeping this a
> separate decision from documenting the fields in the external
> license list data.
Sounds good.
> > As far as documentation goes, I'm in favor of documenting all of
> > our formats. I'm just not sure a spec appendix is the best place
> > to do it.
> I don't feel strongly that it needs to be an appendix. I do feel it
> needs to be documented in a format and location that is considered
> the source of record. It also needs to be easy to find.
That all sounds good to me. If we expect to see this markup in SPDX
documents, then it should go in a non-appendix portion of this spec.
With the SPDX hopefully adding IDs for all FSF- and OSI-discussed
licenses, I doubt folks defining LicenseRef-…s will need access to
this field. That means we don't need to talk about it in this repo.
If license-list-XML is for recording the decisions of the legal team,
we won't need to talk about it there either.
That leaves license-list-data (or its generating tooling, which sounds
like part of spdx/tools) as the first place the fields will come in.
I suggest we document these fields there. spdx/tools has a lot going
on, so some docs in license-list-data seem appropriate to me. To make
things discoverable, I like what the OSI has done:
$ curl -sI https://api.opensource.org | grep -i '^http\|^location'
HTTP/1.1 301 Moved Permanently
Location: https://github.com/OpenSourceOrg/api/blob/master/doc/endpoints.md
But however we handle it, having a page on spdx.org that's clearly
labeled “API” (or “Documentation” or “Developers” or whatever) with
docs on (or links to docs on) the API would be good. There doesn't
seem to be much in that regard at the moment [1].
[1]: https://spdx.org/search/node/api
|
With respect to "isFsfFree", one option floated on the last legal call that might help is to call it "isFsfLibre." That helps avoid the possibility of misinterpretation, at least in English. (This danger is also somewhat, although not fully, mitigated by taking note of the context of the property -- applying to a license rather than a package/file.) IIRC, on that same call (and someone please correct me if I'm wrong) FSF's John Sullivan said that (1) they are not too concerned about the name of the internal identifier SPDX chooses to use, but that (2) FSF views "FSF Approved" and "Is FSF Free/Libre" as different categories (which I took to mean, essentially, that FSF maintains at least two different "approval" lists) -- and that their concern was in SPDX conflating the two (or leading SPDX users to conflate them). I believe what SPDX intends to capture with this property is "Is FSF Free/Libre," so if that's correct we would probably want to call it something with "Libre" in the name. |
@bradleeedmondson and @goneall: In our interviews with many companies, we found that in internal practices, SPDX may not be adopted entirely but the fields do influence information recorded in internal databases. That said, it is important to not assume that context will always follow the field. So I appreciate the follow up on this topic. I think that |
+1 on FsfLibre proposal |
Will there be advice given for other parties to indicate that they "approve" a license (without adding overhead on your side)? I'm thinking primarily of Debian Free Software Guidelines. Other (smaller) examples include Free Cultural Works or Copyfree. |
On Wed, Oct 25, 2017 at 09:51:08PM +0000, d❤vid wrote:
> Will there be advice given for other parties to indicate that they
> "approve" a license (without adding overhead on your side)?
If third parties have an API that exports their metadata, we can have
code like spdx/tools#112 to slurp it into the license list. I'd still
like to see that metadata documented in license-list-data or
spdx/tools [1], and not in an in-spec license (if we even keep an
in-spec list [2], I'd be fine keeping the whole list in
license-list-data or other repository).
If third-party APIs expose the license text, we can generate the SPDX
ID ↔ third-party ID map automatically (spdx/license-list-XML#418). In
that case, the interface for third-party inclusion would be:
a. Walk the third-party license list, and for each ID:
   b. Extract the third-party text. Match it to the license-list-XML
      templates to determine the associated SPDX ID/expression (if
      any).
   If a matching SPDX ID/expression is found:
      c. Insert the third-party ID in our metadata, like I do in [3]
         and the OSI does in [4].
      d. Optionally extract additional third-party metadata from their
         API, and stuff it into our data under whatever attribute name we
         choose (as with isFsfLibre).
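Steps a through d can be sketched roughly as follows. This is only an illustration under stated assumptions: the data shapes, the `identifiers` key, and the function names are hypothetical, and `match_spdx_id` is a crude stand-in (case and whitespace normalization only) for real matching against the license-list-XML templates.

```python
import re


def normalize(text: str) -> str:
    """Crude stand-in for template matching: lowercase, collapse whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def match_spdx_id(text, spdx_texts):
    """Return the SPDX ID whose text equals `text` after normalization, else None."""
    wanted = normalize(text)
    for spdx_id, spdx_text in spdx_texts.items():
        if normalize(spdx_text) == wanted:
            return spdx_id
    return None


def merge_third_party(scheme, third_party, spdx_texts, metadata):
    """Walk a provider's list and annotate the matching SPDX entries.

    third_party maps provider ID -> {"text": ..., "extra": {...}} (hypothetical shape).
    metadata maps SPDX ID -> dict and is updated in place.
    Returns the provider IDs that matched no SPDX text.
    """
    unmatched = []
    for provider_id, entry in third_party.items():          # step a
        spdx_id = match_spdx_id(entry["text"], spdx_texts)  # step b
        if spdx_id is None:
            unmatched.append(provider_id)
            continue
        record = metadata.setdefault(spdx_id, {})
        ids = record.setdefault("identifiers", {}).setdefault(scheme, [])
        ids.append(provider_id)                              # step c
        record.update(entry.get("extra", {}))                # step d
    return unmatched
```

Returning the unmatched provider IDs keeps the human review queue explicit: those are the licenses for which no SPDX ID exists yet or for which the matcher needs attention.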
The only things we'd have to maintain in that case would be:
* The API bindings (shouldn't be too bad with stable third-party APIs),
* References for the ID schemes (e.g. “‘osi’ identifiers are [OSI
IDs][5]”), and
* Docs for any metadata we carry.
If we minimize the metadata we carry (some discussion on fetching it
live in spdx/tools#111), then we just have to document the ID schemes.
If this becomes wildly popular, we'd want to define a standard
interface for third-party providers. Then compliant third-parties
could provide their metadata without us needing to code up a
per-provider API client. I think getting to this point is unlikely.
> I'm thinking primarily of Debian Free Software Guidelines.
Is there a machine-readable version of [6] or [7]?
[1]: #46 (comment)
[2]: #46 (comment)
[3]: https://github.com/wking/fsf-api/blob/344e24637e31de9ad6e16174d55c370202c67fa7/Expat.json#L2-L4
[4]: https://github.com/OpenSourceOrg/licenses/blob/f7ff223f9694ca0d5114fc82e43c74b5c5087891/licenses/spdx/manual.json#L20-L28
[5]: https://github.com/OpenSourceOrg/api/blob/c903651ef26c35202d6561b61b97d29ead1e08c5/doc/endpoints.md#licenseid
[6]: https://wiki.debian.org/DFSGLicenses
[7]: https://www.debian.org/legal/licenses/
|
I'll take that as a definitive answer! I've pinged the debian-legal mailing list for a machine-readable version (if it exists). However, reading those last two links I was reminded:
...which suggests that "DFSG approved" may not make sense / be helpful as a license flag. I'll update with any definitive response from the mailing list. |
So licenses can be tagged "DFSG (in)compatible" with a note explaining that using DFSG licenses is necessary, but not sufficient, to get a DFSG-compatible package. I still think an API for license rulings makes sense and would be helpful. |
Completely agree on having an API. I would like to propose an additional approach: a 3rd party could provide a machine-readable list or map that lets us map SPDX license IDs to their license list, for example a simple JSON file with an array of supported SPDX license IDs. Reason for this approach: some organizations do not maintain the license texts for all licenses. Some are just URL references to other license sites. For example, we ran into this for several licenses referenced by the Free Software Foundation. In this situation, we should allow the 3rd party to express which SPDX license IDs match those references. If a 3rd party only provides license texts, we could use the algorithm @wking suggested above. If a 3rd party only provides a map of SPDX license IDs, we could use that. If a 3rd party provides both, we could use the list/map and run a verification against the license text. |
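The three cases in the comment above (text only, ID map only, both) could be dispatched along these lines. This is a sketch, not an agreed design: the entry keys `spdx_id` and `text` and the function name are hypothetical, and `match_text` stands for whatever text matcher is available.

```python
def resolve_spdx_id(entry, match_text):
    """Resolve one provider entry to (spdx_id, verified).

    entry may carry "spdx_id", "text", or both (hypothetical keys).
    match_text is any callable mapping license text -> SPDX ID or None.
    verified is True/False when both were supplied, otherwise None.
    """
    declared = entry.get("spdx_id")
    text = entry.get("text")
    if declared is None and text is None:
        raise ValueError("entry needs an SPDX ID, a license text, or both")
    if declared is None:
        # Text only: fall back to matching the text.
        return match_text(text), None
    if text is None:
        # Map only: trust the provider's declared ID.
        return declared, None
    # Both: use the declared ID and verify it against the text.
    return declared, match_text(text) == declared
```

When both are supplied and the check fails, the function still returns the declared ID with `verified` set to `False`, leaving the decision about how to handle the disagreement to the caller.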
Note that we could add this to the spec, but any updates to the information provided on the SPDX license list HTML pages (https://spdx.org/licenses) would be decided by the legal team. If an organization would like to be added to the index page or the individual license pages, they should post the request to the legal team mailing list [email protected] |
On Thu, Oct 26, 2017 at 05:58:18PM +0000, goneall wrote:
> Reason for this approach: Some organizations do not maintain the
> license texts for all licenses. Some are just URL references to
> other license sites.
I don't think we want to encourage this use case ;). If folks don't
want to maintain license text, I'm fine with them asserting “the text
we considered for this license matched SPDX ID ${WHATEVER}”. In that
case, they're relying on the SPDX to continue to have that ID match
the text they considered (which is a pretty safe assumption, but is
still delegating significant responsibility to the SPDX). But
delegating to external parties makes everything too wiggly for me.
For example, the FSF currently has [1]:
  Sun Industry Standards Source License 1.0
linking to [2], which has:
  Sun Industry Standards Source License - Version 1.1
What text did the FSF consider? In this case, the Internet Archive
(assuming you trust *them*) doesn't even have a copy of the 1.0 text
[3]. So, if the third party wants to delegate to us, I think they
need to supply, for each license they are annotating, at least one of:
a. The license text they considered,
b. The SPDX ID they're delegating to, or
c. A link to the license text they considered with the understanding
that we can, at our discretion, use any text ever retrieved from
that location when determining the matching SPDX ID.
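The "at least one of a, b, c" requirement above is easy to check mechanically. A minimal sketch, assuming a hypothetical record shape (the class and field names are illustrative, not part of any SPDX or FSF format):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """One third-party license annotation (field names are hypothetical)."""
    provider_id: str
    text: Optional[str] = None      # (a) the license text they considered
    spdx_id: Optional[str] = None   # (b) the SPDX ID they delegate to
    text_url: Optional[str] = None  # (c) a link to the text they considered

    def supplied(self) -> list:
        """Return which of the options a, b, c this annotation provides."""
        options = [("a", self.text), ("b", self.spdx_id), ("c", self.text_url)]
        return [label for label, value in options if value]


def missing_reference(annotations):
    """Return provider IDs of annotations that supply none of a, b, or c."""
    return [a.provider_id for a in annotations if not a.supplied()]
```

An entry like the SISSL one discussed above, which only links to an external page, would report `["c"]` and so pass the check while still carrying the risk described for option (c).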
[1]: https://www.gnu.org/licenses/license-list.html#SISSL
[2]: http://www.openoffice.org/licenses/sissl_license.html
[3]: https://web.archive.org/web/20010417095704/http://www.openoffice.org:80/licenses/sissl_license.html
|
Good point and I agree. I think the ideal situation is they provide both so that we could verify.
I would prefer not to take on the responsibility of identifying the mapping. Suggest we only allow (a) the license text they considered and (b) the SPDX ID they're delegating to.
|
On Thu, Oct 26, 2017 at 06:39:58PM +0000, goneall wrote:
> I would prefer not to take on the responsibility of identifying the
> mapping. Suggest we only allow:
> > a. The license text they considered,
> > b. The SPDX ID they're delegating to
We already have to identify the mapping if they only provide (a) (and
we should be able to do that automatically,
spdx/license-list-XML#418). So I don't think (c) is a big additional
support burden for us.
If I was a license-metadata provider, I would be doing (a) as the
canonical reference, and might be doing (b) and (c) as non-canonical
hints. But I'm ok with other metadata providers making (b) or (c)
their canonical reference, as long as they understand and are willing
to assume the risks.
|
@wking I've separated out the Debian flag as #48 , including comments from you and debian-legal |
This looks like something we can put into 2.2, (after specifics are figured out). Needs tech team discussion. |
The Code for the left border at https://www.gnu.org/licenses/license-list.html defines five properties:
It would be useful for the FSF if you added them. PS: At the time of writing I'm working as an intern with the FSF tech team, and I'm writing a script that retrieves license data from https://raw.githubusercontent.com/spdx/license-list-data/master/json/licenses.json to build license pages for the FSF project https://directory.fsf.org/wiki/ |
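A consumer script of the kind described above might filter the parsed `licenses.json` like this. It is a sketch under assumptions: it presumes a top-level `licenses` array whose entries carry `licenseId`, an optional `isFsfLibre` flag, and an optional `isDeprecatedLicenseId` flag, and it runs on a tiny made-up sample rather than fetching the real file.

```python
import json


def fsf_libre_ids(license_list: dict) -> list:
    """Return sorted IDs of non-deprecated licenses flagged isFsfLibre."""
    return sorted(
        lic["licenseId"]
        for lic in license_list.get("licenses", [])
        if lic.get("isFsfLibre", False)
        and not lic.get("isDeprecatedLicenseId", False)
    )


# Tiny illustrative sample, not real license-list-data content.
sample = json.loads("""
{"licenses": [
  {"licenseId": "Example-A", "isFsfLibre": true},
  {"licenseId": "Example-B"},
  {"licenseId": "Example-C", "isFsfLibre": true, "isDeprecatedLicenseId": true}
]}
""")
print(fsf_libre_ids(sample))  # prints ['Example-A']
```

Treating a missing flag as false keeps the filter tolerant of entries that omit optional fields.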
You can get these now from https://github.com/wking/fsf-api. But having the FSF pulling data about its own opinions from someone else seems... inefficient ;). Any chance of me handing that over so the FSF can provide its own API? There were some noises in that direction in wking/fsf-api#18. |
@davidhedlund Thanks for the feedback on the fields. I think I'll queue this up for discussion on the legal and tech teams for inclusion in the spec if there are no objections. |
I agree with @wking that this would be a valuable approach. Ideally, the FSF would own the source data and API used to generate their human readable web pages and we could use the same source data to populate the SPDX data. This would be much better than our current approach of parsing / scraping the FSF HTML to generate the data. |
@davidhedlund as to the question about the FSF categories of: "Free and Compatible with GPL" and "Free and Incompatible with GPL" - we just want to capture the ones that are considered free. We don't need/want to get into the in/compatible level of detail. |
From discussion on call, looking for volunteer to generate pull request. Usual suspects oversubscribed, so moving this to 3.0 for now, and revisiting. |
Add |
Thank you. |
This issue is also referenced in https://git.savannah.gnu.org/cgit/directory.git/tree/subprojects/spdx/ISSUES @goneall |
@goneall - is the name of this issue right? |
@jlovejoy I have a different opinion on documenting the fields in the spec. I do want to clarify one thing, however. I'm only suggesting we document the fields that are published on the website and in the license-list-data repo. I'm NOT suggesting we document fields used internally by the legal team in the spec. Those should only be documented in the license-list-XML repo. Since these fields are being used programmatically in applications, they should be documented in the spec so everyone knows where to go for the field information. They also need to be reviewed and released with version control that comes with the spec. Be happy to discuss on a legal team call or tech team call if you like. |
@goneall As part of the Licensing profile, I am working on documenting the license model generally in line with the draft document that you and I (together with some of the GSoC students) had been working on here in 2020. I haven't reviewed the linked issues above, but I'll aim to take a look at those together with the draft writeup that I'm working on. |
These have now been added to the spec in 3.0 - thanks to @swinslow |
Thank you. |
There are several properties used in the SPDX Listed Licenses which are not documented in the specification.
They are currently documented in the RDFa terms used section of the Accessing SPDX Licenses document. There are also references to these fields in the License XML Elements and Attributes document.
Missing elements include:
Propose we add another appendix, Listed License Information, which details all the fields, including those shared with extracted license text (e.g. licenseId).