The FSF is interested in having the SPDX expose some of its metadata in the SPDX license list. The cleanest way to do that is to have the FSF provide their annotated license list in a format that is more convenient for automated tools. For example, the OSI provides an API which, while currently non-canonical, provides convenient access to OSI license annotations.
This repository scrapes the FSF list and provides the scraped data in a JSON API for others to consume. Ideally we'll hand this repository over to the FSF once they're ready to maintain it, or we'll deprecate this repository if they decide to provide a different API.
You can pull an array of identifiers from https://wking.github.io/fsf-api/licenses.json.
You can pull an object with all the license data from https://wking.github.io/fsf-api/licenses-full.json.
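As a rough sketch of consuming these two documents from Python, something like the following could work. The helper names `endpoint` and `fetch_json` and the injectable `opener` parameter are assumptions made for illustration; they are not part of this repository.

```python
import json
from urllib.request import urlopen

# Base URL taken from the links above.
BASE_URL = "https://wking.github.io/fsf-api"


def endpoint(name):
    """Return the URL of a top-level JSON document, e.g. 'licenses'."""
    return "{}/{}.json".format(BASE_URL, name)


def fetch_json(url, opener=urlopen):
    """Fetch a URL and decode its body as JSON.

    'opener' defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(endpoint("licenses"))` should then give the array of identifiers, and `fetch_json(endpoint("licenses-full"))` the object keyed by FSF identifier.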
You can pull an individual license from a few places:
- Using the FSF ID:
  https://wking.github.io/fsf-api/{id}.json
  For example, https://wking.github.io/fsf-api/Expat.json.
- Using a non-FSF ID, according to the mapping between the other scheme and the FSF scheme asserted by this API:
  https://wking.github.io/fsf-api/{scheme}/{id}.json
  For example, https://wking.github.io/fsf-api/spdx/MIT.json. This API currently attempts to maintain the following mappings:
  - spdx, using the SPDX identifiers.
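The two URL templates above can be captured in one small helper. This is a minimal sketch: `license_url` is a hypothetical name, and percent-encoding each path segment with `urllib.parse.quote` is an assumption about how unusual identifier characters should be handled, not something this API documents.

```python
from urllib.parse import quote

BASE_URL = "https://wking.github.io/fsf-api"


def license_url(license_id, scheme=None):
    """Build the URL for a single license record.

    With no scheme, license_id is an FSF identifier ({id}.json).
    With a scheme such as 'spdx', license_id is an identifier in that
    scheme ({scheme}/{id}.json).
    """
    parts = [BASE_URL]
    if scheme is not None:
        parts.append(quote(scheme, safe=""))
    # Percent-encode the identifier so it is safe as one path segment.
    parts.append(quote(license_id, safe="") + ".json")
    return "/".join(parts)
```

For the examples given above, `license_url("Expat")` and `license_url("MIT", scheme="spdx")` reproduce the two example URLs.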
Licenses have the following properties:
- id: a short slug identifying the license. In licenses-full.json, this information is in the root object key and is not duplicated in the value.
- name: a short string naming the license.
- uris: an array of URIs for the license. The first entry in this array will always be an entry on the FSF's HTML page. The order of the remaining entries is not significant.
- tags: an array of FSF categories for the license. The FSF currently defines the following categories:
  - gpl-2-compatible and gpl-3-compatible: licenses that are GPL-compatible.
  - fdl-compatible: licenses that are FDL-compatible.
  - libre: licenses that are either GPL-compatible, FDL-compatible, or are otherwise free.
  - viewpoint: licenses for works stating a viewpoint.
  - non-free: licenses that are non-free.
- identifiers: an object with mappings to other license lists. This API currently attempts to maintain the following mappings:
  - spdx: For licenses with SPDX IDs, the spdx value will hold an array of SPDX identifiers. Licenses may have multiple SPDX entries when the SPDX list defines per-grant IDs that share the same license (e.g. GPL-3.0-only and GPL-3.0-or-later). The first entry in the SPDX array is the one that most closely matches the FSF license. For example, the FSF's GNUGPLv3 text has:
    “However, most software released under GPLv2 allows you to use the terms of later versions of the GPL as well.”
    and the GPLv3 text suggests an “any later version” grant, so GPL-3.0-or-later is the first SPDX identifier, GPL-3.0-only is the second, and the deprecated GPL-3.0 is the third.
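To make the shape of these records concrete, here is one way a consumer might model them in Python. The `License` class, `parse_full`, and `preferred_spdx` are hypothetical names, and `SAMPLE` is hand-written illustrative data (including the made-up `ExampleNonFree` entry), not real output from the API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class License:
    id: str
    name: str
    uris: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    identifiers: Dict[str, List[str]] = field(default_factory=dict)

    @property
    def preferred_spdx(self) -> Optional[str]:
        """Return the first (closest-matching) SPDX identifier, if any."""
        spdx = self.identifiers.get("spdx", [])
        return spdx[0] if spdx else None


def parse_full(full):
    """Convert a licenses-full.json style object into License records.

    The id comes from the root object key, since it is not repeated
    inside each value.
    """
    return [
        License(
            id=license_id,
            name=data["name"],
            uris=data.get("uris", []),
            tags=data.get("tags", []),
            identifiers=data.get("identifiers", {}),
        )
        for license_id, data in full.items()
    ]


# Illustrative sample only; values are assumptions, not fetched data.
SAMPLE = {
    "GNUGPLv3": {
        "name": "GNU General Public License (GPL) version 3",
        "uris": ["https://www.gnu.org/licenses/license-list.html#GNUGPLv3"],
        "tags": ["gpl-3-compatible", "libre"],
        "identifiers": {
            "spdx": ["GPL-3.0-or-later", "GPL-3.0-only", "GPL-3.0"],
        },
    },
    "ExampleNonFree": {  # hypothetical entry with no SPDX mapping
        "name": "Example Non-Free License",
        "tags": ["non-free"],
    },
}

licenses = parse_full(SAMPLE)
gpl3 = [lic.id for lic in licenses if "gpl-3-compatible" in lic.tags]
```

Taking the first SPDX entry as the preferred one follows the ordering rule described above; entries without an spdx mapping simply yield `None`.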
There are currently some hacks in the pulling script:
- SPLITS, which:
  - Unpacks some places where the FSF's HTML page uses a single identifier for multiple licenses (e.g. using AcademicFreeLicense for “all versions through 3.0”).
  - Repacks places where the FSF's HTML page uses two identifiers for the same license (e.g. to classify FreeBSD as both GPL-compatible and FDL-compatible).
- IDENTIFIERS, which maps FSF identifiers to other schemes. Ideally this would be based on automated license-text comparison, but in order for that to work this API would have to expose the license text that the FSF considered for each ID. Currently, the FSF's HTML page links to license source, but not in a consistent enough way for me to extract the text.
- TAG_OVERRIDES, which sets tags where the human-readable text on the FSF's annotated list has more detail than the easily-machine-readable content. For example, the FSF currently only distinguishes between gpl-2-compatible and gpl-3-compatible in text, so licenses that are only compatible with one or the other need tag overrides.
Until these hacks are addressed, license IDs and the tags and identifiers fields should be taken with a grain of salt.
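For readers wondering how hand-maintained tables like these might be layered over scraped data, here is a loose sketch. Only the names TAG_OVERRIDES and IDENTIFIERS come from the description above; their shapes, their contents, the `apply_hacks` function, and the example license IDs are all assumptions and may not match the actual pulling script.

```python
# Hypothetical tables: contents and structure are illustrative guesses.
TAG_OVERRIDES = {
    "ExampleLicense": {"libre", "gpl-3-compatible"},
}
IDENTIFIERS = {
    "ExampleLicense": {"spdx": ["Example-1.0"]},
}


def apply_hacks(scraped):
    """Return a copy of the scraped records with the tables applied.

    'scraped' maps FSF IDs to dicts that carry at least a 'tags' list.
    """
    result = {}
    for license_id, record in scraped.items():
        record = dict(record)  # shallow copy so the input is not mutated
        if license_id in TAG_OVERRIDES:
            # Replace the scraped tags wholesale with the override set.
            record["tags"] = sorted(TAG_OVERRIDES[license_id])
        if license_id in IDENTIFIERS:
            record["identifiers"] = IDENTIFIERS[license_id]
        result[license_id] = record
    return result


scraped = {
    "ExampleLicense": {"name": "Example License", "tags": ["libre"]},
    "OtherLicense": {"name": "Other License", "tags": ["non-free"]},
}
patched = apply_hacks(scraped)
```

Here the override replaces the scraped tags rather than merging with them; whether the real script replaces or merges is not stated above, so treat that choice as part of the sketch.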
Contributions are welcome!