
Update Python SPDX library to SPDX 2.1, Google Summer of Code 2018

Yash Nisar edited this page Aug 13, 2018 · 1 revision

Project Overview

Software Package Data Exchange (SPDX) is “a set of standards for communicating the components, licenses, and copyrights associated with software”. You can find the latest (as well as the previous) standards at: https://spdx.org/specifications.

One idea is to accompany software with special files that hold certain meta information: authorship, copyrights, licenses, etc. These files currently come in two major formats: Tag/Value and RDF. There are parsing tools available for these formats in multiple programming languages such as Python and Go: https://github.com/spdx/tools-python and https://github.com/spdx/tools-go.

These tools do not support the latest SPDX 2.1 standard (they can only handle SPDX 1.2), which makes it logical to add support for it. The aim of the project is to update the Python SPDX library to the SPDX 2.1 specification, adding support for relationships, annotations, and snippet information, as well as a better error management system and full support for complex license expressions. If time permits, the internal model will also be refactored to be more Pythonic and independent of the RDF data model.

Work Summary

Fix build failure for Travis, Appveyor and CircleCI

In Python 3, reading from a binary stream returns 'bytes' rather than 'str', and writing to one requires 'bytes'. The files therefore need to be opened in text mode for str-related operations.
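The difference can be seen in a minimal sketch. The temporary file and its single line of content are placeholders made up for this example and are not taken from the project's test data.

```python
import os
import tempfile

# Write a small sample file so the example is self-contained.
# The content is an illustrative placeholder, not project test data.
with tempfile.NamedTemporaryFile(
    "w", suffix=".spdx", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("SPDXVersion: SPDX-2.1\n")
    path = tmp.name

# Binary mode: Python 3 hands back bytes.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: Python 3 hands back str, which str-based code expects.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)

print(type(raw).__name__, type(text).__name__)
```

Code that calls str methods with str arguments, such as `text.split(":")`, works on the text-mode result but raises a TypeError on the bytes object.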

Related issue: https://github.com/spdx/tools-python/issues/36

Related PR: https://github.com/spdx/tools-python/pull/38


Report an error when UNKNOWN_TAG is found

The cause of the error was the method p_unknown_tag in parsers/tagvalue.py. Due to the incorrect context-free grammar specification defined in the method, the line after the unknown tag was not taken into consideration. The context-free grammar specification was rectified and corresponding tests were added.

Related issue: https://github.com/spdx/tools-python/issues/55

Related PR: https://github.com/spdx/tools-python/pull/62


Return messages instead of modifying list

We now return messages from each validation function instead of modifying lists in place. It is preferable to have side-effect-free functions, so each function returns a new list containing the change rather than altering the one it was given. There are a few rare corner cases where in-place changes can make sense, but 99% of the time it is a bad idea and can trigger all kinds of weird, byzantine, and hard-to-debug bugs.
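The pattern can be sketched as follows. The function names and message strings are hypothetical and chosen only for illustration; they are not the library's actual validation API.

```python
# Hypothetical validators illustrating the "return a new list" pattern.

def validate_name(name, messages):
    """Return the incoming messages plus any new finding, as a new list."""
    if not name:
        return messages + ["Document has no name."]
    return messages


def validate_namespace(namespace, messages):
    """Same contract: the caller's list is never mutated."""
    if not namespace:
        return messages + ["Document has no namespace."]
    return messages


def validate_document(name, namespace):
    """Thread the message list through each validator explicitly."""
    messages = []
    messages = validate_name(name, messages)
    messages = validate_namespace(namespace, messages)
    return messages


initial = []
result = validate_name("", initial)
print(initial)  # the caller's list is left untouched
print(result)
```

Because `messages + [...]` builds a new list, a caller that keeps a reference to the original list never sees it change unexpectedly.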

Related issue: https://github.com/spdx/tools-python/issues/50

Related PR: https://github.com/spdx/tools-python/pull/51


Report correct line numbers while showing errors

The incorrect line numbers were caused by the method t_text_end in the file lexers/tagvalue.py. The call t.value.count('\n') does not detect any newlines because the value has already been stripped of whitespace and newlines on the preceding line. The issue is resolved by first counting the newline characters and only then stripping whitespace and newlines, so that the line information is preserved.
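The ordering problem can be reproduced without the lexer. The two helper functions below are hypothetical stand-ins for the relevant lines of t_text_end and operate on a plain string instead of a token; in the real lexer the count would be added to the line counter.

```python
# Hypothetical stand-ins for the strip/count logic, using plain strings.

def strip_then_count(value):
    """Buggy order: trailing newlines are gone before they are counted."""
    stripped = value.strip()
    return stripped, stripped.count("\n")


def count_then_strip(value):
    """Fixed order: count newlines first, then strip."""
    newline_count = value.count("\n")
    return value.strip(), newline_count


sample = "last line</text>\n\n"
print(strip_then_count(sample))
print(count_then_strip(sample))
```

With the buggy order the two trailing newlines in `sample` are never counted, so every later error would be reported two lines too early.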

Related issue: https://github.com/spdx/tools-python/issues/77

Related PR: https://github.com/spdx/tools-python/pull/78


Upgrade the 'Document' class to the SPDX 2.1 specification

Addition of support for attributes like 'SPDX Identifier', 'Document Name', 'SPDX Document Namespace', 'External Document References'.

Related issue: https://github.com/spdx/tools-python/issues/66

Related PR: https://github.com/spdx/tools-python/pull/71


Upgrade the 'File' class to the SPDX 2.1 specification

Addition of support for attributes like the 'File SPDX Identifier'.

Related PR: https://github.com/spdx/tools-python/pull/73


Add 'Annotation' class as a part of the SPDX 2.1 specification

Addition of support for attributes like 'Annotator', 'Annotation Date', 'Annotation Type', 'SPDX Identifier Reference', and 'Annotation Comment'.
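As a rough illustration of how these attributes might be modeled, here is a minimal sketch. The class name, field names, and message strings are hypothetical and do not reproduce the class added in the PR. Which fields the sketch treats as required is also an assumption. The allowed types REVIEW and OTHER follow the annotation type field of the SPDX 2.1 specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

ALLOWED_TYPES = ("REVIEW", "OTHER")


@dataclass
class AnnotationSketch:
    """Hypothetical container for the SPDX 2.1 annotation attributes."""

    annotator: Optional[str] = None
    annotation_date: Optional[datetime] = None
    annotation_type: Optional[str] = None
    spdx_id: Optional[str] = None  # the 'SPDX Identifier Reference'
    comment: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means no findings."""
        messages = []
        if not self.annotator:
            messages.append("Annotation missing annotator.")
        if self.annotation_date is None:
            messages.append("Annotation missing annotation date.")
        if self.annotation_type is None:
            messages.append("Annotation missing annotation type.")
        elif self.annotation_type not in ALLOWED_TYPES:
            messages.append("Annotation type must be REVIEW or OTHER.")
        if not self.spdx_id:
            messages.append("Annotation missing SPDX identifier reference.")
        return messages


complete = AnnotationSketch(
    annotator="Person: Jane Doe",
    annotation_date=datetime(2018, 8, 13),
    annotation_type="REVIEW",
    spdx_id="SPDXRef-45",
    comment="Reviewed the license findings.",
)
print(complete.validate())
print(AnnotationSketch().validate())
```

The validate method returns its messages rather than appending to a list passed in by the caller, in line with the side-effect-free style described earlier on this page.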

Related PR: https://github.com/spdx/tools-python/pull/74


Add 'Snippet' class as a part of the SPDX 2.1 specification

Addition of support for attributes like 'Snippet SPDX Identifier', 'Snippet from File SPDX Identifier', 'Snippet Concluded License', 'License Information in Snippet', 'Snippet Comments on License', 'Snippet Copyright Text', 'Snippet Comments', 'Snippet Name', 'Snippet Byte Range', 'Snippet File Range'.

Related PR: https://github.com/spdx/tools-python/pull/76


Upgrade the 'Package' class to the SPDX 2.1 specification

Addition of support for attributes like 'Package SPDX Identifier', 'Files Analyzed', 'Package Comment', 'External Reference', and 'External Reference Comment'.

Related PR: https://github.com/spdx/tools-python/pull/72