Cannot find ScanCode license texts in Docker image #8147

Closed
mnonnenmacher opened this issue Jan 21, 2024 · 5 comments · Fixed by #9622
Labels: bug, docker, spdx-utils

Comments

@mnonnenmacher (Member) commented Jan 21, 2024

ORT cannot find the ScanCode license texts in the Docker image:

$ docker run --rm ghcr.io/oss-review-toolkit/ort:13.0.0 requirements
[...]
ScanCode license texts not found.

The locations of the ScanCode binary and the license texts within the image are:
/opt/python/shims/scancode
/opt/python/versions/3.11.5/lib/python3.11/site-packages/licensedcode/data/licenses

The heuristic to find the license texts based on the path of the ScanCode binary fails for this:

val scanCodeLicenseTextDir by lazy {
    val scanCodeExeDir = Os.getPathFromEnvironment("scancode")?.realFile()?.parentFile
    val pythonBinDir = listOf("bin", "Scripts")
    val scanCodeBaseDir = scanCodeExeDir?.takeUnless { it.name in pythonBinDir } ?: scanCodeExeDir?.parentFile
    scanCodeBaseDir?.walkTopDown()?.find { it.isDirectory && it.endsWith("licensedcode/data/licenses") }
}
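To see why the lookup comes up empty for this layout, the base-directory step can be isolated into a pure function. The following is a minimal sketch, not ORT's code: `baseDirFor` and `isReachableFrom` are hypothetical helpers that mirror only the `takeUnless` line and the scope of the top-down walk, and the sketch assumes the shim path does not resolve to a different location via `realFile()`.

```kotlin
import java.io.File

// Directory names that conventionally hold Python executables (taken from the snippet above).
val pythonBinDirs = listOf("bin", "Scripts")

// Hypothetical helper mirroring the base-directory step of the heuristic:
// step up one level only if the executable lives in a "bin" or "Scripts" directory.
fun baseDirFor(exeDir: File): File? =
    exeDir.takeUnless { it.name in pythonBinDirs } ?: exeDir.parentFile

// True if `candidate` is `baseDir` itself or located somewhere below it,
// i.e. whether a top-down walk starting at `baseDir` could ever reach it.
fun isReachableFrom(baseDir: File, candidate: File): Boolean =
    candidate.toPath().normalize().startsWith(baseDir.toPath().normalize())
```

With the paths from this report, `baseDirFor(File("/opt/python/shims"))` stays at `/opt/python/shims` because `shims` is not one of the listed directory names, and the licenses directory under `/opt/python/versions` is not below that directory, so the walk cannot find it.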

mnonnenmacher added the bug, docker, and spdx-utils labels Jan 21, 2024
@sschuberth (Member)

My proposal is actually not to fix this by adjusting the heuristic for finding the license texts, at least not as a long-term solution. Instead, we should probably make use of the new scancode-license-data command. And before even doing that, I wish we had the time to also extract license text providers to a proper interface / plugins.

wkl3nk added a commit to boschglobal/oss-review-toolkit that referenced this issue Dec 16, 2024
If ORT is executed within a Docker container, the report generator for the
OSS disclosure document may not be able to look up the license texts
collected by ScanCode, leading to empty sections in the disclosure document.

In environments where Python version management tools like Pyenv are used,
directory structures differ, leading to different paths for data directories,
causing the reporter to fail looking up the ScanCode license texts.

Update the heuristic algorithm to locate the ScanCode license texts directory
based on the path of the ScanCode binary. Ensure compatibility with directory
layouts managed by Python version management tools.

Fixes oss-review-toolkit#8147.

Signed-off-by: Wolfgang Klenk <[email protected]>
@sschuberth (Member)

> extract license text providers to a proper interface / plugins.

I've created #9620 for this now.

wkl3nk added a commit to boschglobal/oss-review-toolkit that referenced this issue Dec 18, 2024
Dump the ScanCode license texts to the directory /opt/scancode-license-data
when building the ORT Docker image. Use this directory as a fallback
if the existing heuristic algorithm cannot locate the ScanCode
license texts.

Fixes oss-review-toolkit#8147.

Signed-off-by: Wolfgang Klenk <[email protected]>
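
The fallback described in this commit message could look roughly like the following. This is a sketch under assumptions, not the code from the referenced commit: only the directory `/opt/scancode-license-data` comes from the message, while `resolveLicenseTextDir` and its parameters are hypothetical names.

```kotlin
import java.io.File

// Directory named in the commit message above; everything else here is illustrative.
val FALLBACK_LICENSE_TEXT_DIR = File("/opt/scancode-license-data")

// Hypothetical helper: prefer the directory found by the heuristic (if any) and
// otherwise use the fallback directory, but only if it actually exists.
fun resolveLicenseTextDir(
    heuristicResult: File?,
    fallbackDir: File = FALLBACK_LICENSE_TEXT_DIR
): File? = heuristicResult ?: fallbackDir.takeIf { it.isDirectory }
```

Checking `isDirectory` keeps the result `null` outside the Docker image, where the fallback directory is not expected to exist.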
@sschuberth (Member)

Thinking about this again, can it be that the issue was a red herring? While it's correct that scanCodeLicenseTextDir (which is what ort requirements calls) returned null before #9622, that should not actually have mattered for the use case of disclosure document generation, as

fun getLicenseTextReader(
    id: String,
    handleExceptions: Boolean = false,
    licenseTextDirectories: List<File> = emptyList()
): (() -> String)? {
    return if (id.startsWith(LICENSE_REF_PREFIX)) {
        getLicenseTextResource(id)?.let { { it.readText() } }
            ?: addScanCodeLicenseTextsDir(licenseTextDirectories).firstNotNullOfOrNull { dir ->
                getLicenseTextFile(id, dir)?.let { file ->
                    {
                        file.readText().removeYamlFrontMatter()
                    }
                }
            }
    } else {
        SpdxLicense.forId(id.removeSuffix("+"))?.let { { it.text } }
            ?: SpdxLicenseException.forId(id)?.takeIf { handleExceptions }?.let { { it.text } }
    }
}

actually first tries to read license texts from ORT's built-in resources before falling back to scanCodeLicenseTextDir.
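
The lookup order for `LicenseRef-` IDs can be illustrated in isolation. This is a simplified sketch, not ORT's implementation: `findLicenseTextReader` is a hypothetical function, the `builtInTexts` map stands in for the bundled resources, and the assumption that a license file is named exactly like its ID is made only for illustration.

```kotlin
import java.io.File

// Returns a lazy reader for the license text of `id`, or null if no source has it.
// `builtInTexts` is a hypothetical stand-in for bundled license text resources (ID -> text).
fun findLicenseTextReader(
    id: String,
    builtInTexts: Map<String, String>,
    licenseTextDirs: List<File>
): (() -> String)? {
    // 1. Built-in texts win if they contain the ID.
    builtInTexts[id]?.let { text -> return { text } }

    // 2. Otherwise take the first directory that has a file named after the ID.
    //    (Assumption: the real file naming scheme in ORT / ScanCode may differ.)
    return licenseTextDirs.firstNotNullOfOrNull { dir ->
        File(dir, id).takeIf { it.isFile }?.let { file -> { file.readText() } }
    }
}
```

In this sketch, an ID that is neither bundled nor present in any of the directories yields `null`, so a missing ScanCode directory only matters for IDs that are not covered by the built-in texts.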

So @wkl3nk can you confirm that there actually was an issue with any disclosure documents? If so, for which license ID?

@mnonnenmacher (Member, Author)

> So @wkl3nk can you confirm that there actually was an issue with any disclosure documents? If so, for which license ID?

To my knowledge the issue was not related to SPDX licenses but only licenses starting with "LicenseRef-scancode-".

@sschuberth (Member)

> To my knowledge the issue was not related to SPDX licenses but only licenses starting with "LicenseRef-scancode-".

I see. These are indeed not part of https://github.com/oss-review-toolkit/ort/tree/main/utils/spdx/src/main/resources/licenserefs. Something to reconsider, maybe.
