Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(scanner): Revert the merge of read functions in scan storages #8611

Closed
wants to merge 1 commit into from

Conversation

fviernau
Copy link
Member

@fviernau fviernau commented May 3, 2024

With this change it is no more possible to obtain scan results by identifiers, without provenance. In particular, this breaks helper-cli's Create command which always passes an empty provenance, see 1.

This reverts commit 70b1b86.

With this change it is no more possible to obtain scan results by
identifiers, without provenance. In particular, this breaks
`helper-cli`'s `Create` command which always passes an empty
provenance, see [1].

This reverts commit 70b1b86.

[1]: https://github.com/oss-review-toolkit/ort/blob/9b34a7b1f21237d01a29e3eb64fa2c767bee120c/helper-cli/src/main/kotlin/commands/packageconfig/CreateCommand.kt#L123

Signed-off-by: Frank Viernau <[email protected]>
@fviernau fviernau requested review from oheger-bosch and a team as code owners May 3, 2024 09:13
Copy link

codecov bot commented May 3, 2024

Codecov Report

Attention: Patch coverage is 63.04348% with 17 lines in your changes are missing coverage. Please review.

Project coverage is 67.93%. Comparing base (9b34a7b) to head (8ba274a).

Files Patch % Lines
...kotlin/storages/AbstractPackageBasedScanStorage.kt 66.66% 5 Missing and 3 partials ⚠️
...ain/kotlin/storages/PackageBasedPostgresStorage.kt 63.15% 6 Missing and 1 partial ⚠️
...rc/main/kotlin/storages/PackageBasedFileStorage.kt 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8611      +/-   ##
============================================
- Coverage     67.97%   67.93%   -0.04%     
+ Complexity     1005     1003       -2     
============================================
  Files           244      244              
  Lines          7844     7869      +25     
  Branches        876      876              
============================================
+ Hits           5332     5346      +14     
- Misses         2129     2139      +10     
- Partials        383      384       +1     
Flag Coverage Δ
funTest-docker 66.49% <ø> (ø)
funTest-non-docker 34.73% <60.86%> (+0.10%) ⬆️
test 37.79% <13.04%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

*
* Throws a [ScanStorageException] if an error occurs while reading from the storage.
*/
fun read(pkg: Package, nestedProvenance: NestedProvenance): List<NestedProvenanceScanResult>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change it is no more possible to obtain scan results by identifiers, without provenance.

I'm having difficulties to understand this reason, because all this commit does is to add back a function to the interface that also takes a NestedProvenance. So how does getting this function back help to read scan results without provenance?

Or phrased differently, this code in the helper-cli

scanResultsStorage.read(Package.EMPTY.copy(id = packageId)).getOrThrow()

stays the same before and after this change. What magic does it make work, and why was it broken before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code filters by provenance:

    override fun readInternal(pkg: Package, scannerMatcher: ScannerMatcher?): Result<List<ScanResult>> =
        readForId(pkg.id).map { results -> results.filter { it.provenance.matches(pkg) } }

The code before didn't:

    override fun readInternal(pkg: Package): Result<List<ScanResult>> {
        val path = storagePath(pkg.id)

        return runCatching {
            backend.read(path).use { input ->
                yamlMapper.readValue<ScanResultContainer>(input).results
            }
        }.recoverCatching {
            // If the file cannot be found it means no scan results have been stored, yet.
            if (it is FileNotFoundException) return Result.success(emptyList())

            val message = "Could not read scan results for '${pkg.id.toCoordinates()}' from path '$path': " +
                it.collectMessages()

            logger.info { message }

            throw ScanStorageException(message)
        }
    }

@sschuberth sschuberth requested a review from mnonnenmacher May 3, 2024 09:34
Copy link
Member

@mnonnenmacher mnonnenmacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the proper fix would be to start using the PackageProvenanceStorage to retrieve the resolved provenance instead of reverting this commit (there is readProvenances(id: Identifier): List<PackageProvenanceResolutionResult> to get all resolved provenances for a package). That would also be less code to maintain than reintroducing this function. Also, I don't like that all package based storages would have to implement that function with the only user being a helper-cli command which is hardcoded to use a local package based storage.

@fviernau
Copy link
Member Author

fviernau commented May 3, 2024

To me the proper fix would be to start using the PackageProvenanceStorage to retrieve the resolved provenance instead of reverting this commit (there is readProvenances(id: Identifier): List<PackageProvenanceResolutionResult> to get all resolved provenances for a package).

The use case is as follows:

  1. orthw init <ort-result> creates a new empty local file storage and imports scan results by package into that storage
  2. Then orthw uses further orth commands which operate on that storage, such as the now broken Create command.

So, the storage involved is not the shared (usually postgtres storage), but a local one which just contains the scan results of one single OrtResult. The workflow shall remain usable given any OrtResult, without having access to a database. IIUC the above proposal it would not fullfill that requirement.

Would you have an alternative idea which fullfills this requirement @mnonnenmacher?

In particular, would it be an option to drop the filtering by provenance again, which has been introduced in your commit?

@fviernau
Copy link
Member Author

fviernau commented May 3, 2024

I've crafted a minimal invasive solution to fix the issue timely: #8613
Please take a look.

@mnonnenmacher
Copy link
Member

To me the proper fix would be to start using the PackageProvenanceStorage to retrieve the resolved provenance instead of reverting this commit (there is readProvenances(id: Identifier): List<PackageProvenanceResolutionResult> to get all resolved provenances for a package).

The use case is as follows:

1. `orthw init <ort-result>` creates a new empty local file storage and imports scan results by package into that storage

2. Then `orthw` uses further `orth` commands which operate on that storage, such as the now broken `Create` command.

So, the storage involved is not the shared (usually postgtres storage), but a local one which just contains the scan results of one single OrtResult. The workflow shall remain usable given any OrtResult, without having access to a database. IIUC the above proposal it would not fullfill that requirement.

Would you have an alternative idea which fullfills this requirement @mnonnenmacher?

The provenance resolutions can also be stored in a local file storage, this is the default if no scan storage is configured. So I think the cleanest solution would be that orthw init not only populates the scan results storage but also the provenance storage. This should only require a few lines of code, as the resolved provenance can directly be read from the scan results.

In particular, would it be an option to drop the filtering by provenance again, which has been introduced in your commit?

I would really not like that as for the ORT scanner it is a central concept that scan results have to match not only the id but also the provenance of a package, this is not only the case for scan result storages but for example also for package configurations.

@fviernau
Copy link
Member Author

fviernau commented May 3, 2024

The provenance resolutions can also be stored in a local file storage, this is the default if no scan storage is configured. So I think the cleanest solution would be that orthw init not only populates the scan results storage but also the provenance storage. This should only require a few lines of code, as the resolved provenance can directly be read from the scan results.

If I implemented that, my gut feeling is that it would not be a super low effort hing to do.

I would really not like that as for the ORT scanner it is a central concept that scan results have to match not only the id but also the provenance of a package, this is not only the case for scan result storages but for example also for package configurations.

We had for a long time a central concept which allowed to retrieve scan results by ID from any storage.
I would have objected to making this impossible in [1], but the rationale provided there was different.
Also the commit which introduced the filtering, did sound like a non-functional change in the commit message.
To me it does not look too unnatural if an API which allows storing things by ID allows retrieving things by ID.
Anyhow, I propose to shortcut discussion by merging #8613.

[1] #7641

@fviernau fviernau closed this May 6, 2024
@fviernau fviernau deleted the scanner-revert-read-merge branch May 6, 2024 06:18
@fviernau
Copy link
Member Author

fviernau commented May 6, 2024

Alternative solution has been merged: #7641

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants