Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify storage readers #8422

Merged
merged 10 commits into from
Mar 18, 2024
Merged

Simplify storage readers #8422

merged 10 commits into from
Mar 18, 2024

Conversation

mnonnenmacher
Copy link
Member

@mnonnenmacher mnonnenmacher commented Mar 14, 2024

Simplify the storage reader interfaces by merging the read functions, and apply several refactorings as clean up after switching to the new scanner implementation a while ago. This is a preparation for turning scan storages into plugins. See the commit messages for details.

@mnonnenmacher mnonnenmacher force-pushed the simplify-storage-readers branch 3 times, most recently from 4fb53b4 to 31a6b7a Compare March 17, 2024 12:29
@mnonnenmacher mnonnenmacher marked this pull request as ready for review March 17, 2024 13:15
@mnonnenmacher mnonnenmacher requested review from oheger-bosch and a team as code owners March 17, 2024 13:15
* of a package using the same version name are used (e.g. multiple scans of a "1.0-SNAPSHOT" version during
* development).
*/
fun read(packages: Collection<Package>, scannerMatcher: ScannerMatcher): Result<Map<Identifier, List<ScanResult>>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any comment on why this function is actually unused? I mean, as we're pretty much always interested in results of multiple packages, it would seem more efficient to only use this function in order to not apply the scanner matcher for every package we request.

Copy link
Member Author

@mnonnenmacher mnonnenmacher Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially did not use it in the new scanner implementation for simplicity. Also, the new scanner converts package scan results into provenance scan results and reuses them if another package has the same provenance, this approach conflicts with downloading results in bulk. And since the switch to the new scanner two years ago the downloading of stored scan results never seemed to be a performance problem so no one looked into optimizing the performance, e.g. by taking this function into use again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details. I'll leave it up to you whether you think it's worth putting some of that into the commit message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some of it to the commit message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI @mnonnenmacher, we might have a use-case to add back the API for reading multiple packages at once: Is there is a general problem with some scanner backend, there is no point in continuing reading from it if the first read already fails, and we'd like to be able to fail all of the reads at once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking it back into use might require a bigger refactoring of the Scanner, losing some of the performance optimizations along the way, e.g. not downloading scan results multiple times for packages that share the same provenance.
If the goal is to disable a storage if a certain error occurred, that could be implemented also without introducing such a function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could be implemented also without introducing such a function.

What do you have in mind @mnonnenmacher? Like a special Result value / exception being throw that signals the caller that no further packages should be tried?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, something like that. I think the tricky part would be to decide which kind of error should trigger this behavior. In most cases it might be better to fix the storage than to rescan everything, but that depends on the specific setup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tricky part would be to decide which kind of error should trigger this behavior.

I believe this should simply be signaled via a new special exception that implementations may throw...

plugins/scanners/scancode/src/main/kotlin/ScanCode.kt Outdated Show resolved Hide resolved
Remove obsolete docs from the companion object of `ScanResultsStorage`.
This is a fixup for 1ad5357.

Signed-off-by: Martin Nonnenmacher <[email protected]>
Merge the read functions of the `PackageBasedScanStorageReader` into one
by making the `scannerMatcher` argument nullable. This makes the `read`
function without `scannerMatcher` argument redundant.

The same change is applied to the `ScanResultsStorage` class where the
`read` function implements the filtering, so that classes implementing
the `readInternal` function only have to implement the filtering if they
can do so more efficiently.

Signed-off-by: Martin Nonnenmacher <[email protected]>
@mnonnenmacher mnonnenmacher force-pushed the simplify-storage-readers branch from 31a6b7a to af4a747 Compare March 17, 2024 20:39
Remove the unused `read` function that takes multiple packages as
argument. The function is not used anymore since c71628d.

The current scanner implementation does other optimizations when using
package based storages to avoid downloading multiple results for the
same provenance, so it is unlikely that the function in the current form
would be taken into use again.

Signed-off-by: Martin Nonnenmacher <[email protected]>
…reader

Merge the read functions of the `ProvenanceBasedScanStorageReader` into
one by making the `scannerMatcher` argument nullable. This makes the
`read` function without `scannerMatcher` argument redundant.

Signed-off-by: Martin Nonnenmacher <[email protected]>
The `ScanResultsStorage.EMPTY_RESULT` constant was only used in a single
place.

Signed-off-by: Martin Nonnenmacher <[email protected]>
Move the abstract base class for package based scan storages to the
storage package, next to the classes that inherit from it.

Signed-off-by: Martin Nonnenmacher <[email protected]>
Rename `ScanResultsStorage` to `AbstractPackageBasedScanStorage`. The
old name was introduced before the new scan storage interface, so
replace it with a name that better the purpose of the class now.

While at it, also fix some comment to reference the base interfaces
instead.

Signed-off-by: Martin Nonnenmacher <[email protected]>
Rename the abstract storage test classes to better match the names of
the related storage interfaces.

Signed-off-by: Martin Nonnenmacher <[email protected]>
Rename the package based file and Postgres storage to match the names of
their provenance based counterparts.

Signed-off-by: Martin Nonnenmacher <[email protected]>
@mnonnenmacher mnonnenmacher force-pushed the simplify-storage-readers branch from af4a747 to 293a806 Compare March 18, 2024 10:36
@mnonnenmacher mnonnenmacher enabled auto-merge (rebase) March 18, 2024 10:36
@mnonnenmacher mnonnenmacher merged commit 20df885 into main Mar 18, 2024
21 checks passed
@mnonnenmacher mnonnenmacher deleted the simplify-storage-readers branch March 18, 2024 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants