Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for the propagation of vulnerabilities within git repositories #2140

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

CharlyReux
Copy link

@CharlyReux CharlyReux commented Apr 29, 2024

In relation to some research we are making about the security in the Open Source ecosystem, we investigated how vulnerabilities are propagated in repositories. While reviewing the analysis output of the propagation of vulnerability, we noticed some unexpected behaviors, as per @RomainLefeuvre 's issues.

We made a documentation of the behavior for the test of repositories that you can find below.
We followed what we interpreted from the OSV schema documentation, and we would like to know whether this documentation suits it, and more precisely whether the test cases make sense to you.

test_cases_osvdev

This PR contains the implementation of a new test class impact_git_test and a tool to facilitate the creation of git repositories programmatically, the repositories created are dummy repositories, containing empty commits.

The tests implemented test the repoanalyzer class, without using the cherry-pick detection.

@CharlyReux CharlyReux marked this pull request as ready for review May 7, 2024 14:00
@another-rex another-rex requested a review from andrewpollock May 8, 2024 05:40
@andrewpollock andrewpollock requested a review from oliverchang May 8, 2024 07:10
@andrewpollock
Copy link
Contributor

/gcbrun

Copy link
Contributor

@andrewpollock andrewpollock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution, apologies for the slow initial review. My review is still ongoing, here's some initial feedback.

I'd like @oliverchang to also review this, as he's much more intimately familiar with the code being tested here, but I will give this my best shot along.

""" Utilitary class to create a test repository for the git tests
"""

class VulnerabilityType(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future-proofing and maintenance, could you try and use osv.vulnerability_pb2 and use vulnerability_pb2.Event.DESCRIPTOR.fields_by_name to introspect into the OSV record definition code, rather than having this standalone definition that could go out of sync with what's defined in

message Event {
// The earliest version/commit where this vulnerability
// was introduced in.
string introduced = 1;
// The version/commit that this vulnerability was fixed in.
string fixed = 2;
// The limit to apply to the range.
string limit = 3;
// The last affected version.
string last_affected = 4;
}

See also https://googleapis.dev/python/protobuf/latest/google/protobuf/descriptor.html



class TestRepository:
""" Utilitary class to create a test repository for the git tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix spelling mistake

Suggested change
""" Utilitary class to create a test repository for the git tests
""" Utility class to create a test repository for the git tests

shutil.rmtree(f"osv/testdata/test_repositories/{name}")
self.repo = pygit2.init_repository(
f"osv/testdata/test_repositories/{name}", bare=False)
#empty initial commit usefull for the creation of the repository
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix spelling mistake

Suggested change
#empty initial commit usefull for the creation of the repository
#empty initial commit useful for the creation of the repository

@@ -0,0 +1,158 @@
"""test_repository"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please expand this with more details, see https://google.github.io/styleguide/pyguide.html#381-docstrings

@andrewpollock
Copy link
Contributor

I note that this introduces breakage, so that will need to be resolved (perhaps the point of this PR was drive that discussion) before this can be merged successfully...

## Refactoring – `test_repository.py `
* Make the repo generation cleaner, remove unnecessary branch that were generated 
* Use protobuf type for events

## Refactoring - `impact_git_test.py`
* Move to a template architecture 
* Remove inconsistent tests
* Support cherry picking test

Note : 
`impact.py ` has been modified to handle repository that do not support caching. We are still trying to find a solution to enable the cache for synthetic repository created with pygit2.
@CharlyReux
Copy link
Author

Thank you very much for the review, and I apologize for the delay in responding. Since the initial PR commit, we have undergone some refactoring, mainly to make the code easier to understand, implement your proposed changes, and support the cherry-pick parameter to true when using RepoAnalyzer. Additionally, we reviewed the test cases we had proposed and noticed some inconsistencies, so we made slight modifications to them.

The aim of this pr is indeed to start the discussion about the possible discrepancy in the osv.dev implementation with regard to the "Open Source Vulnerability format" specification. For a research project, we implemented a test suite of the vulnerable commit identification algorithm to test our implementation. We thought that osv.dev would benefit from the same tests.

We identified discrepancies while using OSV.dev for a research project and created associated issue :

These test cases were implemented based on our interpretation of the specification. The specification seems to be ambiguous for certain situations, such as merge propagation, multiple ranges ...

We hope that the tests we implemented, might help to make the specification less ambiguous.

osv_test_cases drawio

@RomainLefeuvre
Copy link

RomainLefeuvre commented Jun 18, 2024

The objective of this PR was indeed to drive the discussion about how to test the detection of "affected git commit" and avoid ambiguity on the specification. Do not hesitate if you have any feedback.

Copy link

This pull request has not had any activity for 60 days and will be automatically closed in two weeks

@github-actions github-actions bot added the stale The issue or PR is stale and pending automated closure label Aug 17, 2024
@oliverchang oliverchang added backlog Important but currently unprioritized and removed stale The issue or PR is stale and pending automated closure labels Aug 19, 2024
Copy link

This pull request has not had any activity for 60 days and will be automatically closed in two weeks

@github-actions github-actions bot added the stale The issue or PR is stale and pending automated closure label Oct 18, 2024
@andrewpollock
Copy link
Contributor

We want to get around to spending some time on this, it's just been a case of insufficient time and higher priorities :-(

@github-actions github-actions bot removed the stale The issue or PR is stale and pending automated closure label Oct 21, 2024
@RomainLefeuvre
Copy link

I understand that GIT vulnerability algorithm is not a top priority, as most of OSVs rely on Semantic Versioning or ecosystem specific version format.

However, I'm convinced that pushing for a broader adoption of GIT RANGES should enable many analyses that are not possible with Semantic Versionning ranges but also increase the quality of vulnerable commit/release/version identification.

For a research project, we developed an algorithm that label the Software Heritage Graph using as input OSV having a git range.
If you are interested, we can discuss the different test cases and formalize on the Open Source Vulnerability format specification an evaluation algorithm adapted to GIT ranges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog Important but currently unprioritized
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants