Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify crates.io API caching #631

Merged
merged 1 commit into from
Sep 12, 2024
Merged

Conversation

mystor
Copy link
Collaborator

@mystor mystor commented Sep 3, 2024

Previously, we would perform time-based caching for data from the crates.io API in order to avoid unnecessary load on it. Unfortunately, this would sometimes lead to situations such as #629 where the cache is out-of-date and confusing errors are produced.

Since the last time we looked at this, the API guidance for the sparse HTTP index has been clarified, explicitly having "no rate limit". This allows us to simplify the caching code dramatically, caching only the data from the rate-limited crates.io API, and always checking the sparse HTTP index to see if new versions have been published.

The new caching system no longer needs expiry times, as it will be updated consistently whenever a new version is published to the crates.io index.

After this change, we will make more network requests than before when running cargo-vet to check the index.

@mystor mystor requested a review from afranchuk September 3, 2024 19:49
@mystor
Copy link
Collaborator Author

mystor commented Sep 3, 2024

Note that this change also starts collecting the sha256 hashes for versions when fetching from the index. I haven't implemented actually validating these when downloading yet (though that should be done at some point, and likely won't be too difficult, beyond requiring adding & auditing new dependencies).

Copy link
Collaborator

@afranchuk afranchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good overall!

src/storage.rs Outdated
}
let mut result = SortedMap::new();
for version_entry in crate_entry.versions() {
if let Ok(version) = semver::Version::parse(version_entry.version()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe log this error? It should be unlikely, though.

Previously, we would perform time-based caching for data from the
crates.io API in order to avoid unnecessary load on it. Unfortunately,
this would sometimes lead to situations such as mozilla#629 where the cache is
out-of-date and confusing errors are produced.

Since the last time we looked at this, the API guidance for the sparse
HTTP index has been clarified, explicitly having "no rate limit". This
allows us to simplify the caching code dramatically, caching only the
data from the rate-limited crates.io API, and always checking the sparse
HTTP index to see if new versions have been published.

The new caching system no longer needs expiry times, as it will be
updated consistently whenever a new version is published to the
crates.io index.

After this change, we will make more network requests than before when
running cargo-vet to check the index.
@mystor mystor force-pushed the simplify_crates_caching branch from 9887560 to b8500de Compare September 12, 2024 19:19
@mystor mystor merged commit 3671ca7 into mozilla:main Sep 12, 2024
13 checks passed
mystor added a commit to mystor/cargo-vet that referenced this pull request Sep 18, 2024
These #[instrument] directives were added in mozilla#631, and end up logging
error messages for every crate which isn't on crates.io due to
CrateInfoError::DoesNotExist.

Remove these instrument lines to avoid the unnecessary error spam.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants