Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-5.32 Zstd backports #2507

Closed
wants to merge 27 commits into from
Closed

release-5.32 Zstd backports #2507

wants to merge 27 commits into from

Conversation

mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Aug 7, 2024

This is #2321 + #2503 + #2487 , backported to a newly-created release-5.32 branch, along with a release bump to 5.32.1.

Alternatively, we could just tag main as a 5.32.1 without backporting: see https://github.com/containers/image/compare/main..mtrmac:5.32-zstd , the difference is a set of typo fixes and dependency updates. The dependency updates are probably unwanted, I’m not sure.

Cc: @TomSweeneyRedHat

mtrmac added 27 commits August 7, 2024 18:16
Signed-off-by: Miloslav Trmač <[email protected]>
We are already calling m.LayerInfos() anyway, so there is ~no
extra cost. And using LayerInfos means we don't need to worry
about reversing the order of layers, and we will have access
to the layer index, allowing us to acccess the indexTo* fields
in the future.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
- Don't claim that we only use compressed digests.
- Explicitly document that we assume TOC digests to be unambiguous
- Actually define the term "DiffID".
- Be more precise in computeID about the criteria being layer identity,
  not where we pull the layer from.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Some errors are severe enough that just logging and continuing is
not really worthwhile.

Signed-off-by: Miloslav Trmač <[email protected]>
…tyDataLocked

Currrently we "only" have indexToTOCDigest and blobDiffIDs, but we will make this
more complex.

Centralizing the consumption of these fields into trustedLayerIdentityDataLocked
ensure that all consumers interpret the data exactly consistently (and it also
allows us to use a single "trusted" variable instead of 2/3 individual ones).

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
The new code is not called, so it should not change behavior
(apart from extending the BoltDB/SQLite schema).

Signed-off-by: Miloslav Trmač <[email protected]>
…storage by DiffID

If we can, prefer identifying layers by DiffID, because multiple TOCs can map to the
same DiffID; and because it maximizes reuse with non-TOC layers.

For now, the new situation is unreachable.

Signed-off-by: Miloslav Trmač <[email protected]>
We will add one more instance of this, so share the code.

Should not change behavior (it does remove one unreachable code path).

Signed-off-by: Miloslav Trmač <[email protected]>
… is known

- Multiple TOC values might correspond to a single DiffID (e.g. if different
  compression levels are used); try to share them all, identified by DiffID
  (so that we also reuse with non-TOC pulls).
  - LayersByTOCDigest only uses a single TOC digest per layer; BlobInfoCache
    allows multiple matches, matches layers which have been since deleted,
    and potentially matches TOC digests which we have created by pushing
    but haven't pulled yet.
- On reuse, we can now use DiffID-based layer identities even if the reuse
  was TOC~driven.

Signed-off-by: Miloslav Trmač <[email protected]>
…ayers

- Rely on it instead of triggering the "untrusted DiffID" logic
- Also propagate it to storage

Signed-off-by: Miloslav Trmač <[email protected]>
…er match

The rules expect us to set manifest editing updates.

Signed-off-by: Miloslav Trmač <[email protected]>
... and add CandidateWithLocation and CandidateWithUnknownLocation ,
so that the BIC implementations only need to deal with one value
instead of carrying around three; we will want to add one more,
and encapsulating them all into a single template will make it
transparent to the cache implementations.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... just because we now can, and to nudge all future caches to be designed
around CandidateTemplateWithCompression.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We will add more logic to the default case, so sharing the
CandidateCompressionMatchesReuseConditions call is not going to be
as easy. Split the two code paths.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We will want to record more than a single alghoritm name. For now,
just introduce the structure and modify users, we'll add the new fields
later.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
… blobs

... because we don't trust the TOC data, if any.

This allows us to remove the zstd:chunked hack; we, at least,
now record those blobs as zstd.

Signed-off-by: Miloslav Trmač <[email protected]>
If we don't know an uncompressed digest, don't try using "".

Signed-off-by: Miloslav Trmač <[email protected]>
Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
The cache implementations are recording both the base and specific compression variant;
CandidateLocations2 all call CandidateTemplateWithCompression to choose the
appropriate variants to return based on CandidateLocations2Options.

This way, neither the BIC implementations nor the transports are not responsible for
converting zstd:chunked entries to zstd entries if the user wants the latter.

Signed-off-by: Miloslav Trmač <[email protected]>
Introduce distinct uploadedCompressorBaseVariantName and
uploadedCompressorSpecificVariantName fields; that way we now never
call RecordDigestCompressorData with inconsistent zstd / zstd:chunked in one field,
so we can always record data when we see, or create, a zstd:chunked layer,
removing the current hack.

Signed-off-by: Miloslav Trmač <[email protected]>
- Add a CompressionAnnotations field
- Allow turning a known-zstd blob into a zstd:chunked one if we
  know the right annotations

This just adds the fields, nothing sets them yet, should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
- Return the required annotations, if we have them
- If we have a zstd blob and the BIC contains the annotations,
  we don't check for the blob's presence initially. In that case,
  don't skip it if we find the BIC annotations.

Signed-off-by: Miloslav Trmač <[email protected]>
... instead of only treating it as zstd.

Signed-off-by: Miloslav Trmač <[email protected]>
Signed-off-by: Miloslav Trmač <[email protected]>
Signed-off-by: Miloslav Trmač <[email protected]>
@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 7, 2024

Wrong target, I’ll re-create the PR.

@mtrmac mtrmac closed this Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant