Oci image deletion #2003

Merged
merged 1 commit into from
Oct 12, 2023

Conversation

Pvlerick
Contributor

Fix attempt for #1812

I made two commits: first a failing test, then the feature to make it work. I'll happily fix them up into one commit given how small this change is.

Collaborator

@mtrmac mtrmac left a comment

Thanks for the PR.

At first glance, this seems way too small to plausibly be a comprehensive implementation.

  • Most importantly, a single oci: directory can house multiple images. So just removing index.json is deleting much more than requested.

  • Every image contains at least one blob; this is not deleting any blobs at all, AFAICS.

    • Combined with the above, only blobs that are not used by other images can be deleted.
  • Typically the second os.Remove would just fail, because the directory is not empty. So AFAICS the proposed DeleteImage implementation must always fail in practice.

    (Yes, it does not fail in the fake refToTempOCI repo used in tests, but that mostly exists to exercise the name lookup and metadata operations, so it is not representative.)
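To illustrate the point about the second os.Remove, here is a minimal sketch (not the PR's code) that builds a throwaway directory shaped roughly like an OCI layout and then tries the naive removal. The helper names makeLayout and removeLayoutNaively and the placeholder blob file are assumptions made up for this example; the only behavior relied on is that os.Remove refuses to delete a non-empty directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeLayoutNaively mimics the approach under review: delete index.json,
// then try to remove the directory itself. (Hypothetical helper.)
func removeLayoutNaively(dir string) error {
	if err := os.Remove(filepath.Join(dir, "index.json")); err != nil {
		return err
	}
	// os.Remove fails if the directory still contains entries such as blobs/.
	return os.Remove(dir)
}

// makeLayout creates a temporary directory with an index.json and one
// placeholder blob. It returns the path and a cleanup function.
func makeLayout() (string, func(), error) {
	root, err := os.MkdirTemp("", "oci-layout-")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(root) }
	blobs := filepath.Join(root, "blobs", "sha256")
	if err := os.MkdirAll(blobs, 0o755); err != nil {
		cleanup()
		return "", nil, err
	}
	index := []byte(`{"schemaVersion":2,"manifests":[]}`)
	if err := os.WriteFile(filepath.Join(root, "index.json"), index, 0o644); err != nil {
		cleanup()
		return "", nil, err
	}
	if err := os.WriteFile(filepath.Join(blobs, "placeholder"), []byte("blob"), 0o644); err != nil {
		cleanup()
		return "", nil, err
	}
	return root, cleanup, nil
}

func main() {
	dir, cleanup, err := makeLayout()
	if err != nil {
		panic(err)
	}
	defer cleanup()
	err = removeLayoutNaively(dir)
	fmt.Println("naive removal returned an error:", err != nil)
}
```

In a directory that holds real blobs, the naive approach therefore removes index.json (losing every image reference) and then reports an error.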

@mtrmac
Collaborator

mtrmac commented Jun 19, 2023

I made two commits: first a failing test, then the feature to make it work. I'll happily fix them up into one commit given how small this change is.

We prefer the tests to be working on every commit, so that bisection doesn’t have false positives.

@Pvlerick
Contributor Author

I made two commits: first a failing test, then the feature to make it work. I'll happily fix them up into one commit given how small this change is.

We prefer the tests to be working on every commit, so that bisection doesn’t have false positives.

All right, I'm doing this out of habit (failing tests first then committing) but I'll squash all of that for next time.

@Pvlerick Pvlerick marked this pull request as draft June 20, 2023 11:33
@Pvlerick
Contributor Author

Thanks for the feedback, I'll get back to this and will ping you once I have something more substantial.

@mtrmac
Collaborator

mtrmac commented Jun 20, 2023

All right, I'm doing this out of habit (failing tests first then committing) but I'll squash all of that for next time.

That habit makes sense — we really only need it squashed (or just reordered?) immediately before merging.

@Pvlerick Pvlerick force-pushed the oci-image-deletion branch from ce63a70 to 9d5e12f Compare June 22, 2023 07:00
@Pvlerick
Contributor Author

Pvlerick commented Jun 22, 2023

Here's something hopefully better. Test setup is ugly but I'd like to get the logic validated before refactoring this into something cleaner.

A couple of questions on this:

  • if it's the last image in the index, should it delete the index file altogether? And all sub-folders, which should all be empty at that point?
  • I don't think it's possible to check whether a container is running using that image, is it? That should be a test case, but I don't see how it would be possible... Multiple OCI runtimes could be running in different locations, etc... Hairy.
  • if the given image is empty, should it (1) delete the image from the index when there is only one image (see the point above), and (2) do nothing when there is more than one image? See the getManifestDescriptor behavior: https://github.com/containers/image/blob/main/oci/layout/oci_transport.go#L183

Thanks :-)

@Pvlerick Pvlerick requested a review from mtrmac June 22, 2023 12:45
@Pvlerick Pvlerick force-pushed the oci-image-deletion branch from 9d5e12f to fef8879 Compare June 22, 2023 12:56
Collaborator

@mtrmac mtrmac left a comment

Thanks, this is a move in the right direction.

if it's the last image in the index, should it delete the index file altogether? And all sub-folders, which should all be empty at that point?

I think keeping the index and other structures around makes sense — if someone cares about the state of the directory to delete an individual image and not to delete everything, that user might plausibly want to delete all existing images and replace them with other images.

I don't think it's possible to check if there is a container running using that image, is it?

This code does not really have a concept of containers or any kind of lock / external reference counting; it’s a pure store, and it’s up to the caller not to delete things that will need to be used.

if the given image is empty

Assuming this refers to ref.image, I would expect this to follow the existing getManifestDescriptor behavior: if the user’s input is ambiguous, fail and don’t delete anything. If ref.image == "" in a situation where that is documented to clearly identify one image, deleting that one image makes sense to me (though the situation is a bit suspect; why would anyone do that?).
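The "fail on ambiguity, delete nothing" rule can be sketched as follows. This is an illustration under stated assumptions, not the library's getManifestDescriptor: the descriptor type, the error values, and selectDescriptor are hypothetical names for this example; only the annotation key org.opencontainers.image.ref.name comes from the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// refNameAnnotation is the annotation key that names an image inside an index.
const refNameAnnotation = "org.opencontainers.image.ref.name"

// descriptor is a trimmed-down, hypothetical stand-in for an OCI descriptor.
type descriptor struct {
	Digest      string
	Annotations map[string]string
}

// Hypothetical error values for this sketch.
var (
	errAmbiguous = errors.New("more than one image in index, choose an image")
	errNotFound  = errors.New("no descriptor found for reference")
)

// selectDescriptor returns the position of the single index entry identified
// by image. An empty image name is accepted only when the index holds exactly
// one entry; anything ambiguous is an error, so a caller deletes nothing.
func selectDescriptor(manifests []descriptor, image string) (int, error) {
	if image == "" {
		switch len(manifests) {
		case 0:
			return -1, errNotFound
		case 1:
			return 0, nil
		default:
			return -1, errAmbiguous
		}
	}
	for i, m := range manifests {
		// Reading from a nil map is safe in Go and yields "".
		if m.Annotations[refNameAnnotation] == image {
			return i, nil
		}
	}
	return -1, errNotFound
}

func main() {
	manifests := []descriptor{
		{Digest: "sha256:aaa"},
		{Digest: "sha256:bbb", Annotations: map[string]string{refNameAnnotation: "3.2"}},
	}
	_, err := selectDescriptor(manifests, "")
	fmt.Println("empty name:", err)
	i, err := selectDescriptor(manifests, "3.2")
	fmt.Println("name 3.2:", i, err)
}
```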

@Pvlerick Pvlerick force-pushed the oci-image-deletion branch from 399f4c3 to 75a2d40 Compare June 23, 2023 12:00
@Pvlerick
Contributor Author

Style/opinion question: index files are generated using string literals: https://github.com/containers/image/blob/main/oci/layout/oci_transport_test.go#L201

I started refactoring this part for my tests to use imgspecv1 structs that I fill and then serialize instead; is that OK, or would you rather keep literals?

@mtrmac
Collaborator

mtrmac commented Jun 23, 2023

I started refactoring this part for my tests to use imgspecv1 structs that I'm filling then serializing instead

I think using raw blobs works better as an interoperability test; that way we know if the (de)serialization code changes in a way that breaks compatibility.

(That said, using raw files in testdata would be cleaner than long string literals like that.)


Refactors are fine and welcome, but please keep any larger ones as separate commits to be clear which parts of the PR are intended to do something new and which are intended not to be a change.
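As a rough illustration of the interoperability argument, the sketch below decodes a hand-written index.json literal rather than a value produced by the same structs that are under test. The struct definitions are simplified, hypothetical stand-ins (not the imgspecv1 types), and the digest is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawIndex is a hand-written literal; the digest is a placeholder value.
const rawIndex = `{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:0000000000000000000000000000000000000000000000000000000000000000",
      "size": 348,
      "annotations": {"org.opencontainers.image.ref.name": "latest"}
    }
  ]
}`

// indexEntry and index are simplified stand-ins for the real spec types.
type indexEntry struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

type index struct {
	SchemaVersion int          `json:"schemaVersion"`
	Manifests     []indexEntry `json:"manifests"`
}

// parseIndex decodes raw JSON text, so a change in field names or tags that
// breaks compatibility with on-disk data shows up as a failed expectation.
func parseIndex(raw string) (index, error) {
	var idx index
	err := json.Unmarshal([]byte(raw), &idx)
	return idx, err
}

func main() {
	idx, err := parseIndex(rawIndex)
	if err != nil {
		panic(err)
	}
	fmt.Println(idx.SchemaVersion, len(idx.Manifests), idx.Manifests[0].Annotations)
}
```

A test that serializes the structs and then parses the result back would keep passing even if a JSON tag were renamed on both sides; the literal does not have that blind spot.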

@Pvlerick
Contributor Author

I think using raw blobs works better as an interoperability test; that way we know if the (de)serialization code changes in a way that breaks compatibility.

(That said, using raw files in testdata would be cleaner than long string literals like that.)

Now I'm on the fence; it's true that literals have the advantage of indirectly testing the spec, which is always a plus. I'll commit what I have now so you can have a look then I'll make the changes, that's not a big deal.

@Pvlerick Pvlerick requested a review from mtrmac June 23, 2023 19:38
@Pvlerick
Contributor Author

Pvlerick commented Jun 23, 2023

I'm fairly happy with what's in there now; test data generation is much clearer (although it uses the spec library; I'll change that to literals as soon as time permits) and allows more flexibility, as demonstrated by the (failing for now) test with a layer shared by two images.

(Looks like I'll need to rebase on main too)

@Pvlerick Pvlerick force-pushed the oci-image-deletion branch 2 times, most recently from 1fb14c8 to 121e1f3 Compare June 24, 2023 13:27
@Pvlerick
Contributor Author

All right, all tests are passing now :-)

@Pvlerick Pvlerick force-pushed the oci-image-deletion branch from 121e1f3 to fe1650d Compare June 26, 2023 06:35
@Pvlerick Pvlerick marked this pull request as ready for review June 26, 2023 06:36
@Pvlerick
Contributor Author

A couple of questions on top of the review:

  • Would it make sense to add some logging (debug or trace) such as "deleting blob xxx"?
  • Would it make sense to parallelize deletion of the blobs?

Thanks

@mtrmac
Collaborator

mtrmac commented Jun 28, 2023

I’m afraid I didn’t get to review the full PR yet

  • Would it make sense to add some logging (debug or trace) such as "deleting blob xxx" ?

Sure; that seems useful, and can’t hurt anything.

  • Would it make sense to parallelize deletion of the blobs?

You know why you are doing this work and whether it is valuable for you. My intuition is that most archives are pretty small, and that the extra complexity and risk are not at all worth it, but that’s a guess with no data.

Collaborator

@mtrmac mtrmac left a comment

(Not a full review, just a very quick skim.)

	if err != nil {
		return err
	}
	for _, layer := range otherImageManifest.Layers {
Collaborator

  • The manifests themselves are also stored as blobs.
  • Config.Digest also needs to be accounted for.
  • IIRC it’s possible for an entry in the top-level index to be an imgspecv1.Index again — at least this code does that to one level of nesting, the spec seems to suggest an arbitrary depth.

With OCI artifacts storing ~arbitrary data, it seems quite plausible for an entry in Layers to refer to the same blob as a config/manifest/index of another image/artifact.
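A minimal sketch of the first two bullets, assuming simplified, hypothetical descriptor and manifest types (not the PR's code): the set of blobs used by one image includes the manifest blob itself and the config blob, not only the layers, and a set keyed by digest naturally collapses a blob that appears in more than one role.

```go
package main

import "fmt"

// descriptor and manifest are simplified stand-ins for the spec types.
type descriptor struct {
	Digest string
}

type manifest struct {
	Config descriptor
	Layers []descriptor
}

// blobsUsedByManifest returns the digests of every blob the image needs:
// the manifest blob itself, the config blob, and each layer blob.
// (Hypothetical helper name.)
func blobsUsedByManifest(manifestDigest string, m manifest) map[string]struct{} {
	used := map[string]struct{}{manifestDigest: {}}
	used[m.Config.Digest] = struct{}{}
	for _, layer := range m.Layers {
		used[layer.Digest] = struct{}{}
	}
	return used
}

func main() {
	m := manifest{
		Config: descriptor{Digest: "sha256:config"},
		// The second "layer" deliberately reuses the config digest, as an
		// artifact storing arbitrary data might.
		Layers: []descriptor{{Digest: "sha256:layer"}, {Digest: "sha256:config"}},
	}
	fmt.Println(len(blobsUsedByManifest("sha256:manifest", m)))
}
```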

Contributor Author

You are absolutely right, there can be nested indexes: https://github.com/opencontainers/image-spec/blob/main/image-index.md?plain=1#L46 - that's gonna be interesting...

Contributor Author

@mtrmac but this also means that getManifestDescriptor (https://github.com/containers/image/blob/main/oci/layout/oci_transport.go#L176) is broken in the case of nested indexes, it only looks in the index.json (https://github.com/containers/image/blob/main/oci/layout/oci_transport.go#L240) unless I'm missing a piece of the puzzle (nested indexes are stored as blobs as far as I understand the convoluted spec)

I've started working on this by changing the test setup (nothing that can't be fixed with recursion), but supporting this means that I also need to start scanning indexes recursively. It will complicate this PR a bit more. What do you think?

Contributor Author

Spec says implementation should support this: https://github.com/opencontainers/image-spec/blob/main/image-index.md?plain=1#L44

It also means that the WIP you mentioned earlier, about returning an index when calling getManifestDescriptor, would be impossible if the image is in a nested index; or the result would be the index within whichever index the image was found in, not necessarily index.json.

My head starts to hurt :-)

Collaborator

(My personal opinion is that the nested indexes provide too much flexibility and the semantics are unclear for consumers, so I’m not that much of a fan. But the feature exists.)

There’s a bit of nuance in here: if the repo contains a nested index, it’s not ideal, but also not a disaster when this code does not support that in ImageSource / getManifestDescriptor; the code would just “safely” fail and users would have to use some other tool.

OTOH if the deletion code does not handle nested indices, that could lead to data loss, so that is a more urgent concern.

So one approach would be to recursively enumerate the used blobs; another, I think quite reasonable, possibility would be to just fail when a nested index is encountered, and to tell the user to use another tool — that’s analogous to the ImageSource failure mode, but still safe.


Note that the above applies to “deeply nested” OCI indexes. One level of nesting is supported, to represent a multi-arch image. See the outcome of

skopeo --insecure-policy copy --all docker://quay.io/libpod/alpine oci:$dest

So… I haven’t actually written the code, but it seems to me that a full recursion (or possibly a “queue of blobs+blob types to scan”) support might be the easiest way to implement all of this anyway, instead of somehow special-casing the “one level of nesting” case.
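The "queue of blobs+blob types to scan" idea could look roughly like the sketch below. It is only an illustration: the in-memory store type, the sample digests, and the function names are assumptions for this example, and real code would read and parse blobs from disk. The media type strings are the standard OCI ones.

```go
package main

import "fmt"

const (
	mediaTypeIndex    = "application/vnd.oci.image.index.v1+json"
	mediaTypeManifest = "application/vnd.oci.image.manifest.v1+json"
	mediaTypeConfig   = "application/vnd.oci.image.config.v1+json"
	mediaTypeLayer    = "application/vnd.oci.image.layer.v1.tar+gzip"
)

// descriptor pairs a blob digest with the type it should be parsed as.
type descriptor struct {
	MediaType string
	Digest    string
}

// store is a hypothetical in-memory stand-in for the blobs directory: it maps
// the digest of an index or manifest to the descriptors that blob references.
type store map[string][]descriptor

// reachableBlobs walks the queue until it is empty, handling any nesting depth
// without special-casing one level. Work items are tracked by (type, digest),
// because the same blob may be referenced once as a layer and once as a manifest.
func reachableBlobs(s store, roots []descriptor) (map[string]struct{}, error) {
	used := map[string]struct{}{}
	scanned := map[descriptor]struct{}{}
	queue := append([]descriptor(nil), roots...)
	for len(queue) > 0 {
		d := queue[0]
		queue = queue[1:]
		used[d.Digest] = struct{}{}
		if _, done := scanned[d]; done {
			continue
		}
		scanned[d] = struct{}{}
		// Only indexes and manifests reference further blobs.
		if d.MediaType != mediaTypeIndex && d.MediaType != mediaTypeManifest {
			continue
		}
		refs, ok := s[d.Digest]
		if !ok {
			return nil, fmt.Errorf("blob %s is referenced but missing", d.Digest)
		}
		queue = append(queue, refs...)
	}
	return used, nil
}

// sampleLayout builds a root index containing a nested index containing one image.
func sampleLayout() (store, []descriptor) {
	s := store{
		"sha256:root":   {{mediaTypeIndex, "sha256:nested"}},
		"sha256:nested": {{mediaTypeManifest, "sha256:manifest"}},
		"sha256:manifest": {
			{mediaTypeConfig, "sha256:config"},
			{mediaTypeLayer, "sha256:layer"},
		},
	}
	return s, []descriptor{{mediaTypeIndex, "sha256:root"}}
}

func main() {
	s, roots := sampleLayout()
	used, err := reachableBlobs(s, roots)
	if err != nil {
		panic(err)
	}
	fmt.Println("reachable blobs:", len(used))
}
```

Failing on a missing blob, rather than skipping it, keeps the walk on the safe side: a deletion pass built on top of it would abort instead of guessing.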

Contributor Author

You are right that handling this properly in the deletion has to be extra safe.
The good news is that I now have a way to cleanly create nested indexes for tests so this can be tested properly (which uses recursion so in for a penny, in for a pound).

Thanks for the skopeo example, I'll have a look at how this is handled, that will surely be instructive.

Contributor Author

@Pvlerick Pvlerick Jun 30, 2023

I've been inspecting the result of the skopeo command and it has been very educational :-)

One thing I noticed, reading getManifestDescriptor, is that you can mess up your registry if you do something like this:

skopeo copy docker://quay.io/libpod/alpine:3.2 oci:/tmp/oci-registry/alpine
skopeo copy docker://quay.io/libpod/alpine:3.10.2 oci:/tmp/oci-registry/alpine

The index.json file will then contain two manifests (image manifests, or sub-indexes if you used --all) which don't have any annotations (so no org.opencontainers.image.ref.name).

At that point, that directory is toast, it seems:

skopeo copy oci:/tmp/oci-registry/alpine:3.2 oci:/tmp/oci-registry/alpine-copy 
FATA[0000] initializing source oci:/tmp/oci-registry/alpine:3.2: no descriptor found for reference "3.2" 
skopeo copy oci:/tmp/oci-registry/alpine oci:/tmp/oci-registry/alpine-copy 
FATA[0000] initializing source oci:/tmp/oci-registry/alpine:: more than one image in oci, choose an image

Would this be worthy of being reported as a bug?
Sorry for the digression, I'm still in the learning phase :-)

Collaborator

At that point, that directory is toast, it seems:

It’s… a valid OCI structure. And the WIP oci:/tmp/oci-registry/alpine:@0 syntax will eventually allow c/image to consume it.

I agree it’s very awkward that c/image can produce a structure that it can’t read.

Contributor Author

Can you point me to that WIP with the index? It could be useful, for me, to have a look.

Collaborator

That’s #1381 / #1677 . Out of date for a while, I’m afraid.

@Pvlerick Pvlerick force-pushed the oci-image-deletion branch from fe1650d to d3e5d57 Compare June 29, 2023 10:42
@mtrmac mtrmac added the kind/feature A request for, or a PR adding, new functionality label Jun 30, 2023
@Pvlerick Pvlerick force-pushed the oci-image-deletion branch 2 times, most recently from dff7d6f to e5aed42 Compare July 6, 2023 09:20
@Pvlerick
Contributor Author

Pvlerick commented Jul 6, 2023

@mtrmac I made quite a few changes:

  • used fixtures for the tests, it's a bit more static and less readable (you have to go read the fixture to find out what's the expected result), but it tests the spec package indirectly as we discussed previously;
  • I addressed both of your comments: used a set and checked for shared use of the configuration (and added a test case for it in the shared blobs test).

I started working on the nested index stuff in another branch, but it's not finished yet (although the fixture is there already); there is also the extra complexity that if something changes in a nested index, its sha256 will change too, so every index above it, up to the "root" index, will have to be changed too.
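The digest ripple mentioned here can be shown with a small sketch. The JSON documents are heavily simplified and hypothetical (a real index entry also carries mediaType and size); the point is only that a parent references a nested index by the sha256 of its bytes, so editing the nested index forces a new parent.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digestOf returns the content address of a blob in "sha256:<hex>" form.
func digestOf(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// parentIndex builds a simplified parent index that references the nested
// index by digest. (Hypothetical helper; real entries have more fields.)
func parentIndex(nested []byte) []byte {
	return []byte(fmt.Sprintf(`{"schemaVersion":2,"manifests":[{"digest":%q}]}`, digestOf(nested)))
}

func main() {
	before := []byte(`{"schemaVersion":2,"manifests":[{"digest":"sha256:one"},{"digest":"sha256:two"}]}`)
	// Removing one entry from the nested index changes its bytes...
	after := []byte(`{"schemaVersion":2,"manifests":[{"digest":"sha256:two"}]}`)

	// ...so its digest changes, and so do the parent's content and digest.
	fmt.Println("nested digest changed:", digestOf(before) != digestOf(after))
	fmt.Println("parent digest changed:", digestOf(parentIndex(before)) != digestOf(parentIndex(after)))
}
```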

@Pvlerick Pvlerick force-pushed the oci-image-deletion branch 2 times, most recently from 4c59a25 to 0c688f4 Compare September 4, 2023 13:41
@Pvlerick Pvlerick force-pushed the oci-image-deletion branch 3 times, most recently from 5a9d2f8 to 62296c6 Compare September 28, 2023 09:46
@Pvlerick Pvlerick requested a review from mtrmac September 28, 2023 11:04
@Pvlerick
Contributor Author

@mtrmac I finally had time to take a second shot at this.

Going back and forth with tests, I ended up with a different design.
In essence, blobs used by a descriptor in index.json (the descriptor being an image manifest or a nested index) are collected in a map which holds the usage count for each of these blobs (getBlobsUsedInManifest or getBlobsUsedInIndex). It is then compared to a similar map generated from index.json itself, and blobs whose usage count reaches 0 in the former map are marked for deletion (getBlobsToDelete). The blobs are then deleted, and finally the relevant entry in the index is deleted too.
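One possible reading of this counting scheme, as a sketch only: the function below is not the PR's getBlobsToDelete, and the name blobsToDelete, the map shapes, and the exact subtraction rule are assumptions. It takes the usage counts over the whole index.json and the counts attributable to the descriptor being deleted, and keeps the blobs that nothing else would still use.

```go
package main

import (
	"fmt"
	"sort"
)

// blobsToDelete returns the digests whose usage count drops to zero once the
// uses attributable to the target descriptor are subtracted from the counts
// for the whole index. (Hypothetical helper; an assumption about the design.)
func blobsToDelete(usedInIndex, usedByTarget map[string]int) []string {
	var out []string
	for digest, n := range usedByTarget {
		if usedInIndex[digest]-n <= 0 {
			out = append(out, digest)
		}
	}
	// Sort for a deterministic result; map iteration order is random in Go.
	sort.Strings(out)
	return out
}

func main() {
	usedInIndex := map[string]int{
		"sha256:shared-layer": 2, // used by the target and by another image
		"sha256:own-layer":    1,
		"sha256:own-config":   1,
		"sha256:other-config": 1,
	}
	usedByTarget := map[string]int{
		"sha256:shared-layer": 1,
		"sha256:own-layer":    1,
		"sha256:own-config":   1,
	}
	fmt.Println(blobsToDelete(usedInIndex, usedByTarget))
}
```

Counting uses, instead of only recording membership in a set, matters when the image being deleted references the same blob more than once, or when another image also references it.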

Collaborator

@mtrmac mtrmac left a comment

I’m sorry about the late review.

This is great work, just two outstanding concerns:

  • I’m quite worried about the code to finally delete the index entry getting out of sync
  • The handling of shared blob directories can be simplified.

@Pvlerick Pvlerick force-pushed the oci-image-deletion branch 6 times, most recently from a9bf683 to 2807bce Compare October 10, 2023 12:57
Collaborator

@mtrmac mtrmac left a comment

Thanks!

  • The shared blobs dir conversation continues in Oci image deletion #2003 (comment); I think I’m converging much closer to your point
  • A bit more for the top-level index deletion, please
  • AFAICS the updated test for getManifestDescriptor is incorrect in a rather misleading way.

@Pvlerick
Contributor Author

  • AFAICS the updated test for getManifestDescriptor is incorrect in a rather misleading way.

Again, sorry about that :-(

@Pvlerick Pvlerick requested a review from mtrmac October 11, 2023 09:30
@mtrmac
Collaborator

mtrmac commented Oct 11, 2023

  • AFAICS the updated test for getManifestDescriptor is incorrect in a rather misleading way.

Again, sorry about that :-(

No worries; I wrote a much worse bug yesterday. I just wanted to highlight that as more important than some of the stylistic comments.

Collaborator

@mtrmac mtrmac left a comment

Thanks! LGTM.

Could you squash this to one or two commits, please?

Signed-off-by: Philippe Vlérick <[email protected]>
@Pvlerick
Contributor Author

Rebased in one commit (and on the latest main)

Collaborator

@mtrmac mtrmac left a comment

Thanks so much!

@mtrmac mtrmac merged commit ec86902 into containers:main Oct 12, 2023
9 checks passed
@Pvlerick Pvlerick deleted the oci-image-deletion branch October 13, 2023 07:08