-
Notifications
You must be signed in to change notification settings - Fork 27
container: Add deployed commits into set of GC roots #404
Conversation
c8d4c24
to
7e071e9
Compare
Prep for handling image pruning better. The way things are kind of expected to work today is that for a deployed ostree commit, we have *two* refs which point to it - one like e.g. `fedora:fedora/x86_64/coreos/stable`, as well as the "deployment ref" like "ostree/0/1/1" which is a synthetic ref generated by the sysroot core. We want to be able to remove the container image refs - but doing so today subjects the *layer* branches to garbage collection. Fix this by looking at the deployment refs as well as the set of images when computing the set of references for container images.
7e071e9
to
aec69c8
Compare
@@ -983,6 +983,40 @@ pub async fn copy( | |||
Ok(()) | |||
} | |||
|
|||
/// Iterate over deployment commits, returning the manifests from | |||
/// commits which point to a container image. | |||
fn list_container_deployment_manifests( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. However I have a general observation and alternative suggestion on GC logic.
I have a feeling that we should try to rework all the code behind gc_image_layers()
so that it becomes infallible (or at least provide an infallible version to consumers). The idea is that a minor unexpected state or a single error in the GC logic can quickly spiral down into a instance with a full sysroot and hard to recover.
To that extent, it is probably more useful to keep going cleaning up whatever we can without failing, and log error messages if we encounter any failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I see your point. Arguably we should do the same inside ostree_repo_prune()
for the objects right?
I think the most cases of failures here are going to be filesystem corruption...in which case, trying to continue probably won't help much.
But in the case of a bug in the logic where we too-eagerly pruned a layer in earlier code; yeah I'd agree continuing makes sense because the user can always re-pull that layer. This really gets into the need for a similar ostree fsck --repair
type logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will look at this as a followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK one thing I did verify is that the key bit of repo.set_ref_immediate(None, layer_ref.as_str(), None, cancellable)?;
is already idempotent. So I think that addresses a major source of possible logic errors already.
This needs some thought; filed #407
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I guess the main part of this would be on objects pruning instead, I agree.
Overall on GC there are many little things that could be just barely misaligned enough to cascade into larger issues. So it seems a good idea to at least trying to keep going whenever possible and clean at least some of the things.
Prep for handling image pruning better. The way things are kind of expected to work today is that for a deployed ostree commit, we have two refs which point to it - one like e.g.
fedora:fedora/x86_64/coreos/stable
, as well as the "deployment ref" like "ostree/0/1/1" which is a synthetic ref generated by the sysroot core.We want to be able to remove the container image refs - but doing so today subjects the layer branches to garbage collection.
Fix this by looking at the deployment refs as well as the set of images when computing the set of references for container images.