lib/sysroot-deploy: Add experimental support for automatic early prune #2847
Skipping CI for Draft Pull Request.
I still need to add tests for this. Prep patches split out in #2848.
5f71d07 to 23bedeb
Now with a test!
23bedeb to 54bd002
54bd002 to 056583c
Gave this a skim, seems sane.
…ization In the unusual case where one is manually finalizing staged deployments, as can happen in testing, we expect a successful finalization to remove the failure stamp file.
AFAICT, I don't see how `runkola.sh` or the Makefile in `tests/kolainst` can create files in `tests/kola` since it's geared towards installing under `/usr`.
When hacking and testing locally with `cosa build-fast` and `kola run`, I prefer to leave testing framework stuff within the work directory rather than installed in my pet container. Add a `localinstall` target for this which puts the tests in `tests/kola`. Then a simple `kola run` will pick it up.
If we lift it out of experimental in the future, but leave it off by Also, should we consider allowing the var to be used to turn off the
So basically if you have
056583c to b159791
The idea with the
Since it's currently not the default, this would be decided at the time we do decide to flip it to be the default. We could e.g. keep the variable around and slightly tweak the existing checks. Related to this, I've updated the check so that we not only skip it if the variable is defined but empty, but also if it's set to
I didn't want to dig too much into this in the commit message because indeed it's a bit subtle and would make it even longer. I linked to #2670 (comment) instead; together with the comments that follow it, that should help, I think.
A few minor changes
During the early design of FCOS and RHCOS, we chose a value of 384M for the boot partition. This turned out to be too small: some arches other than x86_64 have larger initrds, kernel binaries, or additional artifacts (like device tree blobs). We'll likely bump the boot partition size in the future, but we don't want to abandon all the nodes deployed with the current size.[1]

Because stale entries in `/boot` are cleaned up after new entries are written, there is a window in the update process during which the bootfs temporarily must host all the `(kernel, initrd)` pairs for the union of current and new deployments.

This patch determines if the bootfs is capable of holding all the pairs. If it can't but it could hold all the pairs from just the new deployments, the outgoing deployments (e.g. rollbacks) are deleted *before* new deployments are written. This is done by updating the bootloader in two steps to maintain atomicity.

Since this is a lot of new logic in an important section of the code, this feature is gated for now behind an environment variable (`OSTREE_ENABLE_AUTO_EARLY_PRUNE`). Once we gain more experience with it, we can consider turning it on by default.

This strategy increases the fallibility of the update system since one would no longer be able to roll back to the previous deployment if a bug is present in the bootloader update logic after auto-pruning (see [2] and following). This is however mitigated by the fact that the heuristic is opportunistic: the rollback is pruned *only if* it's the only way for the system to update.

[1]: coreos/fedora-coreos-tracker#1247
[2]: ostreedev#2670 (comment)

Closes: ostreedev#2670
b159791 to c561e61
Updated for comments!
I forgot to update the commit message on this to reference the new environment variable knob (
During the early design of FCOS and RHCOS, we chose a value of 384M
for the boot partition. This turned out to be too small: some arches
other than x86_64 have larger initrds, kernel binaries, or additional
artifacts (like device tree blobs). We'll likely bump the boot partition
size in the future, but we don't want to abandon all the nodes deployed
with the current size.[1]
Because stale entries in `/boot` are cleaned up after new entries are
written, there is a window in the update process during which the bootfs
temporarily must host all the `(kernel, initrd)` pairs for the union of
current and new deployments.
This patch determines if the bootfs is capable of holding all the
pairs. If it can't but it could hold all the pairs from just the new
deployments, the outgoing deployments (e.g. rollbacks) are deleted
before new deployments are written. This is done by updating the
bootloader in two steps to maintain atomicity.
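
To make the decision described above concrete, here is a minimal sketch in Python. It is not the code from this patch: the function name, the `PruneDecision` values, and the idea of keying each `(kernel, initrd)` pair by an identifier with a byte size are all assumptions made for illustration, and it ignores filesystem overhead such as block rounding.

```python
from enum import Enum


class PruneDecision(Enum):
    NOT_NEEDED = "not-needed"      # current + new pairs fit together
    EARLY_PRUNE = "early-prune"    # fits only if outgoing pairs go first
    INSUFFICIENT = "insufficient"  # new pairs do not fit even after pruning


def decide_early_prune(free_bytes, current_pairs, new_pairs):
    """Decide whether outgoing (kernel, initrd) pairs must be pruned early.

    current_pairs / new_pairs map a hypothetical pair identifier to its
    size in bytes. Pairs present in both sets are already on disk and are
    kept, so they cost no additional space.
    """
    # Space needed for pairs that are not on the bootfs yet.
    needed = sum(size for key, size in new_pairs.items()
                 if key not in current_pairs)
    if needed <= free_bytes:
        return PruneDecision.NOT_NEEDED

    # Space that deleting the outgoing pairs (e.g. the rollback) would free.
    reclaimable = sum(size for key, size in current_pairs.items()
                      if key not in new_pairs)
    if needed <= free_bytes + reclaimable:
        return PruneDecision.EARLY_PRUNE
    return PruneDecision.INSUFFICIENT


MiB = 1024 * 1024
current = {"rollback": 100 * MiB, "booted": 100 * MiB}
new = {"booted": 100 * MiB, "staged": 120 * MiB}
print(decide_early_prune(50 * MiB, current, new).value)
```

With only 50 MiB free, the 120 MiB staged pair fits only once the 100 MiB rollback pair is gone, so the sketch picks the early-prune path. The rollback is given up only in that case, which is the opportunistic behaviour the description relies on.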
Since this is a lot of new logic in an important section of the
code, this feature is gated for now behind an environment variable
(`OSTREE_SYSROOT_OPTS=early-prune`). Once we gain more experience with it,
we can consider turning it on by default.
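
As a rough illustration of the gating, the check could look like the Python sketch below. The helper name is hypothetical, and treating `OSTREE_SYSROOT_OPTS` as a comma-separated list of options is an assumption here, not something this description states.

```python
import os


def early_prune_enabled(environ=None):
    """Return True if early pruning was requested via the environment.

    Assumption: OSTREE_SYSROOT_OPTS holds comma-separated option names,
    and the feature is on only when "early-prune" is one of them.
    """
    environ = os.environ if environ is None else environ
    opts = environ.get("OSTREE_SYSROOT_OPTS", "")
    return "early-prune" in [opt.strip() for opt in opts.split(",")]


print(early_prune_enabled({"OSTREE_SYSROOT_OPTS": "early-prune"}))
```

An unset or empty variable leaves the feature off, which matches it being opt-in for now.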
This strategy increases the fallibility of the update system since one
would no longer be able to roll back to the previous deployment if a bug
is present in the bootloader update logic after auto-pruning. This is
however mitigated by the fact that the heuristic is opportunistic: the
rollback is pruned only if it's the only way for the system to update.
Closes: #2670