lib/sysroot-deploy: Add experimental support for automatic early prune #2847

jlebon · 2023-04-13T21:26:54Z

During the early design of FCOS and RHCOS, we chose a value of 384M
for the boot partition. This turned out to be too small: some arches
other than x86_64 have larger initrds, kernel binaries, or additional
artifacts (like device tree blobs). We'll likely bump the boot partition
size in the future, but we don't want to abandon all the nodes deployed
with the current size.[1]

Because stale entries in /boot are cleaned up after new entries are
written, there is a window in the update process during which the bootfs
temporarily must host all the (kernel, initrd) pairs for the union of
current and new deployments.

This patch determines if the bootfs is capable of holding all the
pairs. If it can't but it could hold all the pairs from just the new
deployments, the outgoing deployments (e.g. rollbacks) are deleted
before new deployments are written. This is done by updating the
bootloader in two steps to maintain atomicity.
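
To make the decision concrete, here is a minimal Python sketch of the heuristic described above. It is not the libostree implementation (that is C, in src/libostree/ostree-sysroot-deploy.c). The names `plan_boot_update` and `Decision` are hypothetical, sizes are plain integers (think MiB), and the model assumes the whole bootfs capacity is usable and that a pair shared by a current and a new deployment is stored only once.

```python
from enum import Enum


class Decision(Enum):
    NORMAL = "write new entries, then clean up stale ones"
    EARLY_PRUNE = "delete outgoing entries first, then write new ones"
    NO_SPACE = "not enough space even after pruning"


def plan_boot_update(capacity, current, new):
    """Pick an update strategy for the boot filesystem.

    capacity: usable space on the bootfs (hypothetical, ignores fs overhead)
    current:  dict mapping a boot checksum to the size of its (kernel, initrd) pair
    new:      the same mapping for the deployments wanted after the update
    """
    # Pairs present in both sets are counted once.
    union = {**current, **new}
    if sum(union.values()) <= capacity:
        return Decision.NORMAL
    # The union does not fit; check whether the new set alone would.
    if sum(new.values()) <= capacity:
        return Decision.EARLY_PRUNE
    return Decision.NO_SPACE


# Rollback + booted are on disk; the update wants booted + pending.
current = {"rollback": 150, "booted": 150}
new = {"booted": 150, "pending": 150}
print(plan_boot_update(384, current, new))
```

In this example the union needs 450 while the new set alone needs 300, so with a capacity of 384 the sketch picks the early-prune path: the rollback is the outgoing deployment that would be deleted first.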

Since this is a lot of new logic in an important section of the
code, this feature is gated for now behind an environment variable
(OSTREE_SYSROOT_OPTS=early-prune). Once we gain more experience with it,
we can consider turning it on by default.

This strategy increases the fallibility of the update system since one
would no longer be able to rollback to the previous deployment if a bug
is present in the bootloader update logic after auto-pruning. This is
however mitigated by the fact that the heuristic is opportunistic: the
rollback is pruned only if it's the only way for the system to update.

Closes: #2670

openshift-ci · 2023-04-13T21:26:58Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

jlebon · 2023-04-13T21:28:12Z

I still need to add tests for this. Prep patches split out in #2848.

jlebon · 2023-04-14T16:25:10Z

Now with a test!

cgwalters

Gave this a skim, seems sane.

When hacking and testing locally with `cosa build-fast` and `kola run`, I prefer to leave testing framework stuff within the work directory rather than installed in my pet container. Add a `localinstall` target for this which puts the tests in `tests/kola`. Then a simple `kola run` will pick it up.

dustymabe · 2023-04-14T19:30:14Z

> During the early design of FCOS and RHCOS, we chose a value of 384M for the boot partition. This turned out to be too small: some arches other than x86_64 have larger initrds, kernel binaries, or additional artifacts (like device tree blobs). We'll likely bump the boot partition size in the future, but we don't want to abandon all the nodes deployed with the current size.[1]

> Because stale entries in /boot are cleaned up after new entries are written, there is a window in the update process during which the bootfs temporarily must host all the (kernel, initrd) pairs for the union of current and new deployments.

> This patch determines if the bootfs is capable of holding all the pairs. If it can't but it could hold all the pairs from just the new deployments, the outgoing deployments (e.g. rollbacks) are deleted before new deployments are written. This is done by updating the bootloader in two steps to maintain atomicity.

> Since this is a lot of new logic in an important section of the code, this feature is gated for now behind an environment variable (OSTREE_EXP_AUTO_EARLY_PRUNE). Once we gain more experience with it, we can consider turning it on by default.

If we lift it out of experimental in the future but leave it off by
default, should we consider removing EXP from the name of the var
here?

Also, should we consider allowing the var to be used to turn off the
behavior (i.e. if we flip it to the default in the future and
someone finds they don't want the behavior)?

> This strategy increases the fallibility of the update system since one would no longer be able to rollback to the previous deployment if a bug is present in the bootloader update logic after auto-pruning. This is however mitigated by the fact that the heuristic is opportunistic: the rollback is pruned only if it's the only way for the system to update.

So basically if you have X,Y -> Z and there is a logic problem
in Y you can't get back to X in order to update to Z? Your words
are a bit hard to decipher but this simple example makes it clear to me
(assuming the simple example is correct).

jlebon · 2023-04-14T19:54:27Z

> If we lift it out of experimental in the future but leave it off by default, should we consider removing EXP from the name of the var here?

The idea with the EXP variable is that it's purposely temporary. If it's stabilized but left off by default, I think we'd want this controlled by e.g. a repo config knob instead. That said, I've removed the EXP wording since it could make sense even long-term to have a variable override for it to get out of a pickle (given that e.g. repo configs are unmanaged, but systemd dropins aren't).

> Also, should we consider allowing the var to be used to turn off the behavior (i.e. if we flip it to the default in the future and someone finds they don't want the behavior)?

Since it's currently not the default, this would be decided at the time we do decide to flip it to be the default. We could e.g. keep the variable around and slightly tweak the existing checks. Related to this, I've updated the check so that we not only skip it if the variable is defined but empty, but also if it's set to 0.
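
As an illustration of the check described here, a minimal Python sketch follows. The real check lives in libostree's C code and is not shown in this thread, so the helper name `early_prune_enabled` is hypothetical. The variable name used is OSTREE_ENABLE_AUTO_EARLY_PRUNE, the name the PR carried after EXP was dropped; it was later replaced by OSTREE_SYSROOT_OPTS=early-prune.

```python
import os

# Name used at this stage of the review; superseded later in the thread.
VAR = "OSTREE_ENABLE_AUTO_EARLY_PRUNE"


def early_prune_enabled(environ=os.environ):
    """Return True only if the variable is set to something other than "" or "0"."""
    value = environ.get(VAR)
    # Unset, empty, and "0" all mean "skip the early prune".
    return value is not None and value not in ("", "0")


print(early_prune_enabled({VAR: "1"}))
print(early_prune_enabled({VAR: "0"}))
```

Treating both the empty string and 0 as "off" means the same variable could later act as an opt-out if the default were flipped, which is the tweak alluded to above.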

> So basically if you have X,Y -> Z and there is a logic problem in Y you can't get back to X in order to update to Z? Your words are a bit hard to decipher but this simple example makes it clear to me (assuming the simple example is correct).

I didn't want to dig too much into this in the commit message because it is indeed a bit subtle and would make the message even longer. Instead, I linked to #2670 (comment), which, together with the comments that follow it, should help.

cgwalters

A few minor changes

During the early design of FCOS and RHCOS, we chose a value of 384M for the boot partition. This turned out to be too small: some arches other than x86_64 have larger initrds, kernel binaries, or additional artifacts (like device tree blobs). We'll likely bump the boot partition size in the future, but we don't want to abandon all the nodes deployed with the current size.[1]

Because stale entries in `/boot` are cleaned up after new entries are written, there is a window in the update process during which the bootfs temporarily must host all the `(kernel, initrd)` pairs for the union of current and new deployments.

This patch determines if the bootfs is capable of holding all the pairs. If it can't but it could hold all the pairs from just the new deployments, the outgoing deployments (e.g. rollbacks) are deleted *before* new deployments are written. This is done by updating the bootloader in two steps to maintain atomicity.

Since this is a lot of new logic in an important section of the code, this feature is gated for now behind an environment variable (`OSTREE_ENABLE_AUTO_EARLY_PRUNE`). Once we gain more experience with it, we can consider turning it on by default.

This strategy increases the fallibility of the update system since one would no longer be able to rollback to the previous deployment if a bug is present in the bootloader update logic after auto-pruning (see [2] and following). This is however mitigated by the fact that the heuristic is opportunistic: the rollback is pruned *only if* it's the only way for the system to update.

[1]: coreos/fedora-coreos-tracker#1247
[2]: ostreedev#2670 (comment)

Closes: ostreedev#2670

jlebon · 2023-05-01T16:12:51Z

Updated for comments!

jlebon · 2023-05-26T14:50:07Z

I forgot to update the commit message on this to reference the new environment variable knob (OSTREE_SYSROOT_OPTS=early-prune) rather than the initial OSTREE_ENABLE_AUTO_EARLY_PRUNE one. I've updated the description, but it'll forever be in git history, lying to readers... 😢

openshift-ci bot added the do-not-merge/work-in-progress label Apr 13, 2023

jlebon force-pushed the pr/calculate-and-cleanup branch 2 times, most recently from 5f71d07 to 23bedeb, April 14, 2023 16:24

jlebon marked this pull request as ready for review April 14, 2023 16:24

openshift-ci bot removed the do-not-merge/work-in-progress label Apr 14, 2023

jlebon force-pushed the pr/calculate-and-cleanup branch from 23bedeb to 54bd002, April 14, 2023 16:47

jlebon mentioned this pull request Apr 14, 2023

Add back Qualcomm DTBs on aarch64 coreos/fedora-coreos-tracker#1467

Closed

jlebon force-pushed the pr/calculate-and-cleanup branch from 54bd002 to 056583c, April 14, 2023 17:10

cgwalters reviewed Apr 14, 2023

jlebon added 3 commits April 14, 2023 15:19

lib/sysroot-deploy: Nuke finalize-failure.stamp on successful finalization

45772ed

In the unusual case where one is manually finalizing staged deployments, as can happen in testing, we expect a successful finalization to remove the failure stamp file.

tests/kola: delete unused .gitignore

771deb5

AFAICT, I don't see how `runkola.sh` or the Makefile in `tests/kolainst` can create files in `tests/kola` since it's geared towards installing under `/usr`.

jlebon force-pushed the pr/calculate-and-cleanup branch from 056583c to b159791, April 14, 2023 19:43

dustymabe mentioned this pull request Apr 19, 2023

Boot partition can easily run out of space on upgrade coreos/fedora-coreos-tracker#1247

Closed

cgwalters reviewed Apr 26, 2023

cgwalters requested changes Apr 30, 2023

jlebon force-pushed the pr/calculate-and-cleanup branch from b159791 to c561e61, May 1, 2023 16:12

cgwalters approved these changes May 1, 2023

cgwalters merged commit 919212d into ostreedev:main May 1, 2023

jlebon deleted the pr/calculate-and-cleanup branch May 1, 2023 19:08

cgwalters mentioned this pull request May 2, 2023

[WIP] lib/deploy: Check space before copying into /boot #1830

Closed

jlebon mentioned this pull request May 17, 2023

Enable libostree automatic early pruning coreos/fedora-coreos-tracker#1495

Closed

dustymabe mentioned this pull request May 26, 2023

filesystem space checks for /boot/ #1648

Open

jlebon mentioned this pull request May 26, 2023

selectively enable ostree autopruning coreos/fedora-coreos-config#2435

Merged

dustymabe mentioned this pull request Jun 1, 2023

auto prune fails for simple operations with fallocate EINVAL #2869

Closed

dustymabe mentioned this pull request Dec 19, 2023

aarch64 failing to upgrade with /boot filesystem full coreos/fedora-coreos-tracker#1637

Closed

edcdavid mentioned this pull request May 21, 2024

Add support automatic Disk encryption (Luks) protected with PCR8 coreos/fedora-coreos-tracker#1737

Open

dustymabe mentioned this pull request Oct 31, 2024

ostree-finalize-staged.service times out on slow hardware coreos/fedora-coreos-tracker#1824

Open
