Split RHCOS into layers #1637
Skipping CI for Draft Pull Request.
067ece5 to f79684b
Awesome work on this!
openshift/kubernetes has a specific workflow where jobs will build a new kubelet to use during the job run. This helps with rebase work and validating new Kubernetes versions coming into OpenShift. We should preserve this workflow when migrating to RHCOS layering. /cc @soltysh
I don't expect any issues there. That workflow should keep working as is.
f79684b to a6a7438
/cc @cybertron @andfasano
I believe this was the prerequisite work done in openshift/kubernetes#1805, which ensured we won't have problems in o/k.
OK, so let's resume the bootstrapping issue. Restating some of the things from above and from researching further:
What I'm playing with now is basically to have a special … This is in effect like a more aggressive WIP for this in openshift/installer#8742.
@jlebon That sounds like it might work. Where will the kubelet be coming from? An OpenShift-built image?
Won't doing …
From the node image (i.e. for OCP, the
No. The system boots into …
Via a generator overriding …
"Open Questions" below).
- Work with ART to adjust how they determine if the RHCOS images are up to date. Currently, they expect both RHEL and OCP content in those images, but that content will be split across separate images now.
Wouldn't the node image built on top of the base image have all the contents?
It would. I'm not sure of the details of how the ART machinery works, but my understanding is that RHCOS builds currently get triggered as part of a larger pipeline that also builds e.g. the kubelet RPM. But with this change, there's no point in rebuilding RHCOS if only OCP packages changed. Instead, a new layered build should happen.
Left some nits but looking good overall.
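To make the "only rebuild what changed" reasoning concrete, here is a minimal sketch assuming the enhancement's bottom-to-top layer order (bootc, CoreOS, node). The package-to-layer mapping and the `layers_to_rebuild` helper are hypothetical and only illustrative; they are not how ART actually tracks content. The rule it encodes is the usual one for layered builds: a change in a layer invalidates that layer and everything stacked on it, while layers below can be reused.

```python
# Layers ordered bottom to top, following the enhancement's layering model.
LAYERS = ["bootc", "coreos", "node"]

# Hypothetical package-to-layer ownership, using only the examples named in
# the enhancement. This is not ART's real data.
PACKAGE_LAYER = {
    "coreos-installer": "coreos",
    "ignition": "coreos",
    "afterburn": "coreos",
    "kubelet": "node",
    "cri-o": "node",
}


def layers_to_rebuild(changed_packages: list[str]) -> list[str]:
    """Return the layers needing a new build, bottom to top.

    A change in one layer invalidates that layer and every layer stacked
    on top of it; layers below the lowest changed one are reused as-is.
    """
    indices = [
        LAYERS.index(PACKAGE_LAYER[pkg])
        for pkg in changed_packages
        if pkg in PACKAGE_LAYER  # unknown packages are ignored in this sketch
    ]
    if not indices:
        return []
    return LAYERS[min(indices):]
```

With this model, `layers_to_rebuild(["cri-o"])` only yields the node layer (i.e. just a new layered build), whereas a change to `ignition` yields both the CoreOS and node layers.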
As part of openshift/enhancements#1637, we want to start building the node image as a layered build on top of an RHCOS base image. For now, promote this image as `node`. In the future, when we're ready to switch CI over to the node image, it'll take the place of `rhel-coreos`.
This enhancement describes improvements to the way RHEL CoreOS (RHCOS) is built so that it will better align with image mode for RHEL, while also providing benefits on the OpenShift side.

Currently, RHCOS is built as a single layer that includes both RHEL and OCP content. This enhancement proposes splitting it into three layers. Going from bottom to top:

1. the (RHEL-versioned) bootc layer (i.e. the base rhel-bootc image shared with image mode for RHEL)
2. the (RHEL-versioned) CoreOS layer (e.g. coreos-installer, ignition, afterburn, scripts, etc.)
3. the (OCP-versioned) node layer (e.g. kubelet, cri-o, etc.)

The terms "bootc layer", "CoreOS layer", and "node layer" will be used throughout this enhancement to refer to these.

The details of this enhancement focus on doing the first split: creating the node layer as distinct from the CoreOS layer (which will not yet be rebased on top of a bootc layer). The two changes that most affect OCP are:

1. bootimages will no longer contain OCP components (e.g. kubelet, cri-o, etc.)
2. the `rhel-coreos` payload image will be built in Prow/Konflux (as any other)

Tracked at: https://issues.redhat.com/browse/OCPSTRAT-1190
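As a rough illustration of the first split, the sketch below models layers as sets of content and composes the bootimage and the node image from them. The `Layer` type, the stack names, and the content lists are hypothetical, taken only from the examples in the summary. It shows both OCP-facing consequences at once: the bootimage carries no OCP-versioned content, while the node image (a layered build on top of the CoreOS layer) still contains everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    versioned_by: str  # "RHEL" or "OCP"
    contents: frozenset[str]


# Illustrative contents only, based on the examples in the summary.
COREOS = Layer("coreos", "RHEL", frozenset({"coreos-installer", "ignition", "afterburn"}))
NODE = Layer("node", "OCP", frozenset({"kubelet", "cri-o"}))

# First split: bootimages come from the CoreOS layer alone, while the node
# image is a layered build on top of it (no bootc layer underneath yet).
BOOTIMAGE_STACK = [COREOS]
NODE_IMAGE_STACK = [COREOS, NODE]


def image_contents(stack: list[Layer]) -> frozenset[str]:
    """Union of the contents of every layer in a stack (bottom to top)."""
    result: frozenset[str] = frozenset()
    for layer in stack:
        result = result | layer.contents
    return result


def ocp_versioned_contents(stack: list[Layer]) -> frozenset[str]:
    """Only the contents contributed by OCP-versioned layers."""
    return image_contents([layer for layer in stack if layer.versioned_by == "OCP"])
```

Here `image_contents(BOOTIMAGE_STACK)` has no `kubelet`, while `image_contents(NODE_IMAGE_STACK)` is a superset of the bootimage contents.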
0f934fb to 13ae158
Updated for comments!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mrunalp
@jlebon: all tests passed!
These images are built as part of the CoreOS pipeline. They will be used as bases for building the node images containing OCP-versioned content for CI. Part of openshift/enhancements#1637.
Part of https://issues.redhat.com/browse/COS-3086. Hopefully this is the last time we have to do this! (This should become obsoleted by openshift/enhancements#1637).
As part of openshift/enhancements#1637, we want to start building the node image as a layered build on top of an RHCOS base image. For now, don't promote this image. In the future, when we're ready to switch CI over, it'll take the place of `rhel-coreos`.
* openshift/os: start building node image

  As part of openshift/enhancements#1637, we want to start building the node image as a layered build on top of an RHCOS base image. For now, don't promote this image. In the future, when we're ready to switch CI over, it'll take the place of `rhel-coreos`.

* openshift/os: add an e2e-aws test

  Now that we're building the node image in CI, we can run cluster tests with it. Let's start simple for now with just the standard e2e-aws test. Note that it doesn't run by default, which means that we can request it only on specific PRs using `/test`.
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries, for example, which bootstrapping obviously relies on.

Instead, we now change things up so that early on when booting the bootstrap node, we pull down the node image, unencapsulate it (this just means converting it back to an OSTree commit), then mount its `/usr` over the running system's `/usr` and import new `/etc` content. This is done by isolating to a different systemd target to bring up only the minimum number of services needed to do the pivot, and then carrying on with bootstrapping.

This does not incur additional reboots and should be compatible with AI/ABI/SNO. But it is, of course, a huge conceptual shift in how bootstrapping works. With this, we would now always be sure that we're using the same binaries as the target version during bootstrapping, which should alleviate some issues such as AI late-binding (see e.g. https://issues.redhat.com/browse/MGMT-16705). The big exception, of course, is the kernel. Relatedly, note that we do persist `/usr/lib/modules` from the booted system so that loading kernel modules still works.

To be conservative, the new logic only kicks in when using bootimages which do not have `oc`. This will allow us to ratchet this in more easily. Down the line, we should be able to replace some of this with `bootc apply-live` once that's available (and also works in a live environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.
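The filesystem effect of the pivot can be sketched as a pure function over path-to-content maps. This is a model, not the installer's implementation: the names `needs_node_image_overlay` and `pivot` are hypothetical, and it assumes "import new `/etc` content" means adding `/etc` files the booted system doesn't already have. It captures the three rules from the commit message: the new path is only taken when `oc` is absent, `/usr` comes from the node image, and `/usr/lib/modules` is kept from the booted system because the running kernel is still the bootimage's.

```python
import shutil
from typing import Callable, Optional

MODULES = "/usr/lib/modules/"


def needs_node_image_overlay(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Only take the new path on bootimages that don't ship `oc`."""
    return which("oc") is None


def pivot(booted: dict[str, str], node_image: dict[str, str]) -> dict[str, str]:
    """Model the filesystem after overlaying the node image.

    Both arguments map absolute paths to an opaque content marker.
    """
    result: dict[str, str] = {}
    for path, content in booted.items():
        # Outside /usr, the booted system is left alone. Kernel modules are
        # kept too: the running kernel is still the bootimage's.
        if not path.startswith("/usr/") or path.startswith(MODULES):
            result[path] = content
    for path, content in node_image.items():
        # The rest of /usr now comes from the node image.
        if path.startswith("/usr/") and not path.startswith(MODULES):
            result[path] = content
        # Assumption: only /etc files missing on the booted system are added;
        # existing ones are not touched.
        elif path.startswith("/etc/") and path not in booted:
            result[path] = content
    return result
```

Keeping this as a pure function makes the `/usr/lib/modules` exception easy to check in isolation: the node image's modules directory never makes it into the result.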
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries, for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version. Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`.

See also: openshift/installer#8742
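The ordering requirement here is a partial order between units. In systemd it would normally be expressed with ordering directives such as `After=`/`Before=`; the sketch below only models that partial order with Python's `graphlib` and does not claim to mirror how the assisted-installer change actually wires it up. The `AFTER` map and the helpers `start_order` and `violations` are hypothetical, and only the constraints named in the commit message are included.

```python
from graphlib import TopologicalSorter

# Each unit maps to the units it must be ordered after (its predecessors).
AFTER: dict[str, set[str]] = {
    "node-image-overlay.service": set(),
    "kubelet.service": {"node-image-overlay.service"},
    "bootkube.service": {"node-image-overlay.service"},
}


def start_order(after: dict[str, set[str]]) -> list[str]:
    """One start order that respects every ordering constraint."""
    return list(TopologicalSorter(after).static_order())


def violations(order: list[str], after: dict[str, set[str]]) -> list[tuple[str, str]]:
    """(unit, dependency) pairs where the unit started before its dependency."""
    position = {unit: i for i, unit in enumerate(order)}
    return [
        (unit, dep)
        for unit, deps in after.items()
        for dep in deps
        if position[unit] < position[dep]
    ]
```

Any order produced by `start_order(AFTER)` starts with `node-image-overlay.service`, while an order that starts `kubelet.service` first is flagged by `violations`.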