Initial multi-stream enhancement #1692

Draft: sdodson wants to merge 5 commits into master

Conversation

sdodson (Member) commented Oct 4, 2024

Copied from https://hackmd.io/q1Txm0twTYatSvSf1vNFeA?view where there are some comments that warrant understanding.
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 4, 2024
cgwalters (Member) left a comment

Looks sane to me overall

Comment on lines +177 to +179
At install time the **cluster creator** will either specify the desired OS
for `ControlPlane` and `Compute` or not; if they provide no value, the installer
is to render the current default stream into the relevant resources.
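
A minimal sketch of how that could look in install-config.yaml, assuming the proposed `osStream` field sits on the controlPlane and compute machine pools; the field name comes from this enhancement draft, but its placement here is an illustration rather than a shipped installer API:

```yaml
# Hypothetical install-config.yaml fragment for the proposed osStream field.
# Placement under the machine-pool stanzas is an assumption of this sketch.
apiVersion: v1
metadata:
  name: example-cluster
baseDomain: example.com
controlPlane:
  name: master
  replicas: 3
  osStream: rhcos-9      # explicit stream for control plane nodes
compute:
  - name: worker
    replicas: 3
    osStream: rhcos-10   # workers opt into the newer stream
# Omitting osStream would mean the installer renders the current default
# stream into the relevant resources, per the text above.
```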
Member:

Is there strong value in exposing this via install config versus just supporting it as "day 0" machineconfig?

Member Author (sdodson):

I guess it doesn't have to be; for all platforms we already allow overriding the boot image, and you could patch MCP manifests as well.

Whenever compute resources aren't elastic, should we support a special mode where the host
OS is reinstalled across versions, specifically NOT preserving any data / config?

#### Single-node Deployments or MicroShift
Member:

Today MicroShift is pretty different in that the user chooses the base OS version already.

When the installer is built for OCP, valid `osStream` values must start with "rhcos"
and match the name of a file in data/data/coreos/

Somewhere in MCO's templates/ add streams/{rhcos-9,rhcos-10} anything outside
Member:
What I suspect we may need to add to the MCO is a conditional like this for user-provided MachineConfig.

Member Author (sdodson):
If we hold that stream is configured at MCP level and that value is effectively immutable, wouldn't they just provide MachineConfig that matches the labels for each pool?

We need to vet the viability of the stream being immutable against how we would potentially handle in-place major OS upgrades. My thinking for now is that this entails moving a node from one pool to another rather than reconfiguring a pool for a different stream.
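
A rough sketch of that per-pool approach, assuming a dedicated `worker-rhel10` pool; the pool name and labels are illustrative, not an existing API, but the selector pattern is the standard custom-pool mechanism:

```yaml
# Hypothetical custom pool dedicated to the newer stream; nodes are moved into
# it by labeling them, matching the "move a node between pools" model above.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rhel10
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-rhel10]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rhel10: ""
---
# User-provided MachineConfig targets the stream simply by carrying the pool's
# role label, as suggested above; no stream-aware conditional is needed.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-rhel10-extra-config
  labels:
    machineconfiguration.openshift.io/role: worker-rhel10
spec:
  config:
    ignition:
      version: 3.2.0
```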

Comment on lines +170 to +171
recorded in /etc/os-release or other available facilities. There's probably
some systemd magic here or something. This probably also pushes more static
Member:

Yes, I think we need to make it easy to dispatch on the OS major/minor; this gets into a lot of details around our use of VERSION_ID and how that looks...and the fact that today our versions include the OCP version and the OS version...
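
One possible shape of that "systemd magic", sketched as a MachineConfig shipping a unit gated on the OS major version. It assumes /etc/os-release exposes a RHEL-versioned key the condition can match, which is exactly the VERSION_ID question noted above (today VERSION_ID on RHCOS carries the OCP version, not the OS version):

```yaml
# Hypothetical example: a unit that only activates on RHEL 10 based hosts.
# ConditionOSRelease= (systemd 249+) compares key=value pairs from
# /etc/os-release; the RHEL_VERSION key used here is an assumption.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-rhel10-only-unit
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: example-rhel10-tuning.service
          enabled: true
          contents: |
            [Unit]
            Description=Example tuning applied only on RHEL 10 based nodes
            ConditionOSRelease=RHEL_VERSION>=10
            [Service]
            Type=oneshot
            ExecStart=/usr/bin/true
            [Install]
            WantedBy=multi-user.target
```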

cgwalters (Member) commented Oct 8, 2024

There are intersections with OLM here, right? Does a way already exist for operators to declare their compatible OS versions?

openshift-ci bot (Contributor) commented Oct 9, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sdodson. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sdodson (Member, Author) commented Oct 9, 2024

> There are intersections with OLM here, right? Does a way already exist for operators to declare their compatible OS versions?

@cgwalters dd32891 adds my opinion, but I've put it as an open question because I think it warrants additional discussion.

I had previously asked if we had a way to determine which operators requested the RBAC necessary to run privileged containers, and we may or may not be able to assess that. I'm hoping to hear back on that soon so we can perform some more targeted analysis. However, I assume that those who truly become part of the OS will have motivation not to ignore RHEL10, such as GPU management operators who embed drivers. They may skimp on the OpenShift operator side, but it seems like they won't forgo RHEL10 drivers altogether, and hopefully the additional operator work is minimal.

* As an OpenShift admin adding newer hardware to an existing cluster I want the
new hardware to boot, run, and update from a specific OS stream.

* As an OpenShift admin wishing to migrate existing hosts to a newer stream I
Member:

We debated this live, but I want to write things down here for consistency. I have no opposition to making it obvious/easy for admins to use CAPI to spawn separate RHEL-$next hosts for testing, etc.

But I am pretty confident in stating that we can support seamless in-place upgrades from 9 to 10 for the majority of e.g. cloud-deployed clusters. Take our own Prow clusters for example. I bet we can just flip the flag to run those on rhel10 and watch it roll out in place by default and it would Just Work.
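
For concreteness, under this proposal "flipping the flag" might look like nothing more than editing the stream on the existing pool; the `osStream` field on MachineConfigPool below is hypothetical, and the earlier thread leaves open whether it would be mutable at all:

```yaml
# Hypothetical: an in-place 9 -> 10 rollout triggered by changing the pool's
# stream, rather than by moving nodes between pools. The osStream field does
# not exist today; it is an assumption of this sketch.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  osStream: rhcos-10   # previously rhcos-9; the MCO rolls the change out node by node
```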

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2024
sdodson (Member, Author) commented Nov 8, 2024

/remove-lifecycle stale
Needs freshening

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2024
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 14, 2024