Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bootimages: Downloading and updating bootimages via release image #201
bootimages: Downloading and updating bootimages via release image #201
Changes from all commits
5505d7d
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you have some details about what this means?
I've read through what I can find and I don't understand how machine-os-content is produced, there's no source location in the release image.
As a user, how would I consume this? How would I find out which is the image URL for say the OpenStack image? Does the meta.json become an annotation on the image or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of this exists yet - it's a proposal.
But this is the same question as #201 (comment) right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Via https://github.com/coreos/coreos-assembler and https://gitlab.cee.redhat.com/coreos/redhat-coreos/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it thank you - I was looking for details about how this should be implemented to see if we could already do part of it.
The problem for us is that baremetal (and openstack) IPI both need to know where the RHCOS images live. Today we do awful hacks with
openstack-install version
and GitHub to fetch the data that don't work in CI. There,openshift-install version
reports a sha that doesn't exist due to the rebasing strategy CI uses. We ended up with a temporary workaround and just copy rhcos.json into the installer container.I'd be interested to see if we could get a better approach to make the data accessible to end users (this
--info
flag @abhinavdahiya mentioned)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include details on a option that prints the contents of bootimage.json like mentioned here https://github.com/openshift/enhancements/pull/201/files#diff-9b570624a774881c701bbfbc18b60109R64
a way to get the prebuilt bootimages if the user sees fit as an alternative to generating the whole thing...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets into a fundamental question of whether we do ship pre-built bootimages going forward in general. Clearly in "phase 1" we continue to do so as we can't break the way openshift-install works today.
The thing is though in any disconnected environment it'd require another step for admins to mirror on an ongoing basis.
On one hand though, we absolutely are going to need to make it ergonomic at some point to "boot a single RHCOS node" for testing. There are many valid reasons for this, among them hardware validation before trying a cluster install, experimenting with configuration "live" that one then compiled into Ignition/MachineConfig, etc. And until such time as we have
openshift-install run-one-coreos-instance
or something...people are going to want to download the bootimages directly and boot them too.So...I'm conflicted.
Maybe what happens in the end is we do do both - but the cadence of "golden bootimages" would still be once per release (for the bootstrap). There's no point to us uploading e.g. new AMIs if in practice clusters are going to use in-place update mechanisms anyways.
IOW...I think people doing UPI installs would turn to automating
oc adm release generate-bootimage
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For phase 1, i think it's necessary for migration.
we could add a
--info
flag with it defaulting to true to begin with. and when we have actual generation capability we move to--info
false so users are generating the images.i think
--info
will continue to provide value in future, because it will allow the customers to get information likewhat's the rhcos version, maybe what's included, if there are any publicly available images they are try let's use them. and as long as the output is versioned users have a way to differentiate and automate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'm thinking about this again. I guess one thing here is what you most strongly objected to was having
openshift-install
be the frontend for this information (clearly we want to do all the other stuff here though too).But clearly a "MVP" for this is basically what you're saying with
--info
. The question is how to get there from the today where RHCOS is pinned in the installer. I really don't want two sources of this information (it's already bad enough docs are updated manually). So...what if we still added a hidden/secret mechanism foroc adm release new
to downloadopenshift-install
and extract this data from it somehow?The simplest would perhaps be a hidden version of
openshift-install hidden-only-for-oc-output-coreos-build
or so.Or, we could stick it in a special ELF section (would require
oc adm
to vendor some ELF tools).Or, we could do a variant of openshift/installer#1422 and basically scrape the binary; the lowest tech approach...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should include an object into the release image like we include the machine-os-content.
This object can then be used to provide the information as part of
oc adm release info --boot-images
trying to get this data from installer is not correct imo. the installer's embedded data is mostly for bootstrap-host and it currently being used for cluster hosts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are all agreed on the long term goal of getting the data out of the installer. The problem is in the short term, having two sources of truth for this data is going to be very problematic.
Now...hmm, we could get the data into a container in the release payload and change
oc adm release new
to patch theopenshift-install
binary, the same way it does for the release image itself?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of doing this is that anyone building
openshift-install
from git...well, it just won't work. I guess we could change the installer to fetch "the latest" from a URL if built from git, basically the same way we do with the release image.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For metal IPI CI, we stuff the data into the baremetal-installer container because we have to have it. See: openshift/installer#3330
The way users get this data to use metal IPI is really painful (
openshift-install version
, and fetching the rhcos.json from GitHub for that sha!). Basically every on-premise platform needs access to the image locations...I dislike client-side patching of the installer binary. Storing it somehow next to machine-os-content makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not talking about any client side patching. This is something that would happen in the OpenShift build system (release controller).
Yes but the problem with that is that it's the bootstrap node that pulls the release image, so that leaves open the problem of which image to use for the bootstrap.
Now I guess a first transitional step could be changing the installer to use the metadata from the release payload for the cluster, and leave the pinned version only for the bootstrap node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oc adm release extract
patches the binary. Sure, the binaries published on mirror.openshift.com already had that run to get them, but users can do that themselves too client side.It's the default for baremetal since we don't publish ours.
A user can just use
oc adm release info
? That's at least a consistent place for any release-related information.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do this at all? If we're already pulling all the binaries to build the image, why not just publish an image that contains a ISO or whatever format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think the basic answer is that if we e.g. ship a container image wrapping each pre-built bootimage, that's at least 700MB per image and there are like 7-8 or more images. That's...a lot. So things like
oc adm release mirror
would have to learn how to filter them. It'd make the OpenShift release process to push them much slower.It just seems more tenable to me to make this dynamic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate a bit more on why? Just in terms of avoiding needing a new nontrivial container image? Needing to ship code that was formerly "behind the firewall" to run on premise? Something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the bit level, we'll know the image they are booting is the one we built, we don't have to worry about a bug in the builder producing something incorrect in their environment. It's also one less moving piece we need to worry about in-cluster. My preference is to add nothing to the cluster that doesn't need to 100% need to be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we moving away from public AMIs?
There was some talk about possibly changing machine-controller to an image reference rather than updating each machineset. Either of these cases has the question of mixed image support. Do we plan on doing that in the future, or do we do it now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
openshift/installer#2906 suggests moving away from AMIs, yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This proposal seems to me an extension of the current upgrade workflow driven by the Machine Config Operator.
If I understand correctly this is introducing a new upgrading step to run the
machine-os-bootimage-generator
to use the just broughtmachine-os-content
to build an upload an artifact during an upgrade. So this belong together with all the upgrading business logic to the Machine Config Operator. And only when this last step is complete the MCO should signal a cluster upgrade as completed. So then any component in any environment e.g UPI have an available artifact to consume.I can't see why we'd want to couple this with the machine API or the machine API operator.
I'd find it more reasonable for the MCO to make the new artifact ID available via e.g configMap as part of this new upgrading step and let any consumer e.g machineSets to reference it as they see fit e.g
spec.Ami: "ocp-cluster-latest"
.Watching and updating all the machineSets massively would be again effectively an additional step for an upgrade workflow to be considered complete. And I don't think we should split or couple our upgrading process to something orthogonal like the machine API, this belong to the Machine Config Operator to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's that clear cut. Changing the bootimage has no effect on any running system; it's not clearly coupled with the main CVO upgrade logic.
Because there's nothing to do if the machineAPI isn't in use.
Again none of this applies unless machineAPI is in use. Sure, the code could technically live in the MCO but...
Another argument for having it live in machineAPI is that it requires IaaS credentials, usage of those APIs - the MCO isn't set up with that today, but machineAPI is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or to expand on this - in a non-machineAPI (i.e. UPI) scenario, it'd be up to the customer to integrate this tooling into whatever they use, such as updating their AWS CloudFormation template, etc. Similar for the "manual bare metal PXE" scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing to note, which is called out in this proposal - this is not replacing the default in-place updates. So even in a machineAPI/IPI scenario, if we edit the machinesets as part of the default CVO flow (whether the MCO edits or machineAPI edits), we would rely on the default machineAPI semantic of not reprovisoning machines when a machineset is changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see only a tangential point here to the machine API -> new machines belonging to a machineSet should come with the latest OS dictated by the MCO. The lifecycle of that OS artifact is transparent for the machine API. The machine API should be a consumer of this, just like any other component/tooling creating a new instance (e.g cloud formation) would be. They'd only need to know about where this is available.
The machine API has actually no particular knowledge about IaaS credentials. It just happen to consume what ever the cloud credentials operator makes available for any component in the cluster.
I think It's up to the consumer to integrate as they see fit but to consume a consistently formatted "artifact address" given by the component which is the authoritative source of truth for cluster upgrades i.e the MCO. This consumer could be anything: machine API, cloud formation, etc. They are orthogonal building blocks.
I'm not in favour of the MCO editing machineSets to complement a cluster upgrade. That would be effectively scattering our upgrade process into different components and coupling our upgrade process to the machine API.
I think this proposal describes a missing step in our current upgrade workflow driven by the authoritative source of truth for cluster upgrades i.e MCO. Once this would be supported then any component/tooling (e.g machine API, anything) could transparently react as they see fit to consume a consistently formatted source of truth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proposal is to
===
I think the separation make sense because machine config + rhcos knows best how to create bootimages.. and machine-api ecosystem knows best how to talk to cloud to create a cloud specific images and interact with Machinesets.