Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RHELAI-429: Adding upgrade informer service #663

Closed
wants to merge 2 commits into from

Conversation

tsorya
Copy link
Contributor

@tsorya tsorya commented Jul 9, 2024

Upgrade informer will run every couple of our and will be triggered by systemd timer.
In order to start it on boot and run once i enabled it and timer. Disabling auto upgrade service in order to remove unexpected reboots. Service will run "bootc upgrade --check" and in case new version exists it will create motd file with upgrade info.
Removed unused grow-part services

@tsorya tsorya force-pushed the igal/upgrade branch 2 times, most recently from 8286896 to 63f1706 Compare July 9, 2024 12:33
message_file="/etc/motd.d/upgrade-message"
bootc_auth="/etc/ostree/auth.json"

if [[ $output == Update\ available* ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to get proper support into bootc so we dont have to parse strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, didn't find a good way to do it rather this one. I assume i can look on bootc and add , --json maybe?

Copy link
Contributor Author

@tsorya tsorya Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no good way to do it, bootc upgrade --check doesn't have an option for json format.
Though there is draft pr for it containers/bootc#472 (comment).
Hope we will be able to change it in future versions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we automatically download the image, and not reboot?

Copy link
Contributor Author

@tsorya tsorya Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can run "bootc upgrade" without applying the change but it means that next reboot user will get new version in anycase, no?
We talked about updating user that there is an available upgrade but not applying anything by ourselves for now at least

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyeder WDYT?

Copy link
Contributor

@jeremyeder jeremyeder Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

showing folks and update is available is all we set out to do here; over time we can gauge what level of effort to put on this area. we want to nudge users to upgrade for all the normal reasons, but going beyond that feels premature.

# Run the command and capture its output
output=$(bootc upgrade --check | sed -e 1q)
message_file="/etc/motd.d/upgrade-message"
bootc_auth="/etc/ostree/auth.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will this file get there? what happens if it doesnt exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we talked that user will need to add it.
We can run command like :
podman login registry.redhat.io --username user --password pass --verbose | grep Used | awk '{print $2}' | xargs -I{} cp {} /etc/ostree/auth.json
There is a talk in bootc to add some common auth file between bootc and podman but it is still talks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upgrade-informer would run before the user has a chance to authenticate unless you're collecting it in firstboot (which we talked about doing?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create doc how to add creds on boot but i am not sure how we can make customers to do it.
I can fail nice in case we can't reach registry and exit 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add systemd service that can collect username/password as env variables and connect system to registry + insights on boot? Will it be more helpful?
I can block this service if /etc/ostree/auth.json was not set? Not sure what is better, not to run or try and fail gracefully in case no auth needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happened with current code
Jul 15 13:23:50 localhost upgrade-informer[1645]: ERROR Upgrading: Creating importer: Failed to invoke skopeo proxy method OpenImage: remote error: reading manifest v3.15 in quay.io/itsoiref/nvidia-bootc-private: unauthorized: access to the requested re> Jul 15 13:23:50 localhost upgrade-informer[1643]: No upgrade was found
It will not fail the service and just log the error
@jeremyeder i think it should be safe enough for now, WDYT?

@tsorya tsorya force-pushed the igal/upgrade branch 2 times, most recently from d5eecda to 859d019 Compare July 10, 2024 16:37
@rhatdan
Copy link
Member

rhatdan commented Jul 11, 2024

My understanding of the growpart was if the RHEL-AI was installed onto a cloud machine. Is this something we do not intend to do?

@tsorya
Copy link
Contributor Author

tsorya commented Jul 11, 2024

My understanding of the growpart was if the RHEL-AI was installed onto a cloud machine. Is this something we do not intend to do?

@rhatdan it was removed from all the images so i assumed it was not needed anymore but wanted to use same process of adding new service. If it is required i can leave it


# Get image version
# shellcheck disable=SC2086
image_version=$(skopeo inspect --format json $auth_params "$bootc_image" | jq '.Labels | .["image_version"] // empty' | tr -d '"')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not crazy about requiring skopeo, I believe this package could disappear and be replaced soley with Podman in the future. Can't we just save the output of bootc upgrade --check?

@cgwalters PTAL

Copy link
Contributor Author

@tsorya tsorya Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can though i wanted to get version from the image and bootc doesn't provide an option to get labels.
I assume we can change this part whenever we want. I think when we will remove skopeo this small service will not be there already and will be changed to something much mature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not must for sure and we can remove this part, we can update that there is new upgrade without saying which version, at least for now

@tsorya tsorya force-pushed the igal/upgrade branch 2 times, most recently from 7a256e1 to db0ad28 Compare July 15, 2024 11:11
image_version=$(skopeo inspect --format json $auth_params "$bootc_image" | jq '.Labels | .["image_version"] // empty' | tr -d '"')

# If upgrade available, write the output to the file
echo -e "\n\n ** Attention! ** \n** A new $image_version version is available **\n\
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyeder can you help me with good text here?

@tsorya tsorya force-pushed the igal/upgrade branch 2 times, most recently from 1a373ff to c5c061b Compare July 15, 2024 15:59
omertuc and others added 2 commits July 18, 2024 14:54
Ticket [RHELAI-442](https://issues.redhat.com/browse/RHELAI-442)

# Background

RHEL AI ships with a script in `/usr/local/bin` called `ilab` which
makes running `ilab` commands feel native even though they're actually
running in a podman container

# Issues

* The script is outdated / used several different container images for
different purposes, while it should be just using the single instructlab
image

* The volume mounts were incorrect, as instructlab now uses XDG paths

* Unnecessary directory creation for `HF_CACHE`

* Unnecessary GPU count logic

* Script has unnecessary fiddling of `ilab` parameters, essentially creating a
  UX that deviates from the natural `ilab` CLI

# Solutions

* Changed script to use the single container image `IMAGE_NAME` (this
  was already the case mostly, except for old references to `LVLM_NAME`
  and `TRAIN_NAME` which no longer get replaced leading to a broken `PODMAN_COMMAND_SERVE`.
  Also adjusted entrypoint to use the `ilab` executable in the pyenv

* Will now mount the host's `~/.config` and `~/.local` into the
  container's corresponding directories, for `instructlab` to use
  and for its config / data to persist across invocations

* Will now mount `~/.cache` into the container's corresponding `.cache`
  directory, so that the information stored in the default `HF_CACHE` is
  also persisted across invocations

* Removed unnecessary GPU count logic

* Removed all parameter parsing / fiddling

# Other changes

Added secret/fake "shell" `ilab` subcommand which opens a shell in the
wrapper's container, useful for troubleshooting issues with the wrapper
itself

Signed-off-by: Omer Tuchfeld <[email protected]>
Upgrade informer will run every couple of our and will be triggered by
systemd timer.
In order to start it on boot and run once i enabled it and timer.
Disabling auto upgrade service in order to remove unexpected reboots.
Service will run "bootc upgrade --check" and in case new version exists
it will create motd file with upgrade info.
Removed unused grow-part services

Signed-off-by: Igal Tsoiref <[email protected]>
@javipolo
Copy link
Collaborator

This functionality has been splitted and moved into different PRs:

#690
#689

@javipolo javipolo closed this Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants