Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes refactoring #450

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Conversation

cbosdo
Copy link
Contributor

@cbosdo cbosdo commented Sep 10, 2024

What does this PR change?

The installation, migration, upgrade and uninstall for Kubernetes has been refactored to:

  • no longer use a helm chart
  • use jobs for the migration phases
  • merge all the installation, migration and upgrade code into a single Reconcile() function.

This PR paves the way for the transition to an operator and improves the reliability of migrations.

As a collateral, SSH agent can no longer be used as that would require
running on a cluster node again. The tool creates a ConfigMap to store the SSH config and known_hosts and a Secret
for a password-less SSH key.

Using Kubernetes API modules also helps for code reuse with a future operator.

Test coverage

  • Unit tests were added

  • DONE

Links

Issue(s): #431 #432 #433 #436

  • DONE

Changelogs

Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository

If you don't need a changelog check, please mark this checkbox:

  • No changelog needed

If you uncheck the checkbox after the PR is created, you will need to re-run changelog_test (see below)

Before you merge

Check How to branch and merge properly!

@cbosdo cbosdo marked this pull request as draft September 10, 2024 17:36
@cbosdo cbosdo force-pushed the k8s-refactoring branch 7 times, most recently from 997fe46 to 8f9ea9e Compare September 16, 2024 21:01
@cbosdo cbosdo force-pushed the k8s-refactoring branch 3 times, most recently from e95556e to 837af3c Compare September 23, 2024 13:56
@cbosdo cbosdo force-pushed the k8s-refactoring branch 7 times, most recently from e5c4acd to d0bcbeb Compare October 6, 2024 19:34
@cbosdo cbosdo force-pushed the k8s-refactoring branch 3 times, most recently from dcbc987 to ffc6888 Compare October 25, 2024 07:08
@cbosdo cbosdo force-pushed the k8s-refactoring branch 2 times, most recently from d0ca4d4 to b20b6c7 Compare November 6, 2024 08:42
@cbosdo cbosdo force-pushed the k8s-refactoring branch 3 times, most recently from debd9af to c406d19 Compare November 13, 2024 07:39
nadvornik and others added 2 commits November 13, 2024 10:17
During a migration to kubernetes the server is deployed after the rsync
to prepare the SSL secrets and PVC. This has the nasty effect to
corrupt the synchronized data with a too recent catalog version ID.
This would let the DB migration to fail starting the old postgresql
server.

To workaround this, move the data to the the backup place after the
rsync instead of the begining of the db upgrade.
Some pods require a long time to run. This is the case for the DB
upgrade finalization that runs a potentially long reindex.
Migration deploys the helm chart multiple times. Identifying which
revision corresponds to which step in helm history is hard without a
description.
Of of the issuers creation function had two distinct behaviors and this
was only generating confusion when reading the whole code. This function
has been split and some useless intermediary functions have been merged.

This with better function naming should make the SSL setup code more
understandable.
In the kubernetes world we need to link the ports to services. For now
we only have a TCP and an UDP service for server and the same for proxy,
but in the short term, we will need more services to allow splitting
into multiple pods.

This refactoring is preparing this split.
Running commands in a running container only works if there is a running
container and is harder to unit test.

In order to help sharing code for Kubernetes, the SanityCheck now gets
the existing deployment version with inspecting its image. This also
helps adding unit tests for those checks.
In order to later share code between those 3 very similar commands, we
need to share the parameters data structure.
Migration to kubernetes is rather fragile, with:
    1. tasks running in `kubectl exec` or as `pod`.
    2. the uyuni helm chart being deployed multiple times
    3. `hostPath` mounts are used everywhere for the scripts to run and
       data to read and force the script to run on the cluster node.

Here are the solutions to those problems:

1. Each step will run as a Job and those won't be deleted automatically
   for the user to access their logs after.

2. Stop using the helm chart and deploy the resources when we need them.
   This will allow more control of what runs when and reduces the number
   of useless starts of the giant container.

   Postgresql DB upgrade will disable SSL temporarily in the
   postgresql.conf in order to not rely on the SSL certificates to be
   migrated.

3. The scripts to run for each step will be passed directly as `sh -c`
   parameter to the generated Jobs.
   The migration data are be stored in a special volume and not on the
   host.

As a collateral, SSH agent can no longer be used as that would require
running on a cluster node again. The tool now creates a ConfigMap to
store the SSH config and known_hosts and a Secret for a passwordless
SSH key.

The PersistentVolumes are not destroyed after the end of the first job
and are then reused by the next ones and the final deployment.

Using Kubernetes API modules also helps for code reuse with a future
operator.

Note that the old postgresql database cannot be moved to a separate
PersistentVolumes. As we run a `db_upgrade --link`, the old database is
linked by the new one and cannot be disposed of.
In order to share the same code for installation, migration and upgrade
the RunSetup() function needs to move to the mgradm shared utils module.
Remove all server resources without relying on the helm chart.
Refactor upgrade and install of the server to no longer need the helm
chart as initiated for the migration, but merge all those logics into a
single Reconcile() function to avoid redundancy.

Merging the code into a single function will also help figuring out how
to implement an operator in the future.
There is no need to run a potentially lengthy reindexing on minor
upgrades, only on major ones.

Don't call su with `-` parameter as it shows the warning message for
terminals... and that looks ugly in logs.
Since helm is no longer used installing Uyuni, but only cert-manager,
rename the flags. Also drop those that are no longer used for the
server after the refactoring.
With CGO enabled there are include problems on that architecture and
that would probably require cross-compiling.
Traefik helm chart changed the structure of the expose property starting
version 27. Read the chart version from the trafik.yaml file and write
the config accordingly.
Running the first user creation from outside the container relies on the
pod to be seen as ready by kubernetes... and sometimes it takes longer
than others. Calling the API from the setup script inside the container
allows to use localhost and not rely on ingress to route the request.
During the installation, there was a message indicating that the
timezone from the host couldn't be set in the container. This was due to
no removing the line end from the command output.
Copy link
Contributor

@aaannz aaannz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just couple notes, probably for future.
I think we should merge this ASAP even it there are things to change, reviewing this is couple hours job.

@@ -9,8 +9,6 @@ import (

"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
adm_utils "github.com/uyuni-project/uyuni-tools/mgradm/shared/utils"
"github.com/uyuni-project/uyuni-tools/shared"
. "github.com/uyuni-project/uyuni-tools/shared/l10n"
shared_podman "github.com/uyuni-project/uyuni-tools/shared/podman"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
shared_podman "github.com/uyuni-project/uyuni-tools/shared/podman"
"github.com/uyuni-project/uyuni-tools/shared/podman"

A nitpick, probably due to various refactoring we keep this, but it is no longer needed, there is no namespace clash anymore.
Ofc the rest of the code will need to drop shared_ prefix.

This will also make it the same between inspect and install.

Comment on lines +15 to 20
func installForKubernetes(
globalFlags *types.GlobalFlags,
flags *kubernetes.KubernetesServerFlags,
cmd *cobra.Command,
args []string,
) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for a clarity of code, as this is now just couple lines, can they be moved from utils.go directly to kubernetes.go and drop utils.go?

Seems strange to me to have utils file with single 2 lines command.

Comment on lines +35 to +39
out, err := kubernetes.RunPodLogs(namespace, "uyuni-data-extractor", image,
pullPolicy, pullSecret, []types.VolumeMount{volume},
"sh", "-c",
"for f in /var/lib/uyuni-tools/*; do echo \"`basename $f`: |2\"; cat $f | sed 's/^/ /'; done",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate some comment what is this supposed to do?

Comment on lines +52 to +53
const sshSecretName = "uyuni-migration-key"
const sshConfigName = "uyuni-migration-ssh"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put the constants before functions.

Comment on lines +70 to +71
func DeployReusedCa(namespace string, ca *types.SSLPair) ([]string, error) {
helmArgs := []string{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func DeployReusedCa(namespace string, ca *types.SSLPair) ([]string, error) {
helmArgs := []string{}
func DeployReusedCa(namespace string, ca *types.SSLPair) error {

@@ -90,33 +94,82 @@ func installSSLIssuers(helmFlags *cmd_utils.HelmFlags, sslFlags *cmd_utils.Insta
log.Fatal().Err(err).Msg(L("Failed to create issuer"))
}

// Wait for issuer to be ready
return helmArgs, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return helmArgs, nil
return nil

}

// Install the cert-manager issuers
if _, err := DeployReusedCa(namespace, &ca); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if _, err := DeployReusedCa(namespace, &ca); err != nil {
if err := DeployReusedCa(namespace, &ca); err != nil {

Comment on lines +116 to +133

var noSSLPaths = []string{
"/pub",
"/rhn/([^/])+/DownloadFile",
"/(rhn/)?rpc/api",
"/rhn/errors",
"/rhn/ty/TinyUrl",
"/rhn/websocket",
"/rhn/metrics",
"/cobbler_api",
"/cblr",
"/httpboot",
"/images",
"/cobbler",
"/os-images",
"/tftp",
"/docs",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need some crosslink with relevant parts of uyuni code so we know when remove/add here where else it should be checked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const uyuniCaIssuer = `apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: uyuni-ca-issuer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this reuse constant from kubernetes.CaIssuerName?

Comment on lines +294 to +299
%ifarch i386
%if 0%{?debian}
# Disable CGO build for debian 32 bits to avoid cross-compilation
export CGO_ENABLED=0
%endif
%endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about 32bit Debian systems?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants