diff --git a/docs/howto/upgrade-cluster/aws.md b/docs/howto/upgrade-cluster/aws.md
index 4f7a4c556c..98f1c16204 100644
--- a/docs/howto/upgrade-cluster/aws.md
+++ b/docs/howto/upgrade-cluster/aws.md
@@ -73,154 +73,202 @@ cluster is unused or that the maintenance is communicated ahead of time.
 
 When upgrading an EKS cluster, we will use `eksctl` extensively and reference a
 generated config file, `$CLUSTER_NAME.eksctl.yaml`. It's generated from the
-the `$CLUTER_NAME.jsonnet` file.
-
-If you update the .jsonnet file, make sure to re-generate the .yaml file
-before using `eksctl`. Respectively if you update the .yaml file directly,
-remember to update the .jsonnet file.
+`$CLUSTER_NAME.jsonnet` file.
 
 ```bash
 # re-generate an eksctl config file for use with eksctl
 jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
 ```
 
+```{important}
+If you update the .jsonnet file, make sure to re-generate the .yaml file
+before using `eksctl`.
+
+Conversely, if you update the .yaml file directly,
+remember to update the .jsonnet file.
+```
+
 ## Cluster upgrade
 
-1. *Ensure in-cluster permissions*
+### 1. Ensure in-cluster permissions
 
-   The k8s api-server won't accept commands from you unless you have configured
-   a mapping between the AWS user to a k8s user, and `eksctl` needs to make some
-   commands behind the scenes.
+The k8s api-server won't accept commands from you unless you have configured
+a mapping from the AWS user to a k8s user, and `eksctl` needs to run some
+commands behind the scenes.
 
-   This mapping is done from a ConfigMap in kube-system called `aws-auth`, and
-   we can use an `eksctl` command to influence it.
+This mapping is done via a ConfigMap in kube-system called `aws-auth`, and
+we can use an `eksctl` command to influence it.
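+For orientation, the mapping stored in the `aws-auth` ConfigMap looks roughly
+like the sketch below. The account id and username are hypothetical
+placeholders, and `eksctl` manages this entry for you:
+
+```yaml
+# illustrative sketch of the mapUsers data in the aws-auth ConfigMap
+mapUsers: |
+  - userarn: arn:aws:iam::<account-id>:user/<username>
+    username: <username>
+    groups:
+      - system:masters
+```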
-   ```bash
-   eksctl create iamidentitymapping \
-       --cluster=$CLUSTER_NAME \
-       --region=$CLUSTER_REGION \
-       --arn=arn:aws:iam:::user/ \
-       --username= \
-       --group=system:masters
-   ```
+```bash
+eksctl create iamidentitymapping \
+    --cluster=$CLUSTER_NAME \
+    --region=$CLUSTER_REGION \
+    --arn=arn:aws:iam:::user/ \
+    --username= \
+    --group=system:masters
+```
 
-2. *Acquire and configure AWS credentials*
+### 2. Acquire and configure AWS credentials
 
-   Visit https://2i2c.awsapps.com/start#/ and acquire CLI credentials.
+Visit https://2i2c.awsapps.com/start#/ and acquire CLI credentials.
 
-   In case the AWS account isn't managed there, inspect
-   `config/$CLUSTER_NAME/cluster.yaml` to understand what AWS account number to
-   login to at https://console.aws.amazon.com/.
+In case the AWS account isn't managed there, inspect
+`config/$CLUSTER_NAME/cluster.yaml` to understand what AWS account number to
+log in to at https://console.aws.amazon.com/.
 
-   Configure credentials like:
+Configure credentials like:
 
-   ```bash
-   export AWS_ACCESS_KEY_ID="..."
-   export AWS_SECRET_ACCESS_KEY="..."
-   ```
+```bash
+export AWS_ACCESS_KEY_ID="..."
+export AWS_SECRET_ACCESS_KEY="..."
+```
 
-3. *Upgrade the k8s control plane's one minor version*
+### 3. Upgrade the k8s control plane one minor version
 
-   The k8s control plane can only be upgraded one minor version at the time.[^1]
-   So, update the eksctl config's version field one minor version.
+```{important}
+The k8s control plane can only be upgraded one minor version at a time.[^1]
+```
 
-   Then, perform the upgrade which typically takes ~10 minutes.
+#### 3.1. Update the cluster's version field one minor version
 
-   ```bash
-   eksctl upgrade cluster --config-file=$CLUSTER_NAME.eksctl.yaml --approve
-   ```
+In the cluster's config file there should be an entry like the one below,
+where the version must be updated.
-   ```{note}
-   If you see the error `Error: the server has asked for the client to provide credentials` don't worry, if you try it again you will find that the cluster is now upgraded.
-   ```
+```yaml
+{
+    name: "carbonplanhub",
+    region: clusterRegion,
+    version: '1.27'
+}
+```
 
-4. *Upgrade node groups up to two minor versions above the k8s control plane*
+Then, perform the upgrade, which typically takes ~10 minutes.
 
-   A node's k8s software (`kubelet`) can be up to two minor versions ahead or
-   behind the control plane version.[^1] Due to this, you can plan your cluster
-   upgrade to only involve one node group upgrade even if you increment the
-   control plane four minor versions.
+```bash
+eksctl upgrade cluster --config-file=$CLUSTER_NAME.eksctl.yaml --approve
+```
 
-   So if you upgrade from k8s 1.21 to 1.24, you can for example upgrade the k8s
-   control plane from 1.21 to 1.22, then upgrade the node groups from 1.21 to
-   1.24, followed by upgrading the control plane two steps in a row.
+```{note}
+If you see the error `Error: the server has asked for the client to provide credentials`, don't worry; if you try again you will find that the cluster is now upgraded.
+```
+
+#### 3.2. Upgrade EKS add-ons (takes ~3*5s)
 
-   To upgrade (unmanaged) node groups, you delete them and then them back. When
-   adding them back, make sure your cluster config's k8s version is what you
-   want the node groups to be added back as.
+As documented in `eksctl`'s documentation[^1], we also need to upgrade three
+EKS add-ons enabled by default, and one we have added manually.
 
-   1. Update the k8s version in the config temporarily
+```bash
+# upgrade the kube-proxy daemonset
+eksctl utils update-kube-proxy --config-file=$CLUSTER_NAME.eksctl.yaml --approve
 
-      This is to influence the k8s software version for the nodegroup's we
-      create only. We can choose something two minor versions of the current k8s
-      control plane version.
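+
+# Optionally, after each of the updates in this block, you can verify that the
+# respective workload rolled out fine (this check is a suggestion, assuming
+# your kubectl context points at this cluster), e.g.:
+#
+#     kubectl rollout status daemonset kube-proxy --namespace kube-system
+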
+# upgrade the aws-node daemonset
+eksctl utils update-aws-node --config-file=$CLUSTER_NAME.eksctl.yaml --approve
 
-   2. Add a new core node group (like `core-b`)
+# upgrade the coredns deployment
+eksctl utils update-coredns --config-file=$CLUSTER_NAME.eksctl.yaml --approve
 
-      Rename (part 1/3) the config file's entry for the core node group
-      temporarily when running this command, either from `core-a` to `core-b` or
-      the other way around.
+# upgrade the aws-ebs-csi-driver addon's deployment and daemonset
+eksctl update addon --config-file=$CLUSTER_NAME.eksctl.yaml
+```
 
-      ```bash
-      eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-b"
-      ```
+````{note} Common failures
+The kube-proxy daemonset's pods may fail to pull the image. To resolve this, visit AWS EKS docs on [managing coredns](https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html) to identify the version to use and update the coredns deployment's container image to match it.
 
-   3. Delete all old node groups (like `core-a,nb-*,dask-*`)
+```bash
+kubectl edit daemonset coredns -n kube-system
+```
+````
 
-      Rename (part 2/3) the core node group again in the config to its previous
-      name, so the old node group can be deleted with the following command.
+### 4. Repeat step 3 above up to three times, if needed
 
-      ```bash
-      eksctl delete nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-a,nb-*,dask-*" --approve --drain=true
-      ```
+If you upgrade k8s multiple minor versions, consider repeating step 3
+up to a maximum of three times, incrementing the control plane one minor
+version at a time.
 
-      Rename (part 3/3) the core node group one final time in the config to its
-      new name, as that represents the state of the EKS cluster.
+This is because the control plane version can be **ahead** of
+the node's k8s software (`kubelet`) by up to three minor versions[^2]
+if kubelet is at least at version 1.25.
+Due to this, you can plan your cluster upgrade to only involve the minimum
+number of node group upgrades.
 
-   4. Re-create all non-core node groups (like `nb-*,dask-*`)
+So if you upgrade from k8s 1.25 to 1.28, you can for example upgrade the k8s
+control plane three steps in a row, from 1.25 to 1.26, then from 1.26
+to 1.27 and then from 1.27 to 1.28. This way, the node groups are left
+behind the control plane by three minor versions, which is ok because it
+doesn't break the three minor versions rule.
 
-      ```bash
-      eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="nb-*,dask-*"
-      ```
+Then, you can upgrade the node groups directly from 1.25 to 1.28, performing
+only one node group upgrade instead of three.
 
-   5. Restore the k8s version in the config
+### 5. Upgrade the node groups' version until it matches the k8s control plane
 
-      We adjusted the k8s version in the config to influence the desired version
-      of our created nodegroups. Let's restore it to what the k8s control plane
-      currently have.
+```{important}
+Per step 4 above, you can upgrade the version of the node groups by at most
+three minor versions at once, for example from 1.25 to 1.28 directly if the
+control plane's version allows it.
 
-5. *Upgrad EKS add-ons (takes ~3*5s)*
+If, after one such upgrade, the node groups' version is still behind
+the k8s control plane, you will need to repeat the node upgrade process
+until it catches up.
+```
 
-   As documented in `eksctl`'s documentation[^1], we also need to upgrade three
-   EKS add-ons enabled by default, and one we have added manually.
+To upgrade (unmanaged) node groups, you delete them and then add them back.
+When adding them back, make sure your cluster config's k8s version is what you
+want the node groups to be added back as.
 
-   ```bash
-   # upgrade the kube-proxy daemonset
-   eksctl utils update-kube-proxy --config-file=$CLUSTER_NAME.eksctl.yaml --approve
+#### 5.1. Double-check the current k8s version in the config
 
-   # upgrade the aws-node daemonset
-   eksctl utils update-aws-node --config-file=$CLUSTER_NAME.eksctl.yaml --approve
+Up until this step, you should have updated the control plane's
+version at least once, but at most three times. So you shouldn't
+need to update it again.
 
-   # upgrade the coredns deployment
-   eksctl utils update-coredns --config-file=$CLUSTER_NAME.eksctl.yaml --approve
+However, it is worth double-checking that the k8s version in
+the config file is:
+- not ahead of the current k8s control plane version, as this will
+  influence the version of the node groups.
+- not **more than three minor versions** ahead of what the node groups'
+  version was initially
 
-   # upgrade the aws-ebs-csi-driver addon's deployment and daemonset
-   eksctl update addon --config-file=$CLUSTER_NAME.eksctl.yaml
-   ```
+#### 5.2. Renaming node groups part 1: add a new core node group (like `core-b`)
 
-   ````{note} Common failures
-   The kube-proxy deamonset's pods may fail to pull the image, to resolve this visit AWS EKS docs on [managing coredns](https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html) to identify the version to use and update the coredns deployment's container image to match it.
+Rename the config file's entry for the core node group temporarily when running
+the command below, either from `core-a` to `core-b` or the other way around,
+then create the new nodegroup.
 
-   ```bash
-   kubectl edit daemonset coredns -n kube-system
-   ```
-   ````
+```bash
+# create a copy of the current nodegroup
+eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-b"
+```
+
+#### 5.3. Renaming node groups part 2: delete all old node groups (like `core-a,nb-*,dask-*`)
+
+Rename the core node group again in the config to its previous name,
+then delete the original nodegroup with the following command.
+
+```bash
+# delete the original nodegroup
+eksctl delete nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-a,nb-*,dask-*" --approve --drain=true
+```
+
+#### 5.4. Renaming node groups part 3: re-create all non-core node groups (like `nb-*,dask-*`)
+
+Rename the core node group one final time in the config to its
+new name, as that represents the state of the EKS cluster.
+
+```bash
+eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="nb-*,dask-*"
+```
+
+### 6. Repeat steps 3, 4 and 5 if needed
+
+If you need to upgrade the cluster more than three minor versions,
+consider repeating steps 3, 4 and 5 until the desired version is reached.
 
-6. *Repeat steps 3 and 5 if needed*
+### 7. Commit the changes to the jsonnet config file
 
-   If you upgrade k8s multiple minor versions, repeat step 3 and 5, where you
-   increment it one minor version at the time.
+During this upgrade, the k8s version and possibly the node group name might
+have changed. Make sure you commit these changes after the upgrade is finished.
 
 ## References
 
 [^1]: `eksctl`'s cluster upgrade documentation:
+[^2]: `k8s`'s supported version skew documentation:
\ No newline at end of file
diff --git a/eksctl/carbonplan.jsonnet b/eksctl/carbonplan.jsonnet
index f2fc8a3cef..17bde1ac8f 100644
--- a/eksctl/carbonplan.jsonnet
+++ b/eksctl/carbonplan.jsonnet
@@ -64,7 +64,7 @@ local daskNodes = [
     metadata+: {
         name: "carbonplanhub",
         region: clusterRegion,
-        version: '1.24'
+        version: '1.27'
     },
     availabilityZones: masterAzs,
     iam: {