Merge pull request 2i2c-org#3420 from GeorgianaElena/upgrade-carbonplan
Upgrade carbonplan version and update aws upgrade docs
GeorgianaElena authored Nov 15, 2023
2 parents 2d2469b + e4e857b commit 87ff189
Showing 2 changed files with 148 additions and 100 deletions.
246 changes: 147 additions & 99 deletions docs/howto/upgrade-cluster/aws.md
@@ -73,154 +73,202 @@ cluster is unused or that the maintenance is communicated ahead of time.

When upgrading an EKS cluster, we will use `eksctl` extensively and reference
a generated config file, `$CLUSTER_NAME.eksctl.yaml`. It's generated from
the `$CLUSTER_NAME.jsonnet` file.

```bash
# re-generate an eksctl config file for use with eksctl
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
```

```{important}
If you update the .jsonnet file, make sure to re-generate the .yaml file
before using `eksctl`.
Respectively if you update the .yaml file directly,
remember to update the .jsonnet file.
```
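If you are unsure whether the two files are in sync, one way to check (a sketch, assuming `jsonnet` is on your `PATH` and you are in the directory containing the config files) is to re-render the jsonnet and diff it against the committed file:

```bash
# no output means the committed .yaml matches the .jsonnet source
diff <(jsonnet $CLUSTER_NAME.jsonnet) $CLUSTER_NAME.eksctl.yaml
```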

## Cluster upgrade

### 1. Ensure in-cluster permissions

The k8s api-server won't accept commands from you unless you have configured
a mapping between your AWS user and a k8s user, and `eksctl` needs to run some
commands behind the scenes.

This mapping is done via the `aws-auth` ConfigMap in the `kube-system`
namespace, and we can use an `eksctl` command to influence it.

```bash
eksctl create iamidentitymapping \
--cluster=$CLUSTER_NAME \
--region=$CLUSTER_REGION \
--arn=arn:aws:iam::<aws-account-id>:user/<iam-user-name> \
--username=<iam-user-name> \
--group=system:masters
```
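To confirm the mapping took effect, you can list the cluster's current identity mappings with `eksctl`'s companion `get` subcommand:

```bash
# list the IAM identity -> k8s user mappings currently in aws-auth
eksctl get iamidentitymapping \
    --cluster=$CLUSTER_NAME \
    --region=$CLUSTER_REGION
```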

### 2. Acquire and configure AWS credentials

Visit https://2i2c.awsapps.com/start#/ and acquire CLI credentials.

If the AWS account isn't managed there, inspect
`config/$CLUSTER_NAME/cluster.yaml` to find the AWS account number to
log in to at https://console.aws.amazon.com/.

Configure credentials like:

```bash
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```
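Before proceeding, it can be worth verifying that the credentials resolve to the AWS account you expect:

```bash
# print the account id and IAM identity the CLI will act as
aws sts get-caller-identity
```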

### 3. Upgrade the k8s control plane one minor version

```{important}
The k8s control plane can only be upgraded one minor version at a time.[^1]
```

#### 3.1. Update the cluster's version field one minor version

In the cluster's config file there should be an entry like the one below,
where the version must be updated.

```jsonnet
{
    name: "carbonplanhub",
    region: clusterRegion,
    version: '1.27'
}
```

Then, perform the upgrade which typically takes ~10 minutes.

```bash
eksctl upgrade cluster --config-file=$CLUSTER_NAME.eksctl.yaml --approve
```

```{note}
If you see the error `Error: the server has asked for the client to provide credentials`, don't worry. If you run the command again, you will find that the cluster was upgraded after all.
```
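To confirm the control plane ended up at the expected version, you can query it via the AWS CLI:

```bash
# print the control plane's current k8s version
aws eks describe-cluster \
    --name $CLUSTER_NAME \
    --query cluster.version \
    --output text
```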
#### 3.2. Upgrade EKS add-ons (takes ~5s per add-on)

As documented in `eksctl`'s documentation[^1], we also need to upgrade three
EKS add-ons enabled by default, and one we have added manually.

```bash
# upgrade the kube-proxy daemonset
eksctl utils update-kube-proxy --config-file=$CLUSTER_NAME.eksctl.yaml --approve

# upgrade the aws-node daemonset
eksctl utils update-aws-node --config-file=$CLUSTER_NAME.eksctl.yaml --approve

# upgrade the coredns deployment
eksctl utils update-coredns --config-file=$CLUSTER_NAME.eksctl.yaml --approve

# upgrade the aws-ebs-csi-driver addon's deployment and daemonset
eksctl update addon --config-file=$CLUSTER_NAME.eksctl.yaml
```

````{note} Common failures
The kube-proxy daemonset's pods may fail to pull the image. To resolve this, visit the AWS EKS docs on [managing coredns](https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html) to identify the version to use, and update the coredns deployment's container image to match it.
```bash
kubectl edit deployment coredns -n kube-system
```
````
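If you want to double-check what the add-ons ended up running, you can inspect their container images directly (kube-proxy runs as a daemonset and coredns as a deployment):

```bash
# show the images currently used by the kube-proxy and coredns add-ons
kubectl get daemonset kube-proxy -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl get deployment coredns -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```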

### 4. Repeat step 3 above, up to three times if needed

If you upgrade k8s multiple minor versions, repeat step 3 as needed,
up to a maximum of three times, incrementing the control plane one minor
version at a time.

This is because the control plane version can be **ahead** of
the nodes' k8s software (`kubelet`) by up to three minor versions[^2],
as long as kubelet is at least at version 1.25. Due to this, you can plan your
cluster upgrade to only involve the minimum number of node group upgrades.

So if you upgrade from k8s 1.25 to 1.28, you can for example upgrade the k8s
control plane three steps in a row: from 1.25 to 1.26, then from 1.26
to 1.27, and then from 1.27 to 1.28. This leaves the node groups
behind the control plane by three minor versions, which is ok because it
doesn't break the three minor version skew rule.

Then, you can upgrade the node groups directly from 1.25 to 1.28, making only
one node group upgrade instead of three.
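At any point during this process, you can check the actual skew: the `VERSION` column in the output below reports each node's kubelet version, which you can compare against the control plane's version.

```bash
# list nodes with their kubelet versions
kubectl get nodes
```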

### 5. Upgrade the node groups' version until it matches the k8s control plane

```{important}
Per step 4 above, you can upgrade the node groups' version by a maximum of
three minor versions at once, for example from 1.25 to 1.28 directly, if the
control plane's version allows it.

If after one such upgrade the node groups' version is still behind
the k8s control plane, repeat the node upgrade process until it
catches up.
```

To upgrade (unmanaged) node groups, you delete them and then add them back.
When adding them back, make sure the k8s version in your cluster config is the
version you want the node groups re-created with.

#### 5.1. Double-check current k8s version in the config

Up until this step, you should have updated the control plane's
version at least once, but at most three times. So you shouldn't
need to update the version in the config again.

However, it is worth double-checking that the k8s version in
the config file is:
- not ahead of the current k8s control plane version, as this will
  influence the version of the node groups
- not **more than three minor versions** ahead of the version the node groups
  were at initially
#### 5.2. Renaming node groups part 1: add a new core node group (like `core-b`)

Rename the config file's entry for the core node group temporarily when running
this command, either from `core-a` to `core-b` or the other way around,
then create the new nodegroup.
```bash
# create a copy of the current nodegroup
eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-b"
```
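You can then confirm the new group's nodes joined the cluster and are `Ready` (a sketch, assuming the `alpha.eksctl.io/nodegroup-name` label that `eksctl` puts on the nodes it creates):

```bash
# list only the nodes belonging to the freshly created node group
kubectl get nodes -l alpha.eksctl.io/nodegroup-name=core-b
```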

#### 5.3. Renaming node groups part 2: delete all old node groups (like `core-a,nb-*,dask-*`)

Rename the core node group in the config back to its previous name, so the
old node group can be matched and deleted with the following command.

```bash
# delete the original nodegroup
eksctl delete nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-a,nb-*,dask-*" --approve --drain=true
```

#### 5.4. Renaming node groups part 3: re-create all non-core node groups (like `nb-*,dask-*`)

Rename the core node group one final time in the config to its
new name, as that represents the state of the EKS cluster.

```bash
eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="nb-*,dask-*"
```
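Finally, you can list the node groups `eksctl` knows about to confirm everything was re-created as intended:

```bash
# list all node groups in the cluster
eksctl get nodegroup --cluster=$CLUSTER_NAME --region=$CLUSTER_REGION
```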

### 6. Repeat steps 3, 4 and 5 if needed

If you need to upgrade the cluster more than three minor versions,
consider repeating steps 3, 4 and 5 until the desired version is reached.

### 7. Commit the changes to the jsonnet config file

During this upgrade, the k8s version and possibly the core node group name
have been changed in the jsonnet config file. Make sure you commit these
changes after the upgrade is finished.
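A minimal sketch of that final step (the file path and commit message are illustrative, assuming the generated `.eksctl.yaml` file is not committed):

```bash
# stage and commit the updated jsonnet config
git add eksctl/$CLUSTER_NAME.jsonnet
git commit -m "Upgrade $CLUSTER_NAME k8s control plane and node groups"
```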

## References

[^1]: `eksctl`'s cluster upgrade documentation: <https://eksctl.io/usage/cluster-upgrade/>
[^2]: k8s's supported version skew documentation: <https://kubernetes.io/releases/version-skew-policy/#supported-version-skew>
2 changes: 1 addition & 1 deletion eksctl/carbonplan.jsonnet
@@ -64,7 +64,7 @@ local daskNodes = [
metadata+: {
name: "carbonplanhub",
region: clusterRegion,
-    version: '1.24'
+    version: '1.27'
},
availabilityZones: masterAzs,
iam: {
