
Intro carpenter #2425

Open · wants to merge 8 commits into main
Conversation

pipo02mix
Contributor

What this PR does / why we need it

Towards https://github.com/giantswarm/giantswarm/issues/29503

Things to check/remember before submitting

  • If it's one of your first contributions, make sure you've read the Contributing Guidelines.
  • Bump last_review_date in the front matter header of the pages you've touched.

@pipo02mix pipo02mix requested a review from a team as a code owner December 13, 2024 15:04
@pipo02mix pipo02mix marked this pull request as draft December 13, 2024 15:04
@pipo02mix pipo02mix self-assigned this Dec 13, 2024
Contributor

This PR moves/renames or deletes some files. Please make sure to

  • maintain references (also important for images)
  • maintain aliases in the front matter of moved markdown files

Contributor

github-actions bot commented Dec 17, 2024

Hugo yielded some warnings. Please check whether they require action.

WARN  Template shortcodes/autoscaling_supported_versions.html is unused, source file /home/runner/work/docs/docs/src/layouts/shortcodes/autoscaling_supported_versions.html

@pipo02mix pipo02mix marked this pull request as ready for review December 17, 2024 15:36
To avoid collisions between the two, the cluster autoscaler is configured with a lower priority than `Karpenter`, so it reacts only after a pod has been in `Pending` state for a while (default: 5 minutes).
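For reference, a minimal sketch of how such a delay could be expressed, assuming it is driven by cluster-autoscaler's `--new-pod-scale-up-delay` flag passed through the Helm chart's `extraArgs` (the actual Giant Swarm configuration is not shown in this PR and may differ):

```yaml
# Hypothetical cluster-autoscaler Helm values sketch (assumption, not the actual Giant Swarm config)
extraArgs:
  # Ignore pods younger than 5 minutes, giving Karpenter the first chance to react
  new-pod-scale-up-delay: 5m
```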

## Configuration

Contributor Author

@T-Kukawka does it come installed in CAPA WC by default?

Contributor

It doesn't come pre-installed. Therefore, we should clarify at the top of the article that it's a custom addition at the moment, while cluster-autoscaler is built-in and works fine out of the box.

@pipo02mix pipo02mix requested review from a team and marians December 17, 2024 15:37

At Giant Swarm, your workload clusters run with [cluster autoscaler](https://github.com/kubernetes/autoscaler) and [`Karpenter`](https://karpenter.sh/) to reach optimal scaling for your workloads and keep costs to a minimum. This tutorial guides you through the configuration and management of both.

The cluster autoscaler is responsible for scaling the number of nodes on the different node pools of your workload cluster. It's triggered by not schedule pods, pods in `Pending` state, making the controller increase the number of desired nodes in the node pool. Indeed it modifies the `AutoScalingGroup` to reflect the new desired capacity.
Contributor

Suggested change
The cluster autoscaler is responsible for scaling the number of nodes on the different node pools of your workload cluster. It's triggered by not schedule pods, pods in `Pending` state, making the controller increase the number of desired nodes in the node pool. Indeed it modifies the `AutoScalingGroup` to reflect the new desired capacity.
The cluster autoscaler is responsible for scaling the number of nodes on the different node pools of your workload cluster. It's triggered by not scheduled pods, pods in `Pending` state, making the controller increase the number of desired nodes in the node pool. Indeed it modifies the `AutoScalingGroup` to reflect the new desired capacity.



Instead, `Karpenter` relies on Kubernetes events to scale the number of nodes in the cluster up or down. It selects from a suite of instance types defined in a special `Provisioner` resource to match the workload requirements, and can be configured to use spot instances to save costs. It's faster and more efficient than the cluster autoscaler, but does not operate well with base on-demand instances.
Contributor

Why do you say it doesn't operate well with base on-demand instances?


Our recommendation for the autoscaling configuration is to set up two different profiles: one targeting `Spot` compute and the other `On-Demand` instances. The `Spot` profile gets a higher weight so it's prioritized over the `On-Demand` profile, while the `On-Demand` profile ensures the cluster has a base capacity to handle the main workloads. A sketch of both profiles follows the parameter list below.

First, let's dive in what is a `Provisioner` custom resource to understand how to configure it. There are a set of parameters to help you define how the nodes should be provisioned:
Contributor

Suggested change
First, let's dive in what is a `Provisioner` custom resource to understand how to configure it. There are a set of parameters to help you define how the nodes should be provisioned:
First, let's dive into what a `Provisioner` custom resource is to understand how to configure it. There are a set of parameters to help you define how the nodes should be provisioned:


- **labels**: Used to select which nodes should be managed by the provisioner.
- **limits**: Define the resources limits for the nodes.
Contributor

Suggested change
- **limits**: Define the resources limits for the nodes.
- **limits**: Lets you set limits on the total CPU and Memory that can be used by the node pool, effectively stopping further node provisioning when those limits have been reached.
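To make the two-profile recommendation concrete, here is a minimal sketch, assuming Karpenter's `v1alpha5` `Provisioner` API and an `AWSNodeTemplate` named `default`; the profile names, labels and limits are illustrative, not the actual Giant Swarm defaults:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot
spec:
  weight: 10                          # higher weight: preferred over the on-demand profile
  labels:
    example.io/profile: spot          # illustrative label applied to nodes from this profile
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: "100"                      # stop provisioning once this pool reaches 100 vCPU
  providerRef:
    name: default                     # assumed AWSNodeTemplate
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: on-demand
spec:
  weight: 1                           # lower weight: provides the base capacity
  labels:
    example.io/profile: on-demand
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: "50"
  providerRef:
    name: default
```

Because the `spot` provisioner carries the higher weight, Karpenter considers it first, while the `on-demand` provisioner covers the workloads that should not run on spot capacity.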
