The cluster autoscaler on AliCloud scales worker nodes within any specified autoscaling group. It will run as a Deployment
in your cluster. This README will go over some of the necessary steps required to get the cluster autoscaler up and running.
Cluster autoscaler must run on Kubernetes v1.9.3 or greater.
- Standard Instance: x86 architecture, suitable for common scenarios such as websites or API services.
- GPU/FPGA Instance: heterogeneous computing, suitable for high-performance computing.
- Bare Metal Instance: combines the elasticity of a virtual server with the high performance and comprehensive features of a physical server.
- Spot Instance: spot instances are on-demand instances designed to reduce your ECS costs in some cases.
doc: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/auto-scaling-of-nodes
apiVersion: v1
kind: Secret
metadata:
  name: cloud-config
  namespace: kube-system
data:
  # insert your base64 encoded Alicloud access id and key here, ensure there's no trailing newline:
  # such as: echo -n "your_access_key_id" | base64
  access-key-id: "<BASE64_ACCESS_KEY_ID>"
  access-key-secret: "<BASE64_ACCESS_KEY_SECRET>"
  region-id: "<BASE64_REGION_ID>"
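The no-trailing-newline requirement for the Secret values above can be sketched as a small shell helper. This is only an illustration: the function name `b64` and the sample values are hypothetical placeholders, not part of the autoscaler or of any Alibaba Cloud tooling.

```shell
#!/bin/sh
# Hypothetical helper: base64-encode a value without a trailing newline.
# printf '%s' avoids the newline that a plain echo would append, and
# tr -d '\n' removes any line wrapping added by the base64 tool.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Placeholder values; replace them with your own credentials and region.
echo "access-key-id: \"$(b64 'your_access_key_id')\""
echo "access-key-secret: \"$(b64 'your_access_key_secret')\""
echo "region-id: \"$(b64 'cn-hangzhou')\""
```

The printed lines can be pasted under `data:` in the Secret. A value encoded with a stray newline decodes to a different string, which typically shows up as an authentication failure rather than a parse error.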
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ess:Describe*",
        "ess:CreateScalingRule",
        "ess:ModifyScalingGroup",
        "ess:RemoveInstances",
        "ess:ExecuteScalingRule",
        "ess:ModifyScalingRule",
        "ess:DeleteScalingRule",
        "ess:DetachInstances",
        "ecs:DescribeInstanceTypes"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
- Create a Scaling Group in ESS (https://essnew.console.aliyun.com) with valid configurations.
- Create a Scaling Configuration for this Scaling Group with a valid instanceType and User Data. In User Data, you can specify a script that initializes the environment and joins the node to the Kubernetes cluster. If your Kubernetes cluster is hosted by ACS, you can use an attach script like this:
#!/bin/sh
# The token is generated by ACS console. https://www.alibabacloud.com/help/doc-detail/64983.htm?spm=a2c63.l28256.b99.33.46395ad54ozJFq
curl http://aliacs-k8s-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/public/pkg/run/attach/[kubernetes_cluster_version]/attach_node.sh | bash -s -- --openapi-token [token] --ess true
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: admin
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.3.1.2
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=alicloud
            - --nodes=[min]:[max]:[ASG_ID]
          imagePullPolicy: "Always"
          env:
            - name: ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: cloud-config
                  key: access-key-id
            - name: ACCESS_KEY_SECRET
              valueFrom:
                secretKeyRef:
                  name: cloud-config
                  key: access-key-secret
            - name: REGION_ID
              valueFrom:
                secretKeyRef:
                  name: cloud-config
                  key: region-id
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-certificates.crt"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: admin
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.3.1.2
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=alicloud
            - --nodes=[min]:[max]:[ASG_ID]
          imagePullPolicy: "Always"
Auto Discovery is not supported in AliCloud currently.
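Because scaling groups are not discovered automatically, each one has to be listed explicitly with its own `--nodes=[min]:[max]:[ASG_ID]` argument. The sketch below shows one way to build and sanity-check such arguments before pasting them into the Deployment command list. The helper name `nodes_flag` and the scaling group IDs are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build one --nodes argument in the
# [min]:[max]:[ASG_ID] format and reject inconsistent bounds.
nodes_flag() {
  min="$1"
  max="$2"
  asg_id="$3"
  if [ "$min" -gt "$max" ]; then
    echo "min ($min) must not exceed max ($max)" >&2
    return 1
  fi
  printf '%s\n' "--nodes=${min}:${max}:${asg_id}"
}

# Placeholder scaling group IDs, for illustration only.
nodes_flag 1 10 asg-example-1
nodes_flag 0 5 asg-example-2
```

Each printed line becomes one entry in the container's `command` list, so a cluster with two scaling groups ends up with two `--nodes` entries.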
- The /etc/ssl/certs/ca-certificates.crt file should exist by default on your ECS instance.
- By default, cluster autoscaler will not terminate nodes running pods in the kube-system namespace. You can override this default behaviour by passing in the --skip-nodes-with-system-pods=false flag.
- By default, cluster autoscaler will wait 10 minutes between scale down operations. You can adjust this using the --scale-down-delay flag, e.g. --scale-down-delay=5m to decrease the scale down delay to 5 minutes.
- If you're running multiple ASGs, the --expander flag supports three options: random, most-pods and least-waste. random will expand a random ASG on scale up. most-pods will scale up the ASG that will schedule the most pods. least-waste will expand the ASG that will waste the least amount of CPU/MEM resources. In the event of a tie, cluster-autoscaler will fall back to random.
- If you're managing your own kubelets, they need to be started with the --provider-id flag.