Azure: Remove AKS vmType #6133

Closed
63 changes: 4 additions & 59 deletions cluster-autoscaler/cloudprovider/azure/README.md
@@ -81,8 +81,7 @@ k8s.io_cluster-autoscaler_node-template_autoscaling-options_scaledownunreadytime
Cluster autoscaler supports four Kubernetes cluster options on Azure:

- [**vmss**](#vmss-deployment): Autoscale VMSS instances by setting the Azure cloud provider's `vmType` parameter to `vmss` or to an empty string. This supports clusters deployed with [aks-engine][].
-- [**standard**](#standard-deployment): Autoscale VMAS instances by setting the Azure cloud provider's `vmType` parameter to `standard`. This supports clusters deployed with [aks-engine][].
-- [**aks**](#aks-deployment): Supports an Azure Kubernetes Service ([AKS][]) cluster.
+- [**standard**](#standard-deployment): Autoscale VMAS (Virtual Machine Availability Set) VMs by setting the Azure cloud provider's `vmType` parameter to `standard`. This supports clusters deployed with [aks-engine][].

> **_NOTE_**: only the `vmss` option supports scaling down to zero nodes.

@@ -250,74 +249,21 @@ To run a cluster autoscaler pod with Azure managed service identity (MSI), use [

> **_WARNING_**: Cluster autoscaler depends on user-provided deployment parameters to provision new nodes. After upgrading your Kubernetes cluster, cluster autoscaler must also be redeployed with new parameters to prevent provisioning nodes with an old version.

-### AKS deployment
+## AKS Autoscaler

-#### AKS + VMSS
-
-Autoscaling VM scale sets with AKS is supported for Kubernetes v1.12.4 and later. The option to enable cluster autoscaler is available in the [Azure Portal][] or with the [Azure CLI][]:
+Node Pool Autoscaling is a first class feature of your AKS cluster. The option to enable cluster autoscaler is available in the [Azure Portal][] or with the [Azure CLI][]:

```sh
az aks create \
--resource-group myResourceGroup \
--name myAKSCluster \
---kubernetes-version 1.13.5 \
+--kubernetes-version 1.25.11 \
--node-count 1 \
---enable-vmss \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 3
```

-#### AKS + Availability Set
-
-The CLI based deployment only support VMSS and manual deployment is needed if availability set is used.
-
-Prerequisites:
-
-- Get Azure credentials from the [**Permissions**](#permissions) step above.
-- Get the cluster name with the `az aks list` command.
-- Get the name of a node pool from the value of the label **agentpool**
-
-```sh
-kubectl get nodes --show-labels
-```
-
-Make a copy of [cluster-autoscaler-aks.yaml](examples/cluster-autoscaler-aks.yaml). Fill in the placeholder values for
-the `cluster-autoscaler-azure` secret data by base64-encoding each of your Azure credential fields.
-
-- ClientID: `<base64-encoded-client-id>`
-- ClientSecret: `<base64-encoded-client-secret>`
-- ResourceGroup: `<base64-encoded-resource-group>` (Note: ResourceGroup is case-sensitive)
-- SubscriptionID: `<base64-encoded-subscription-id>`
-- TenantID: `<base64-encoded-tenant-id>`
-- ClusterName: `<base64-encoded-clustername>`
-- NodeResourceGroup: `<base64-encoded-node-resource-group>` (Note: node resource group is not resource group and can be obtained in the corresponding label of the nodepool)
-
-> **_NOTE_**: Use a command such as `echo $CLIENT_ID | base64` to encode each of the fields above.
-
-In the `cluster-autoscaler` spec, find the `image:` field and replace `{{ ca_version }}` with a specific cluster autoscaler release.
-
-Below that, in the `command:` section, update the `--nodes=` arguments to reference your node limits and node pool name. For example, if node pool "k8s-nodepool-1" should scale from 1 to 10 nodes:
-
-```yaml
-- --nodes=1:10:k8s-nodepool-1
-```
-
-or to autoscale multiple VM scale sets:
-
-```yaml
-- --nodes=1:10:k8s-nodepool-1
-- --nodes=1:10:k8s-nodepool-2
-```
-
-Then deploy cluster-autoscaler by running
-
-```sh
-kubectl create -f cluster-autoscaler-aks.yaml
-```
-
-To deploy in AKS with `Helm 3`, please refer to [helm installation tutorial][].
-
Please see the [AKS autoscaler documentation][] for details.

## Rate limit and back-off retries
@@ -339,7 +285,6 @@ The new version of [Azure client][] supports rate limit and back-off retries whe

> **_NOTE_**: * These rate limit configs can be set per-client. Customizing `QPS` and `Bucket` through environment variables per client is not supported.

-[AKS]: https://docs.microsoft.com/azure/aks/
[AKS autoscaler documentation]: https://docs.microsoft.com/azure/aks/autoscaler
[aks-engine]: https://github.com/Azure/aks-engine
[Azure CLI]: https://docs.microsoft.com/cli/azure/install-azure-cli
4 changes: 2 additions & 2 deletions cluster-autoscaler/cloudprovider/azure/azure_cache.go
@@ -39,7 +39,7 @@ var (
// azureCache is used for caching cluster resources state.
//
// It is needed to:
-// - keep track of node groups (AKS, VM and VMSS types) in the cluster,
+// - keep track of node groups (VM and VMSS types) in the cluster,
// - keep track of instances and which node group they belong to,
// - limit repetitive Azure API calls.
type azureCache struct {
@@ -174,7 +174,7 @@ func (m *azureCache) fetchAzureResources() error {
} else {
return err
}
-case vmTypeStandard, vmTypeAKS:
+case vmTypeStandard:
// List all VMs in the RG.
vmResult, err := m.fetchVirtualMachines()
if err == nil {
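For readers skimming the hunk above, here is a minimal sketch of the dispatch that remains once `vmTypeAKS` is gone. It is not the provider's code: `resourceKindFor` is a hypothetical helper, the constants only mirror the two `vmType` values named in the README, and rejecting unknown values in the `default` branch is an assumption (the diff does not show how `fetchAzureResources` treats them). Treating an empty string as `vmss` follows the README text.

```go
package main

import "fmt"

// Hypothetical constants mirroring the two vmType values that remain after this PR.
const (
	vmTypeVMSS     = "vmss"
	vmTypeStandard = "standard"
)

// resourceKindFor is an illustrative stand-in for the fetch calls in the cache:
// it only reports which kind of Azure resource would be listed for a vmType.
func resourceKindFor(vmType string) (string, error) {
	switch vmType {
	case "", vmTypeVMSS:
		// The README states that an empty vmType is treated as vmss.
		return "virtual machine scale sets", nil
	case vmTypeStandard:
		return "virtual machines", nil
	default:
		// Assumption: unknown values (including the removed "aks") are rejected here.
		return "", fmt.Errorf("unsupported vmType %q", vmType)
	}
}

func main() {
	for _, t := range []string{"vmss", "standard", "aks"} {
		kind, err := resourceKindFor(t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> list %s\n", t, kind)
	}
}
```

Whether the real provider should fail fast on `vmType: aks` or silently ignore it is a design question this sketch does not settle.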
7 changes: 0 additions & 7 deletions cluster-autoscaler/cloudprovider/azure/azure_client.go
@@ -34,7 +34,6 @@ import (

klog "k8s.io/klog/v2"

-"sigs.k8s.io/cloud-provider-azure/pkg/azureclients/containerserviceclient"
"sigs.k8s.io/cloud-provider-azure/pkg/azureclients/diskclient"
"sigs.k8s.io/cloud-provider-azure/pkg/azureclients/interfaceclient"
"sigs.k8s.io/cloud-provider-azure/pkg/azureclients/storageaccountclient"
@@ -151,7 +150,6 @@ type azClient struct {
interfacesClient interfaceclient.Interface
disksClient diskclient.Interface
storageAccountsClient storageaccountclient.Interface
-managedKubernetesServicesClient containerserviceclient.Interface
skuClient compute.ResourceSkusClient
}

@@ -274,10 +272,6 @@ func newAzClient(cfg *Config, env *azure.Environment) (*azClient, error) {
disksClient := diskclient.New(diskClientConfig)
klog.V(5).Infof("Created disks client with authorizer: %v", disksClient)

-aksClientConfig := azClientConfig.WithRateLimiter(cfg.KubernetesServiceRateLimit)
-kubernetesServicesClient := containerserviceclient.New(aksClientConfig)
-klog.V(5).Infof("Created kubernetes services client with authorizer: %v", kubernetesServicesClient)
-
// Reference on why selecting ResourceManagerEndpoint as baseURI -
// https://github.com/Azure/go-autorest/blob/main/autorest/azure/environments.go
skuClient := compute.NewResourceSkusClientWithBaseURI(azClientConfig.ResourceManagerEndpoint, cfg.SubscriptionID)
@@ -292,7 +286,6 @@ func newAzClient(cfg *Config, env *azure.Environment) (*azClient, error) {
deploymentsClient: deploymentsClient,
virtualMachinesClient: virtualMachinesClient,
storageAccountsClient: storageAccountsClient,
-managedKubernetesServicesClient: kubernetesServicesClient,
skuClient: skuClient,
}, nil
}
16 changes: 0 additions & 16 deletions cluster-autoscaler/cloudprovider/azure/azure_config.go
@@ -111,11 +111,6 @@ type Config struct {
Deployment string `json:"deployment" yaml:"deployment"`
DeploymentParameters map[string]interface{} `json:"deploymentParameters" yaml:"deploymentParameters"`

-//Configs only for AKS
-ClusterName string `json:"clusterName" yaml:"clusterName"`
-//Config only for AKS
-NodeResourceGroup string `json:"nodeResourceGroup" yaml:"nodeResourceGroup"`
-
// VMSS metadata cache TTL in seconds, only applies for vmss type
VmssCacheTTL int64 `json:"vmssCacheTTL" yaml:"vmssCacheTTL"`

@@ -174,8 +169,6 @@ func BuildAzureConfig(configReader io.Reader) (*Config, error) {
cfg.AADClientCertPath = os.Getenv("ARM_CLIENT_CERT_PATH")
cfg.AADClientCertPassword = os.Getenv("ARM_CLIENT_CERT_PASSWORD")
cfg.Deployment = os.Getenv("ARM_DEPLOYMENT")
-cfg.ClusterName = os.Getenv("AZURE_CLUSTER_NAME")
-cfg.NodeResourceGroup = os.Getenv("AZURE_NODE_RESOURCE_GROUP")

subscriptionID, err := getSubscriptionIdFromInstanceMetadata()
if err != nil {
@@ -474,8 +467,6 @@ func (cfg *Config) TrimSpace() {
cfg.AADClientCertPath = strings.TrimSpace(cfg.AADClientCertPath)
cfg.AADClientCertPassword = strings.TrimSpace(cfg.AADClientCertPassword)
cfg.Deployment = strings.TrimSpace(cfg.Deployment)
-cfg.ClusterName = strings.TrimSpace(cfg.ClusterName)
-cfg.NodeResourceGroup = strings.TrimSpace(cfg.NodeResourceGroup)
}

func (cfg *Config) validate() error {
@@ -493,13 +484,6 @@ func (cfg *Config) validate() error {
}
}

-if cfg.VMType == vmTypeAKS {
-// Cluster name is a mandatory param to proceed.
-if cfg.ClusterName == "" {
-return fmt.Errorf("cluster name not set for type %+v", cfg.VMType)
-}
-}
-
if cfg.SubscriptionID == "" {
return fmt.Errorf("subscription ID not set")
}
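To make the effect of the `azure_config.go` hunks concrete, the following sketch shows a trimmed-down config that no longer carries `ClusterName` or `NodeResourceGroup` and whose validation no longer has an AKS-specific branch. The `config` type and its lowercase methods are hypothetical stand-ins, not the provider's `Config`; only the `subscription ID not set` check and the trim-then-validate order are taken from the diff, and the real `validate()` performs further checks not shown in these hunks.

```go
package main

import (
	"fmt"
	"strings"
)

// config is a hypothetical, reduced stand-in for the provider's Config struct.
// The AKS-only fields (ClusterName, NodeResourceGroup) are intentionally absent.
type config struct {
	VMType         string
	SubscriptionID string
	Deployment     string
}

// trimSpace strips surrounding whitespace from the string fields,
// in the spirit of Config.TrimSpace in the diff.
func (c *config) trimSpace() {
	c.VMType = strings.TrimSpace(c.VMType)
	c.SubscriptionID = strings.TrimSpace(c.SubscriptionID)
	c.Deployment = strings.TrimSpace(c.Deployment)
}

// validate keeps only the subscription ID check visible in the hunk;
// there is no cluster-name requirement tied to any vmType.
func (c *config) validate() error {
	if c.SubscriptionID == "" {
		return fmt.Errorf("subscription ID not set")
	}
	return nil
}

func main() {
	// The subscription ID below is a placeholder value, not a real subscription.
	cfg := &config{VMType: "vmss", SubscriptionID: "  00000000-0000-0000-0000-000000000000  "}
	cfg.trimSpace()
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok")
}
```

A practical consequence, assuming the provider ignores unknown environment variables: deployments that still export `AZURE_CLUSTER_NAME` or `AZURE_NODE_RESOURCE_GROUP` would simply have those values go unused after this change.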