Skip to content

Commit

Permalink
Allow overriding the kubernetes.io/arch label set by the scale from z…
Browse files Browse the repository at this point in the history
…ero methods via a new cmdline arg

The architecture label in the build generic labels method of the Cluster API provider is now populated using the cpu.GetDefaultScaleFromZeroArchitecture().Name() method.

The method allows users deploying the cluster-autoscaler to define the default architecture to be used by the cluster-autoscaler for scale up from zero via the --scale-up-from-zero-default-arch flag. Amd64 is kept as a fallback for historical reasons and to maintain compatibility with current providers.

The `SystemArchitecture` type and the functions for parsing architecture names are copied over from the GCE implementation, to be generic across all the defined cloud providers.

However, the GCE implementation and the other cloud providers are currently left untouched.

The introduced changes will not take into account the case of nodes heterogeneous in architecture. The labels generation to infer properties like the cpu architecture from the node groups' features should be considered cloud-provider/user specific.

Cloud providers willing to opt into this implementation are expected to change their manager code by replacing
the cloudprovider.DefaultArch constant with the GetDefaultScaleFromZeroArchitecture function exposed by this package.
Usually, this should be done in the function responsible for generating the generic labels of the given
cloud provider's manager code (for example, buildGenericLabels(.)) as follows:

func (...) buildGenericLabels(...) (map[string]string) {
	result := make(map[string]string)
	result[apiv1.LabelArchStable] = cpu.GetDefaultScaleFromZeroArchitecture().Name()
	// ...
}
  • Loading branch information
aleskandro committed Sep 11, 2023
1 parent 3852f35 commit 513b915
Show file tree
Hide file tree
Showing 5 changed files with 222 additions and 1 deletion.
11 changes: 11 additions & 0 deletions cluster-autoscaler/cloudprovider/clusterapi/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ cluster.
* [Scale from zero support](#scale-from-zero-support)
* [RBAC changes for scaling from zero](#rbac-changes-for-scaling-from-zero)
* [Pre-defined labels and taints on nodes scaled from zero](#pre-defined-labels-and-taints-on-nodes-scaled-from-zero)
* [CPU Architecture awareness for single-arch clusters](#cpu-architecture-awareness-for-single-arch-clusters)
* [Specifying a Custom Resource Group](#specifying-a-custom-resource-group)
* [Specifying a Custom Resource Version](#specifying-a-custom-resource-version)
* [Sample manifest](#sample-manifest)
Expand Down Expand Up @@ -275,6 +276,16 @@ metadata:
capacity.cluster-autoscaler.kubernetes.io/taints: "key1=value1:NoSchedule,key2=value2:NoExecute"
```

#### CPU Architecture awareness for single-arch clusters

Users of non-amd64, single-arch clusters willing to enable scale from zero
support should also set the `--scale-up-from-zero-default-arch` flag to the
architecture of the nodes they want to default the node group templates to.
The autoscaler will default to amd64 if the flag is not set, and the node
group templates will not match the nodes' architecture, particularly when
the workload triggering the scale-up uses a node affinity predicate checking
for the node's architecture.

## Specifying a Custom Resource Group

By default all Kubernetes resources consumed by the Cluster API provider will
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ package clusterapi

import (
"fmt"
"k8s.io/autoscaler/cluster-autoscaler/utils/cpu"
"math/rand"

"github.com/pkg/errors"
Expand Down Expand Up @@ -369,7 +370,7 @@ func buildGenericLabels(nodeName string) map[string]string {
// TODO revisit this function and add an explanation about what these
// labels are used for, or remove them if not necessary
m := make(map[string]string)
m[corev1.LabelArchStable] = cloudprovider.DefaultArch
m[corev1.LabelArchStable] = cpu.GetDefaultScaleFromZeroArchitecture().Name()

m[corev1.LabelOSStable] = cloudprovider.DefaultOS

Expand Down
2 changes: 2 additions & 0 deletions cluster-autoscaler/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import (
ctx "context"
"flag"
"fmt"
"k8s.io/autoscaler/cluster-autoscaler/utils/cpu"
"net/http"
"net/url"
"os"
Expand Down Expand Up @@ -554,6 +555,7 @@ func main() {

logsapi.AddFlags(loggingConfig, pflag.CommandLine)
featureGate.AddFlag(pflag.CommandLine)
cpu.BindFlags(pflag.CommandLine)
kube_flag.InitFlags()

if err := logsapi.ValidateAndApply(loggingConfig, featureGate); err != nil {
Expand Down
108 changes: 108 additions & 0 deletions cluster-autoscaler/utils/cpu/architecture.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package cpu

import (
"github.com/spf13/pflag"
"k8s.io/klog/v2"
"sync"
)

// SystemArchitecture represents a CPU architecture (e.g., amd64, arm64, ppc64le, s390x).
// It is used to determine the default architecture to use when building the nodes templates for scaling up from zero
// by some cloud providers. This code is the same as the GCE implementation at
// https://github.com/kubernetes/autoscaler/blob/3852f352d96b8763292a9122163c1152dfedec55/cluster-autoscaler/cloudprovider/gce/templates.go#L611-L657
// which is kept to allow for a smooth transition to this package, once the GCE team is ready to use it.
type SystemArchitecture string

const (
// UnknownArch is used if the Architecture is Unknown
UnknownArch SystemArchitecture = ""
// Amd64 is used if the Architecture is x86_64
Amd64 SystemArchitecture = "amd64"
// Arm64 is used if the Architecture is ARM64
Arm64 SystemArchitecture = "arm64"
// Ppc64le is used if the Architecture is ppc64le
Ppc64le SystemArchitecture = "ppc64le"
// S390x is used if the Architecture is s390x
S390x SystemArchitecture = "s390x"
// DefaultArch should be used as a fallback if not passed by the environment via the --scale-up-from-zero-default-arch
DefaultArch = Amd64
// scaleUpFromZeroDefaultArchFlag is the flag name for the default architecture
scaleUpFromZeroDefaultArchFlag = "scale-up-from-zero-default-arch"
)

var systemArchitecture *SystemArchitecture
var systemArchitectureFlagValue string
var once sync.Once

// GetDefaultScaleFromZeroArchitecture returns the SystemArchitecture from the flag --scale-up-from-zero-default-arch
// or DefaultArch if the variable is set to an invalid value.
// Cloud providers willing to opt into this implementation are expected to change their manager code by replacing
// the cloudprovider.DefaultArch constant with the GetDefaultScaleFromZeroArchitecture function exposed by this package.
// Usually, this should be done in the function responsible for generating the generic labels of the given
// cloud provider's manager code (for example, buildGenericLabels(.)) as follows:
//
// func (...) buildGenericLabels(...) (map[string]string) {
// result := make(map[string]string)
// result[apiv1.LabelArchStable] = cpu.GetDefaultScaleFromZeroArchitecture().Name()
// // ...
// }
func GetDefaultScaleFromZeroArchitecture() SystemArchitecture {
once.Do(func() {
arch := ToSystemArchitecture(systemArchitectureFlagValue)
klog.V(5).Infof("the --%s value is set to %s (%s)", scaleUpFromZeroDefaultArchFlag, systemArchitectureFlagValue, arch.Name())
if arch == UnknownArch {
arch = DefaultArch
klog.Errorf("Unrecognized architecture '%s', falling back to %s",
systemArchitectureFlagValue, DefaultArch.Name())
}
systemArchitecture = &arch
})
return *systemArchitecture
}

// ToSystemArchitecture parses a string to SystemArchitecture. Returns UnknownArch if the string doesn't represent a
// valid architecture.
func ToSystemArchitecture(arch string) SystemArchitecture {
switch arch {
case string(Arm64):
return Arm64
case string(Amd64):
return Amd64
case string(Ppc64le):
return Ppc64le
case string(S390x):
return S390x
default:
return UnknownArch
}
}

// Name returns the string value for SystemArchitecture
func (s SystemArchitecture) Name() string {
return string(s)
}

// BindFlags binds the flags to the FlagSet set.
// Defining the flag here allows us to encapsulate the logic to parse the flag value, set the value of the
// systemArchitecture variable, and only expose the GetDefaultScaleFromZeroArchitecture function to the rest of the code.
func BindFlags(set *pflag.FlagSet) {
set.StringVar(&systemArchitectureFlagValue, scaleUpFromZeroDefaultArchFlag, DefaultArch.Name(),
"Default architecture to use when scaling up from zero. This is not supported by all the cloud providers. "+
"Check your cloud provider's documentation. Valid values: [amd64, arm64, ppc64le, s390x]")
}
99 changes: 99 additions & 0 deletions cluster-autoscaler/utils/cpu/architecture_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package cpu

import (
"fmt"
"github.com/google/go-cmp/cmp"
"sync"
"testing"
)

func TestToSystemArchitecture(t *testing.T) {
tcs := []struct {
name string
archName string
wantArch SystemArchitecture
}{
{
name: "valid architecture is converted",
archName: "amd64",
wantArch: Amd64,
},
{
name: "invalid architecture results in UnknownArchitecture",
archName: "some-arch",
wantArch: UnknownArch,
},
}
for _, tc := range tcs {
t.Run(tc.name, func(t *testing.T) {
gotArch := ToSystemArchitecture(tc.archName)
if diff := cmp.Diff(tc.wantArch, gotArch); diff != "" {
t.Errorf("ToSystemArchitecture diff (-want +got):\n%s", diff)
}
})
}
}

func TestGetSystemArchitectureFromEnvOrDefault(t *testing.T) {
amd64 := Amd64.Name()
arm64 := Arm64.Name()
wrongValue := "wrong"

tcs := []struct {
name string
envValue *string
want SystemArchitecture
}{
{
name: fmt.Sprintf("%s is set to arm64", systemArchitectureFlagValue),
envValue: &arm64,
want: Arm64,
},
{
name: fmt.Sprintf("%s is set to amd64", systemArchitectureFlagValue),
envValue: &amd64,
want: Amd64,
},
{
name: fmt.Sprintf("%s is not set", systemArchitectureFlagValue),
envValue: nil,
want: DefaultArch,
},
{
name: fmt.Sprintf("%s is set to a wrong value", systemArchitectureFlagValue),
envValue: &wrongValue,
want: DefaultArch,
},
}
for _, tc := range tcs {
t.Run(tc.name, func(t *testing.T) {
// Reset the systemArchitecture variable to nil before each test due to the lazy initialization of the variable.
systemArchitecture = nil
systemArchitectureFlagValue = ""
// Reset the once variable to its initial state before each test.
once = sync.Once{}
if tc.envValue != nil {
systemArchitectureFlagValue = *tc.envValue
}
if got := GetDefaultScaleFromZeroArchitecture(); got != tc.want {
t.Errorf("GetDefaultScaleFromZeroArchitecture() = %v, want %v", got, tc.want)
}
})
}
}

0 comments on commit 513b915

Please sign in to comment.