This proposal addresses a GitOps-compatible subcommand of `eksctl` called
`apply`.

Status: WIP
The purpose of `apply` is to subsume the various imperative commands
`eksctl` provides today into a single `apply` command that reconciles the
actual state of an EKS cluster with the intended state as specified in the
`eksctl` config file. It should be GitOps-compatible, i.e. usable as part of
a GitOps pipeline.
This proposal is only concerned with the `apply` subcommand. Existing
behavior and commands are unaffected. Continuing to support them shouldn't
complicate the implementation too much as they are more or less subsets of
operations performed by `apply`.
`eksctl` supplies users with imperative commands for clusters and cluster
resources that allow them to take explicit actions, like `create` or
`delete`. We also encourage the use of a config file in which the desired
state of the cluster is described. Still, users are required to figure out
which imperative steps to take to reconcile the cluster with the desired
state. The missing capability is for `eksctl` to do this reconciliation
itself.
Conceptually, `eksctl` would gather all of the information it can about the
current state of the cluster in order to build a `ClusterConfig` describing
the real-world cluster, and then diff this config with the user-provided
`ClusterConfig`. The diff is used to set up a plan of modifications to the
cluster and its resources.
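As an illustration, a config like the one below (field names follow the
current `v1alpha5` schema; the cluster name and values are made up) would be
the single source of truth that `apply` reconciles the cluster against:

```yaml
# Illustrative config: `apply` would compare the cluster's actual state
# against this and plan the changes needed to make the two match.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster   # made-up name
  region: us-west-2

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
```

Nothing in this example is new; it is the same config format already accepted
by the existing `--config-file` flag.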
This proposal also opens the floor for discussion of some non-straightforward
questions around reconciliation. When looking at a list of nodegroups in a
cluster, it's not clear how to handle deletion:

- Is the given list intended to be the complete list of nodegroups, such that no others should exist? Are extra nodegroups automatically deleted?
- How do we handle non-`eksctl`-managed resources?

To change some properties, the entire resource will have to be recreated.
This should be made explicit to the user.

Because it's impractical to introduce complete support for reconciliation all
at once, we instead need to gradually support properties and resources of a
cluster. This leads to the question of whether the current `ClusterConfig` is
a good structure for a config meant for reconciliation.
Introduce a subcommand `apply` which takes only a config file argument.
`apply` would reconcile the desired state with the real-world state as
described in the motivation. The exact details will differ from resource to
resource and won't be enumerated here.
- `apply` is put behind the explicit "experimental" flag (cf. `enable profile/repo`)
- Take full ownership of `eksctl`-managed resources (i.e. delete missing resources by default)
- Options for non-`eksctl`-managed resources as well as resources where ownership is unknowable are covered below
- Changing immutable fields through recreation is the default
- We gradually expand `apply` support to different parts of the cluster
- As we expand `apply` support, we reevaluate and update the config structure as necessary in a new API version
Some further discussion/justification follows.
We make it explicit that `apply` is experimental, very open to feedback and
subject to change.
Ownership comes into play most importantly with deletion.
Given the config:

```yaml
nodeGroups:
  - name: ng-2
    desiredCapacity: 1
```

if we discover an existing nodegroup `ng-1` with (for example) the
`eksctl.io/owned: true` tag, do we delete it?
The proposed answer is yes, it should be deleted. With `apply` we assume that
all previous changes initiated by `eksctl` were made through the same config
file/reconciliation process, and that this nodegroup must therefore have been
removed from the file with the intention of deleting it from the cluster.
There is precedent for adding the ability to specify resources in a
non-authoritative list. For example, the `google` Terraform provider supports
IAM with three different levels of authoritativeness. Initial `apply` support
will focus on authoritative resource specification.
Some resources may not be taggable as owned by `eksctl`. If there were no
`tags` field on addons, for example, `eksctl`-created addons would be
indistinguishable from addons created in the AWS console.
There are at least two ways to deal with this case.

One option is to assume that `eksctl` owns the set of such resources and to
delete any not listed in the config.
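For example (a sketch only, reusing the existing `addons` field), the list
below would be treated as authoritative: any addon found on the cluster but
absent from the list would be scheduled for deletion, whether or not `eksctl`
originally created it.

```yaml
# Sketch of the "assume ownership" option: this is the complete set of
# addons; anything else discovered on the cluster would be removed by apply.
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
```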
Storing some kind of state/config "somewhere" is another option and has been
discussed in eksctl-io#642.
An `ignore` flag (cf. `terraform`'s `prevent_destroy`) in the schema for all
objects could be added to tell `eksctl` about such objects and prevent
deletions and potentially skip reconciliation.
The user would prefix their list:
```yaml
nodeGroups:
  - name: ng-1
    ignore: true
  - name: ng-2
    desiredCapacity: 1
```
The behavior for resources created outside of `eksctl` could also be made
toggleable by settings in the config or by flags on `apply`.
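A purely hypothetical sketch of what such a config-level toggle could look
like; the field name and values below are invented for illustration and are
not part of the current schema:

```yaml
# Hypothetical only: a setting controlling how apply treats resources it
# did not create. Neither the key nor the values exist today.
apply:
  unmanagedResources: ignore   # e.g. ignore | adopt | delete
```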
The proposal is to create a new API version, starting with `v1alpha6`, in
which, as we add `apply` support for cluster/nodegroup properties, we allow
for aggressive reevaluation of the existing config structure that holds these
newly supported properties/aspects of the cluster/nodegroup.
If, for example, we add support for reconciling tags, we might decide they
fit better as a top-level key. We would then, with `v1alpha6`, move `tags`
out from under `metadata` and make it a top-level field. It should be
possible to automatically rewrite the previous version into the current
version.
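To make the tags example concrete, here is a sketch (nothing here is decided)
of the same metadata in the current layout and in a possible `v1alpha6`
layout, with the automatic rewrite mapping one onto the other:

```yaml
# Current v1alpha5 layout: tags live under metadata.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
  tags:
    team: platform
---
# Possible v1alpha6 layout (sketch only): tags promoted to a top-level field.
apiVersion: eksctl.io/v1alpha6
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
tags:
  team: platform
```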
- Ensure that the config makes sense in a world centered around reconciliation.
- Auto-upgrading makes it easier for users to keep using an existing config.
- "Parse, don't validate": the code that uses the config should be insulated from how we serialize/deserialize the config.
- What, if anything, about the VPC is owned by `eksctl`, and under what circumstances?
- What does it mean if the user specifies a VPC not created by `eksctl` but `eksctl` finds a discrepancy between the actual state and the config? `terraform`'s answer here is explicitness: `resource` vs `data`.