0.11.x migration from existing clusters without losing state #1380
Conversation
…rnetes.Networking.SelfHosting is Enabled. This is to break the dependency that the nodepool stacks have on etcd stack resources.
…rnetes.Networking.SelfHosting is Enabled. (kubernetes-retired#1367) This is to break the dependency that the nodepool stacks have on etcd stack resources. Ref kubernetes-retired#1370
…py state from existing etcd over to the new ones during a migration.
…-aws into 0.11.x/remove-etcd-dependency-on-nodepools-when-selfhosted-networking-enabled
@@ -158,6 +158,57 @@
}
}
},
"SecurityGroupWorker": {
This security group needed returning to the control plane. We can remove it again in later releases, but without it the updated control plane stack will throw an error about it being in use by the nodepools.
Makes sense. Fine with reviving this then.
// Wish we had something like reference counting to keep AWS resources only while they're used 😆
	controlplaneconfig.StackTemplateOptions
	UserDataEtcd model.UserData
	ExtraCfnResources map[string]interface{}
	model.EtcdExistingState
This and func NewStackConfig are the only changes from the controlplane version.
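For context, a rough sketch of what the embedded state might carry. EtcdMigrationEnabled is taken from the templates in this PR; the endpoints field name is an assumption for illustration, not the actual definition:

```go
package model

// Illustrative sketch only, not the real definition from this PR.
// EtcdExistingState holds facts discovered about an already-running cluster,
// as opposed to settings a user selects in cluster.yaml, so that stack and
// cloud-config templates can branch on whether an etcd migration is needed.
type EtcdExistingState struct {
	// True when the legacy etcd members still live in the control plane stack
	// and their keyspace must be imported into the new etcd stack.
	EtcdMigrationEnabled bool
	// Client endpoints of the legacy etcd members to import from
	// (hypothetical field name).
	EtcdMigrationExistingEndpoints string
}
```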
@@ -207,6 +207,63 @@ coreos:
[Install]
WantedBy=multi-user.target
{{end}}
{{ if .EtcdMigrationEnabled -}}
The migration units, plus a path unit to trigger the import.
@@ -504,17 +504,18 @@ func (c clusterImpl) LegacyUpdate(targets OperationTargets) (string, error) {

func (c clusterImpl) update(cfSvc *cloudformation.CloudFormation, targets OperationTargets) (string, error) {

	// Look at existing state of cloud formation and stacks to determine if we need to take special measures in migrating our etcd
	// clusters from the control plane stack to their own Etcd stack.
	exists, err := cfnstack.NestedStackExists(cfSvc, c.controlPlane.ClusterName, naming.FromStackToCfnResource(c.etcd.Etcd.LogicalName()))
	if err != nil {
		logger.Errorf("please check your AWS credentials/permissions")
		return "", fmt.Errorf("can't lookup AWS CloudFormation stacks: %s", err)
	}
Only fail-fast if we don't have SelfHosting enabled - otherwise take this as our cue to migrate instead!
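To make that flow concrete, here is a hedged sketch of the decision the comment describes, written as a hypothetical standalone helper rather than the PR's actual code:

```go
package cluster

import "errors"

// decideEtcdMigration is a hypothetical helper (not code from this PR) that
// captures the comment above: a missing separate etcd stack is only fatal
// when self-hosted networking is disabled; otherwise it is the cue to migrate.
func decideEtcdMigration(etcdStackExists, selfHostingEnabled bool) (bool, error) {
	if etcdStackExists {
		// The cluster already has its own etcd stack; nothing to migrate.
		return false, nil
	}
	if !selfHostingEnabled {
		// Fail fast: the migration relies on breaking the nodepool dependency
		// on etcd stack resources, which this PR only does under self-hosting.
		return false, errors.New("no separate etcd stack found and self-hosted networking is disabled")
	}
	// Take the missing etcd stack as our cue to import keys from the legacy
	// etcd members during this update.
	return true, nil
}
```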
… etcdconfig which depends on controlplane config 2) Allow mocks to return nil response and not crash lookupExistingEtcdEndpoints
Fixes issue #1112
"Description" : "The security group assigned to worker nodes", | ||
"Value" : { "Ref" : "SecurityGroupWorker" }, | ||
"Export" : { "Name" : {"Fn::Sub": "${AWS::StackName}-WorkerSecurityGroup" }} | ||
}, |
This corresponds to https://github.com/kubernetes-incubator/kube-aws/pull/1380/files#r199216521
This took some time for me to get, but you did excellent work! LGTM.
This PR is for the master 0.11.x candidate branch and is intended to allow a smoother migration for existing users with 0.10.x clusters which do not have the new separate etcd stack. When first testing the 0.11.x code I found that the process would always fail and roll back due to cloud-formation dependencies, but once these were cleaned up and worked around, the new etcd cluster would come up empty, effectively wiping the state of the existing cluster in the process of the upgrade.
TL/DR: Upgrades from legacy etcds by importing a copy of all their keys and then allows them to be destroyed. ALSO, you are expected to have updated to the 0.10.x migration release first, otherwise the migration will fail because of cloud-formation dependencies, or your new etcd servers won't be able to connect to the old ones!
The approach for the upgrade is this:
8. etcdadm has been enhanced to provide extra 'cluster-is-healthy', 'member-is-leader', 'migration-export-kube-state' and 'migration-import-kube-state' commands.
The bulk of changes are in using knowledge of existing state when templating assets, such as cloud-configs and stack templates. I wasn't happy putting the state in the config package because these are not settings that a user can select, so I ended up working things out at the cluster package level and then looking for ways to include my extra state information into the templating contexts. I think that perhaps some more thought and refactoring could be applied to better model the role of config and existing state when bringing up a cluster.
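As a rough illustration of that pattern (all names here are illustrative except EtcdMigrationEnabled, which the PR's templates check), the cluster-level code can discover existing state and embed it into the struct handed to the templates:

```go
package main

import (
	"os"
	"text/template"
)

// EtcdExistingState mirrors the idea of model.EtcdExistingState: facts about
// the running cluster rather than user-chosen settings. Field names other
// than EtcdMigrationEnabled are assumptions for this sketch.
type EtcdExistingState struct {
	EtcdMigrationEnabled           bool
	EtcdMigrationExistingEndpoints string
}

// StackConfig embeds the discovered state so templates can reference it
// directly, e.g. {{ if .EtcdMigrationEnabled }}.
type StackConfig struct {
	ClusterName string
	EtcdExistingState
}

func main() {
	// A toy template standing in for a cloud-config or stack template that
	// branches on the migration flag.
	tmpl := template.Must(template.New("userdata").Parse(
		"{{ if .EtcdMigrationEnabled }}import keys from {{ .EtcdMigrationExistingEndpoints }}{{ else }}fresh etcd cluster for {{ .ClusterName }}{{ end }}\n"))

	cfg := StackConfig{
		ClusterName: "mycluster",
		EtcdExistingState: EtcdExistingState{
			EtcdMigrationEnabled:           true,
			EtcdMigrationExistingEndpoints: "https://10.0.0.10:2379",
		},
	}

	if err := tmpl.Execute(os.Stdout, cfg); err != nil {
		panic(err)
	}
}
```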