Moving ETCD and controllers outside CloudFormation Nested Stacks #1112
@camilb I've been maintaining and testing several k8s clusters using kube-aws over the last year and I agree with your idea: it would be good if we could pin a specific k8s version and AMI ID for the control plane and etcd, and another one for workers. That would tremendously help upgrades and updates.
AFAIK the amiId is already configurable per node pool.
@camilb Thanks a lot for sharing your hard-won experience. Yes - I believe this should be "fixed" somehow.
I agree DesiredCapacity should not be used anywhere. Actually, no kube-aws-generated stack template has included DesiredCapacity since #142, as far as I can remember.
This is really worth noting! So, we have two things to be considered separately, right?
For 1., we could split the root stack into three as you've suggested. A downside is that we may need another workflow-engine-like system to reliably update etcd, then controller, and finally worker nodes in that specific order (a rough sketch of that ordering appears below). An alternative approach would be emitting a validation error when a user tries to update the whole cluster in one shot. For 2., do we need some domain-specific logic to determine whether an update is backward-incompatible or not? Otherwise, we can just give up automating it and allow users to disable rollback via a command-line flag or a specific setting in cluster.yaml. Regarding amiId, yeah - as @redbaron noted, we can configure it per pool. Would it be a matter of documentation?
Btw, my take on this problem is improving kube-aws by doing all of the below:
What are your thoughts? Thanks!
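To make the ordering concern above concrete, here is a minimal, illustrative Go sketch (not kube-aws's actual code) of "update etcd, then the control plane, then the node pools, and stop at the first failure" using aws-sdk-go. The stack names and the idea of pre-rendered templates are assumptions for the example.

```go
// Illustrative only: sequentially update independent CloudFormation stacks in a
// fixed order (etcd -> control plane -> node pools), stopping at the first
// failure so later tiers are never touched. Stack names are hypothetical.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

// updateInOrder applies a pre-rendered template body to each stack and waits
// for UPDATE_COMPLETE before moving on to the next one.
func updateInOrder(cfn *cloudformation.CloudFormation, order []string, templates map[string]string) error {
	for _, name := range order {
		_, err := cfn.UpdateStack(&cloudformation.UpdateStackInput{
			StackName:    aws.String(name),
			TemplateBody: aws.String(templates[name]),
			Capabilities: aws.StringSlice([]string{"CAPABILITY_NAMED_IAM"}),
		})
		if err != nil {
			return fmt.Errorf("updating %s: %v", name, err)
		}
		// Block until CloudFormation finishes (or fails) this tier.
		if err := cfn.WaitUntilStackUpdateComplete(&cloudformation.DescribeStacksInput{
			StackName: aws.String(name),
		}); err != nil {
			return fmt.Errorf("stack %s did not reach UPDATE_COMPLETE: %v", name, err)
		}
	}
	return nil
}

func main() {
	cfn := cloudformation.New(session.Must(session.NewSession()))
	// Hypothetical stack names; a real tool would derive them from cluster.yaml.
	order := []string{"mycluster-etcd", "mycluster-control-plane", "mycluster-nodepool1"}
	templates := map[string]string{ /* rendered stack templates keyed by name */ }
	if err := updateInOrder(cfn, order, templates); err != nil {
		log.Fatal(err)
	}
}
```

The alternative mentioned above (rejecting a one-shot update with a validation error) avoids the need for this kind of mini workflow engine, at the cost of asking users to run the three updates themselves.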
Hi @mumoshu, thanks for your fast response. In most of the upgrades I prefer not touching the ETCD nodes. Since 07/2016, when I started using kube-aws, I haven't had any issue with ETCD on any cluster, and I prefer to update it less often than Kubernetes. Something like:
Also, maybe we can add a bool in cluster.yaml for this.
Thanks @redbaron, I didn't notice it 🍺
@camilb Thanks! Would you also mind sharing your thoughts on how we can trigger a rollback when the new worker nodes fail precisely because of the preceding updates to controller nodes? Do you just leave the already-updated controller nodes as-is and re-run the updates to worker nodes (with appropriate changes in the worker config so that new workers can adapt to the recent changes in controller nodes)?
@mumoshu If the workers are in a separate stack, we can roll back the workers only, since we can use controllers of a greater version (up to three releases, as I remember) until we fix the issues and update the workers stack again.
@camilb Thanks! I believe I understood that part. Excuse me, but what I wanted to sync up on was this:
In that case, as the stack for controller nodes has been successfully updated, you can't trigger a manual rollback, right?
@mumoshu I understand; unfortunately the only option in that case is to fix the controllers stack and update it again. But I think the problem can be observed as soon as the controllers finish the update: the existing workers will start to fail. It would help if the stacks could be upgraded in two separate commands, like on GKE, where you have the option to update the controllers, all the workers, or just a node pool.
@camilb Thanks for the confirmation. I'd like to achieve a similar UX at least. Under no circumstance should there be a surprise such as "why are my etcd/controller nodes being updated even though my changes in cluster.yaml are solely for workers?".
My updated suggestions for improvements:
Use-cases:
In any case, a spare cluster for faster disaster recovery or blue-green cluster deployment (my preference) is recommended. Down-sides:
Perhaps we need to move the VPC and subnet definitions from the controlplane stack to the root stack?
@camilb @Fsero How about the above idea? For me, it seemed to provide a smoother migration/development path while achieving your original goal?
Sounds reasonable to align to the UX of gcloud, we use that as well and it seems to work well. I’m not keen on the overrides.json as it splits the config in two. I’d prefer to pin the AMI on cluster.yaml generation or in source and then add a command to update it using the existing code. i.e. less surprise updates but slightly more surprise if a user is expecting everything to auto update.
@mumoshu From my point of view your proposal is enough to avoid CF issues in the future; in the last one and a half years I didn't have any major issues.

I don't see any downside to adding that.
@c-knowles Thanks for your comment! Yes, it would work too, as long as we give up having kube-aws itself automatically set the latest k8s version and the latest AMI ID at runtime. So, we could also enhance kube-aws here by:
In case someone is actually relying on
Implementation note: it would be possible to detect which stack(s) are being updated by creating a CFN changeset and inspecting the result, so that we can emit a validation error accordingly.
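A rough sketch of how such a dry-run could look with aws-sdk-go, assuming the check lives inside kube-aws; the function name and change-set naming are invented for illustration:

```go
// Illustrative sketch: create a change set against an existing stack and list
// the logical resource IDs it would touch, without executing the update.
package stackcheck

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

func changedResources(cfn *cloudformation.CloudFormation, stackName, templateBody string) ([]string, error) {
	csName := fmt.Sprintf("kube-aws-dry-run-%d", time.Now().Unix()) // hypothetical naming
	if _, err := cfn.CreateChangeSet(&cloudformation.CreateChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(csName),
		TemplateBody:  aws.String(templateBody),
		Capabilities:  aws.StringSlice([]string{"CAPABILITY_NAMED_IAM"}),
	}); err != nil {
		return nil, err
	}

	describe := &cloudformation.DescribeChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(csName),
	}
	// Wait until the change set is computed, then read the planned changes.
	if err := cfn.WaitUntilChangeSetCreateComplete(describe); err != nil {
		return nil, err
	}
	out, err := cfn.DescribeChangeSet(describe)
	if err != nil {
		return nil, err
	}

	var changed []string
	for _, c := range out.Changes {
		if c.ResourceChange != nil {
			changed = append(changed, aws.StringValue(c.ResourceChange.LogicalResourceId))
		}
	}
	// A real implementation would also clean up with DeleteChangeSet afterwards.
	return changed, nil
}
```

Against the root stack, the nested etcd/control-plane stacks show up as AWS::CloudFormation::Stack resources in the change list, so a validation error could be raised whenever they appear even though the user only meant to touch workers.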
What about something like:
Not sure about including k8s and etcd versions, as then what about container image versions, etc.? There's usually quite a bit more to updating than just those versions, and a lot of different incompatibilities; the AMI is a little removed from that, other than the Docker version.
Hi @c-knowles!
Regarding etcd and k8s versions, I wanted to include them for auto-population as they do result in node replacement once you update the kube-aws binary, even when no cluster.yaml or stack templates are changed.
How would you like kube-aws to achieve it?
Updated proposal for improving kube-aws for fewer surprises while upgrading
Use-cases:
In any case, a spare cluster for faster disaster recovery or blue-green cluster deployment (my preference) is recommended. Down-sides:
Interesting discussion and some good ideas. The ideas that jumped out as great to me were:
@camilb I'd love to hear your experience with upgrading k8s versions in kube-aws. Are you able to deploy 1.9 clusters? Are you able to upgrade 1.8 clusters to 1.9? Do you just update the hyperkube version and go for it? Any tips or gotchas you can share or add to the docs somewhere?
Thanks for your feedback!
Do we need backward compatibility for this, e.g. a flag to split stacks or not? Also, running an update against an existing cluster would effectively recreate it. I'm ok with recreating every cluster because I consider my clusters cattle rather than pets. How about you, everyone?
@whereisaaron I'm using the master branch to build kube-aws most of the time. Currently all of my clusters are running the latest changes in master. Some were upgraded from 1.8.x to 1.9.1 and the largest one from 1.7.8 to 1.9.1.
@mumoshu I'm planning to migrate existing clusters to new ones, without updating.
@mumoshu When you have a large collection of heterogeneous applications in a cluster, the clusters may be cattle, but the applications on them are like pet ticks on the cow that you have to locate and transplant to the next beast 😄 I hope at least minor (x.y.z) version in-place upgrades will be possible/supported, again trying to match the ease of upgrade that GKE and the like give you. Separate stacks do make this a little easier, since node pools and controllers are pretty much cattle within a cluster.
I'll be merging #1233 and cutting v0.9.11-rc.1 once v0.9.10 is released. Any comments, opinions, etc.? 😃
@mumoshu Just one comment so far: is it possible to split the worker stacks with this implementation? I.e. if I have a multiple-node-pool environment, I'd like to roll the node-pool stacks independently.
@kevtaylor It won't be so hard to implement naive support for that. Anyway, may I ask about your exact use-case for it? I guess that if you had modified only the first node pool in your cluster.yaml, only that pool would actually be rolled. Oh, or maybe it relates to testing a new AMI ID on just one pool first?

Thank you so much for the comment, anyway!
So the types of things we do are: split node pools into separate groups. We then add tolerations to certain pod types which then favour that node pool. We would still use a consistent AMI across the node pools. But we may want to, say, change the instance type of Group A and then just roll that pool, or if we wanted to change an AMI ID, just test that on a given stack first, etc.
@kevtaylor Thanks for the explanation! It is now clearer to me. How about being able to specify one or more node pools to update, e.g. via a `--targets` flag? Would it sound good?
@mumoshu That looks spot on to me

@kevtaylor Thanks for your confirmation 👍 Just implemented it into #1233
#1233:
* Prompt before updating
* feat: Support updating a subset of cfn stacks only
* Vendor changes for the stack subset update feature
* Rename some func-scoped variables for clarity
* feat: Extract etcd and network stack for more fine-grained cluster update
* feat: Ability to specify just one or more node pools for updates
* fix: Make update work with `--targets all`

Ref #1112
Hi, I just rolled the latest master into an existing 0.9.10 cluster and it did something strange with the etcds: it created 3 new ones and left the old ones. I think we are going to have to think about how to migrate from the old to the new separated stacks.

I guess as it's a separate stack now, nothing is managing that migration. Are we still advising to do a blue/green cluster switchover? We should also update the CLI reference in the docs.

I don't think we should be allowing this to roll into an existing cluster if the result is going to be to effectively remove all the existing state. I think we need a migration path or a hard stop that prevents users from accidentally wiping their existing clusters.
I understand your concern. How about preventing `kube-aws update` when there's no network stack? That's the quickest guard that I can think of.
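As a rough illustration of that guard (assumed stack-naming convention, not the actual kube-aws code), the check could be as simple as:

```go
// Illustrative guard: refuse to run an update against a cluster created before
// the stack split, i.e. when no separate network stack exists yet.
package guard

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

func ensureNetworkStackExists(clusterName string) error {
	cfn := cloudformation.New(session.Must(session.NewSession()))
	networkStack := clusterName + "-network" // assumed naming convention

	// DescribeStacks returns a ValidationError when the stack does not exist.
	if _, err := cfn.DescribeStacks(&cloudformation.DescribeStacksInput{
		StackName: aws.String(networkStack),
	}); err != nil {
		return fmt.Errorf("no network stack %q found; this cluster predates the stack split, refusing to update: %v", networkStack, err)
	}
	return nil
}
```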
That sounds like a very good quick safety change and could give us some time to put together a more comprehensive solution! :) Regarding a migration strategy, I'm not very familiar with CloudFormation, but have noticed three interesting-looking tags, e.g. aws:cloudformation:stack-id, aws:cloudformation:stack-name and aws:cloudformation:logical-id.
Does it sound plausible to write a function in kube-aws that looks for legacy etcd resources, instances, volumes (etc.) and re-tags them with the stack-id of the new stack? Would this make CloudFormation treat them as members of the new stack? I have no idea if this would work, but if it does it might give us a clean migration path. Can anyone with more CF experience give me a steer on whether this solution is worthwhile spending some time to try out and test?
Thinking some more about it, I don't think my suggestion above will work: given that we are creating a new etcd stack rather than updating it, we could probably expect an error about clashing with existing resources. There is clearly some complexity here regarding how CloudFormation works, and my casual investigation so far hasn't thrown up much in the way of people migrating resources across different stacks.
The first problem I found when trying to update a 0.9.9 cluster to the new etcd stack is that the new ETCD0 fails to send its cfn-signal for some reason. I could never fathom how we get over this in a new, clean cluster, so I have created a PR that brings the etcds up in parallel upon new creation of the stack and thus avoids this problem: #1357
Oh, did we miss implementing the ability to disable rollback here?
Hiya, I didn't disable rollback as part of the etcd migration code because generally, if the etcd migration fails, then rolling back is a good thing. If the controllers fail to come up then this should also trigger a rollback before the old etcds are deleted. Is this in relation to a desired feature or an issue upgrading?
@davidmccormick As far as I remember,
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @mumoshu, I want to discuss some possible changes in CloudFormation especially regarding the NestedStacks and DesiredCapacity.
Had all the clusters upgraded to 1.9.1 last week. All went fine, except on the last cluster, where the last nested stack (a node pool) failed to upgrade and started a rollback.
The upgrade failed because a new instance was added but didn't respond in any form (no ssh, no ping, nothing in the AWS console "Get System Logs"). Since CF expects a signal from the launched instances, the quickest solution was to terminate the instance from the console, but the ASG didn't launch a new one, so I increased the ASG capacity to be able to receive the signal before a timeout. Right after that I received an error in CloudFormation:

New SetDesiredCapacity value 20 is below min value 21 for the AutoScalingGroup.

And the rollback started, rolling back all the node pools, controllers and ETCD for ~2h. It's not the first time something like this has happened; once CloudFormation displayed a message that a new instance had been added, but it didn't show up in the console, so I had to change the ASG manually to avoid a signal timeout.
On our CF stacks we never use DesiredCapacity for ASGs, because they can be resized due to a traffic increase and CF will roll back the update if the size differs (a sketch of such a check follows this comment).

Then, regarding the Nested Stacks, I think ETCD and controllers should be in separate stacks. If the update fails on workers, we should not roll back the ETCD and controllers that were successfully upgraded. Sometimes a rollback will not work for ETCD or controllers with specific versions. For example, on test clusters I was unable to revert from ETCD 3.2.x to ETCD 3.0.x, or from Kubernetes 1.9.x to 1.7.x, and this required repeating the upgrade. Workers are quite safe to roll back.
Also, the rollback process for Nested Stacks takes a very long time for large clusters with multiple node pools. I know that using Nested Stacks is much cleaner, but rolling back everything when we have updates to the ETCD or Kubernetes version requires repeating the upgrade to recover the cluster, causing a very long downtime.
I don't know how we can remove the DesiredCapacity from CF right now, but at least maybe we can move ETCD and controllers out of the Nested Stacks. What do you think?
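To make the DesiredCapacity concern above concrete, here is a hypothetical pre-flight check (function and parameter names invented for the example) that compares the live desired capacity of an ASG with whatever a stack template is about to set; this is essentially the mismatch behind the "SetDesiredCapacity value 20 is below min value 21" error described earlier.

```go
// Hypothetical pre-flight check: warn when a stack update would set a
// DesiredCapacity below what the autoscaler has already scaled the group to.
package preflight

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func checkDesiredCapacity(asgName string, templateDesired int64) error {
	svc := autoscaling.New(session.Must(session.NewSession()))
	out, err := svc.DescribeAutoScalingGroups(&autoscaling.DescribeAutoScalingGroupsInput{
		AutoScalingGroupNames: []*string{aws.String(asgName)},
	})
	if err != nil {
		return err
	}
	if len(out.AutoScalingGroups) == 0 {
		return fmt.Errorf("ASG %s not found", asgName)
	}
	live := aws.Int64Value(out.AutoScalingGroups[0].DesiredCapacity)
	if templateDesired < live {
		return fmt.Errorf("template would set desired capacity %d, but the autoscaler has already grown %s to %d; the CloudFormation update would fight the autoscaler and likely roll back",
			templateDesired, asgName, live)
	}
	return nil
}
```

Not pinning DesiredCapacity in the template at all, as described above, sidesteps the problem entirely; a check like this only matters while the attribute is still present.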