Moving ETCD and controllers outside CloudFormation Nested Stacks #1112
@camilb I've been maintaining and testing several k8s clusters using kube-aws over the last year and I agree with your idea: it would be good if we could pin a specific k8s version and AMI ID for the control plane and etcd, and another one for workers. That would tremendously help upgrades and updates.
AFAIK the amiId is already configurable per node pool.
@camilb Thanks a lot for sharing your hard-won experience. Yes - I believe this should be "fixed" somehow.
I agree DesiredCapacity should not be used anywhere. Actually, no kube-aws-generated stack template has included DesiredCapacity since #142, as far as I can remember.
This is really worth noting! So, we have two things to be considered separately, right?
For 1., we could split the root stack into three as you've suggested. A downside is that we may need another workflow-engine-like system to reliably update etcd, then controller, and finally worker nodes in that specific order (a rough sketch of that ordering appears below). An alternative approach would be emitting a validation error when a user tries to update the whole cluster in one shot. For 2., do we need some domain-specific logic to determine whether an update is backward-incompatible or not? Otherwise, we can just give up automating it and allow users to disable rollback via a command-line flag or a specific setting in cluster.yaml. Regarding amiId, yeah - as @redbaron noted, we can configure it per pool. Would it be a matter of documentation?
Btw, my take on this problem is improving kube-aws by doing all of the below:
What are your thoughts? Thanks!
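To make the ordering concern above concrete, here is a minimal, illustrative Go sketch (not kube-aws's actual code) of "update etcd, then the control plane, then the node pools, and stop at the first failure" using aws-sdk-go. The stack names and the idea of pre-rendered templates are assumptions for the example.

```go
// Illustrative only: sequentially update independent CloudFormation stacks in a
// fixed order (etcd -> control plane -> node pools), stopping at the first
// failure so later tiers are never touched. Stack names are hypothetical.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

// updateInOrder applies a pre-rendered template body to each stack and waits
// for UPDATE_COMPLETE before moving on to the next one.
func updateInOrder(cfn *cloudformation.CloudFormation, order []string, templates map[string]string) error {
	for _, name := range order {
		_, err := cfn.UpdateStack(&cloudformation.UpdateStackInput{
			StackName:    aws.String(name),
			TemplateBody: aws.String(templates[name]),
			Capabilities: aws.StringSlice([]string{"CAPABILITY_NAMED_IAM"}),
		})
		if err != nil {
			return fmt.Errorf("updating %s: %v", name, err)
		}
		// Block until CloudFormation finishes (or fails) this tier.
		if err := cfn.WaitUntilStackUpdateComplete(&cloudformation.DescribeStacksInput{
			StackName: aws.String(name),
		}); err != nil {
			return fmt.Errorf("stack %s did not reach UPDATE_COMPLETE: %v", name, err)
		}
	}
	return nil
}

func main() {
	cfn := cloudformation.New(session.Must(session.NewSession()))
	// Hypothetical stack names; a real tool would derive them from cluster.yaml.
	order := []string{"mycluster-etcd", "mycluster-control-plane", "mycluster-nodepool1"}
	templates := map[string]string{ /* rendered stack templates keyed by name */ }
	if err := updateInOrder(cfn, order, templates); err != nil {
		log.Fatal(err)
	}
}
```

The alternative mentioned above (rejecting a one-shot update with a validation error) avoids the need for this kind of mini workflow engine, at the cost of asking users to run the three updates themselves.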
Hi @mumoshu, thanks for your fast response. In most of the upgrades I prefer not touching the ETCD nodes. Since 07/2016, when I started using kube-aws, I haven't had any issue with ETCD on any cluster, and I prefer to update it less often than Kubernetes. Something like:
Also, maybe we can add a bool in cluster.yaml for this.
Thanks @redbaron, I didn't notice it 🍺
@camilb Thanks! Would you also mind sharing your thoughts on how we can trigger a rollback when the new worker nodes fail precisely because of the preceding updates to controller nodes? Do you just leave the already-updated controller nodes as-is and re-run the updates to worker nodes (with appropriate changes in the worker config so that new workers can adapt to the recent changes in controller nodes)?
@mumoshu If the workers are in a separate stack, we can roll back the workers only, since we can use controllers of a greater version (up to three releases, as I remember) until we fix the issues and update the workers stack again.
@camilb Thanks! I believe I understood that part. Excuse me, but what I wanted to sync up on was this:
In that case, as the stack for controller nodes has been successfully updated, you can't trigger a manual rollback, right?
@mumoshu I understand; unfortunately the only option in that case is to fix the controllers stack and update it again. But I think the problem can be observed as soon as the controllers finish the update: the existing workers will start to fail. It would help if the stacks could be upgraded in two separate commands, like on GKE, where you have the option to update the controllers, all the workers, or just a node pool.
@camilb Thanks for the confirmation. I'd like to achieve a similar UX at least. Under no circumstance should there be a surprise such as "why are my etcd/controller nodes being updated even though my changes in cluster.yaml are solely for workers?".
My updated suggestions for improvements:
Use-cases:
In any case, a spare cluster for faster disaster recovery or blue-green cluster deployment (my preference) is recommended. Down-sides:
Perhaps we need to move the VPC and subnet definitions from the controlplane stack to the root stack?
@camilb @Fsero How about the above idea? For me, it seemed to provide a smoother migration/development path while achieving your original goal?
Sounds reasonable to align to the UX of gcloud, we use that as well and it seems to work well. I’m not keen on the overrides.json as it splits the config in two. I’d prefer to pin the AMI on cluster.yaml generation or in source and then add a command to update it using the existing code. i.e. less surprise updates but slightly more surprise if a user is expecting everything to auto update.
@mumoshu From my point of view your proposal is enough to avoid CF issues in the future; in the last one and a half years I didn't have any major issues.

I don't see any downside to adding that.
@c-knowles Thanks for your comment! Yes, it would work too, as long as we give up having kube-aws itself automatically set the latest k8s version and the latest AMI ID at runtime. So, we could also enhance kube-aws here by:
In case someone is actually relying on
Implementation note: it would be possible to detect which stack(s) are being updated by creating a CFN changeset and inspecting the result, so that we can emit a validation error accordingly.
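A rough sketch of how such a dry-run could look with aws-sdk-go, assuming the check lives inside kube-aws; the function name and change-set naming are invented for illustration:

```go
// Illustrative sketch: create a change set against an existing stack and list
// the logical resource IDs it would touch, without executing the update.
package stackcheck

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

func changedResources(cfn *cloudformation.CloudFormation, stackName, templateBody string) ([]string, error) {
	csName := fmt.Sprintf("kube-aws-dry-run-%d", time.Now().Unix()) // hypothetical naming
	if _, err := cfn.CreateChangeSet(&cloudformation.CreateChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(csName),
		TemplateBody:  aws.String(templateBody),
		Capabilities:  aws.StringSlice([]string{"CAPABILITY_NAMED_IAM"}),
	}); err != nil {
		return nil, err
	}

	describe := &cloudformation.DescribeChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(csName),
	}
	// Wait until the change set is computed, then read the planned changes.
	if err := cfn.WaitUntilChangeSetCreateComplete(describe); err != nil {
		return nil, err
	}
	out, err := cfn.DescribeChangeSet(describe)
	if err != nil {
		return nil, err
	}

	var changed []string
	for _, c := range out.Changes {
		if c.ResourceChange != nil {
			changed = append(changed, aws.StringValue(c.ResourceChange.LogicalResourceId))
		}
	}
	// A real implementation would also clean up with DeleteChangeSet afterwards.
	return changed, nil
}
```

Against the root stack, the nested etcd/control-plane stacks show up as AWS::CloudFormation::Stack resources in the change list, so a validation error could be raised whenever they appear even though the user only meant to touch workers.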
What about something like:
Not sure about including k8s and etcd versions, as then what about container image versions, etc.? There's usually quite a bit more to updating than just those versions, and a lot of different incompatibilities; the AMI is a little removed from that, other than the Docker version.
Hi @c-knowles!
Regarding etcd and k8s versions, I wanted to include them for auto-population as they do result in node replacement once you update the kube-aws binary, even when no cluster.yaml or stack templates are changed.
How would you like kube-aws to achieve it?
Updated proposal for improving kube-aws for fewer surprises while upgrading
Use-cases:
In any case, a spare cluster for faster disaster recovery or blue-green cluster deployment (my preference) is recommended. Down-sides:
Interesting discussion and some good ideas. The ideas that jumped out as great to me were:
@camilb I'd love to hear your experience with upgrading k8s versions in kube-aws. Are you able to deploy 1.9 clusters? Are you able to upgrade 1.8 clusters to 1.9? Do you just update the hyperkube version and go for it? Any tips or gotchas you can share or add to the docs somewhere?
Thanks for your feedback!
Do we need backward compatibility for this, e.g. a flag to split stacks or not? Also, running an update against an existing cluster would effectively recreate it. I'm ok with recreating every cluster because I consider my clusters cattle rather than pets. How about you, everyone?
@whereisaaron I'm using the master branch to build kube-aws most of the time. Currently all of my clusters are running the latest changes in master. Some were upgraded from 1.8.x to 1.9.1 and the largest one from 1.7.8 to 1.9.1.
@mumoshu I'm planning to migrate existing clusters to new ones, without updating.
@mumoshu When you have a large collection of heterogeneous applications in a cluster, the clusters may be cattle, but the applications on them are like pet ticks on the cow that you have to locate and transplant to the next beast 😄 I hope at least minor (x.y.z) version in-place upgrades will be possible/supported, again trying to match the ease of upgrade that GKE and the like give you. Separate stacks do make this a little easier, since node pools and controllers are pretty much cattle within a cluster.
I'll be merging #1233 and cutting v0.9.11-rc.1 once v0.9.10 is released. Any comments, opinions, etc.? 😃
@mumoshu Just one comment so far: is it possible to split the worker stacks with this implementation? I.e. if I have a multiple-node-pool environment, I'd like to roll the node-pool stacks independently.
@kevtaylor It won't be so hard to implement naive support for that. Anyway, may I ask about your exact use-case for it? I guess that if you had modified only the first node pool in your cluster.yaml, only that pool would actually be rolled. Oh, or maybe it relates to testing a new AMI ID on just one pool first?

Thank you so much for the comment, anyway!
So the types of things we do are: split node pools into separate groups. We then add tolerations to certain pod types which then favour that node pool. We would still use a consistent AMI across the node pools. But we may want to, say, change the instance type of Group A and then just roll that pool, or if we wanted to change an AMI ID, just test that on a given stack first, etc.
@kevtaylor Thanks for the explanation! It is now clearer to me. How about being able to specify one or more node pools to update, e.g. via a `--targets` flag? Would it sound good?
@mumoshu That looks spot on to me

@kevtaylor Thanks for your confirmation 👍 Just implemented it into #1233
#1233:
* Prompt before updating
* feat: Support updating a subset of cfn stacks only
* Vendor changes for the stack subset update feature
* Rename some func-scoped variables for clarity
* feat: Extract etcd and network stack for more fine-grained cluster update
* feat: Ability to specify just one or more node pools for updates
* fix: Make update work with `--targets all`

Ref #1112
Hi, I just rolled the latest master into an existing 0.9.10 cluster and it did something strange with the etcds: it created 3 new ones and left the old ones. I think we are going to have to think about how to migrate from the old to the new separated stacks.

I guess as it's a separate stack now, nothing is managing that migration. Are we still advising to do a blue/green cluster switchover? We should also update the CLI reference in the docs.

I don't think we should be allowing this to roll into an existing cluster if the result is going to be to effectively remove all the existing state. I think we need a migration path or a hard stop that prevents users from accidentally wiping their existing clusters.
I understand your concern. How about preventing `kube-aws update` when there's no network stack? That's the quickest guard that I can think of.
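As a rough illustration of that guard (assumed stack-naming convention, not the actual kube-aws code), the check could be as simple as:

```go
// Illustrative guard: refuse to run an update against a cluster created before
// the stack split, i.e. when no separate network stack exists yet.
package guard

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

func ensureNetworkStackExists(clusterName string) error {
	cfn := cloudformation.New(session.Must(session.NewSession()))
	networkStack := clusterName + "-network" // assumed naming convention

	// DescribeStacks returns a ValidationError when the stack does not exist.
	if _, err := cfn.DescribeStacks(&cloudformation.DescribeStacksInput{
		StackName: aws.String(networkStack),
	}); err != nil {
		return fmt.Errorf("no network stack %q found; this cluster predates the stack split, refusing to update: %v", networkStack, err)
	}
	return nil
}
```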
That sounds like a very good quick safety change and could give us some time to put together a more comprehensive solution! :) Regarding a migration strategy, I'm not very familiar with CloudFormation, but have noticed three interesting-looking tags, e.g. aws:cloudformation:stack-id, aws:cloudformation:stack-name and aws:cloudformation:logical-id.
Does it sound plausible to write a function in kube-aws that looks for legacy etcd resources, instances, volumes (etc.) and re-tags them with the stack-id of the new stack? Would this make CloudFormation treat them as members of the new stack? I have no idea if this would work, but if it does it might give us a clean migration path. Can anyone with more CF experience give me a steer on whether this solution is worthwhile spending some time to try out and test?
Thinking some more about it, I don't think my suggestion above will work: given that we are creating a new etcd stack rather than updating it, we could probably expect an error about clashing with existing resources. There is clearly some complexity here regarding how CloudFormation works, and my casual investigation so far hasn't thrown up much in the way of people migrating resources across different stacks.
The first problem I found when trying to update a 0.9.9 cluster to the new etcd stack is that the new ETCD0 fails to send its cfn-signal for some reason. I could never fathom how we get over this in a new, clean cluster, so I have created a PR that brings the etcds up in parallel upon new creation of the stack and thus avoids this problem: #1357
Oh, did we miss implementing the ability to disable rollback here?
Hiya, I didn't disable rollback as part of the etcd migration code because generally, if the etcd migration fails, then rolling back is a good thing. If the controllers fail to come up then this should also trigger a rollback before the old etcds are deleted. Is this in relation to a desired feature or an issue upgrading?
@davidmccormick As far as I remember,
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @mumoshu, I want to discuss some possible changes in CloudFormation especially regarding the NestedStacks and DesiredCapacity.
Had all the clusters upgraded to 1.9.1 last week. All went fine, except on the last cluster, where the last nested stack (a node pool) failed to upgrade and started a rollback.
The upgrade failed because a new instance was added but didn't respond in any form (no ssh, no ping, nothing in the AWS console "Get System Logs"). Since CF expects a signal from the launched instances, the quickest solution was to terminate the instance from the console, but the ASG didn't launch a new one, so I increased the ASG capacity to be able to receive the signal before a timeout. Right after that I received an error in CloudFormation:

New SetDesiredCapacity value 20 is below min value 21 for the AutoScalingGroup.

And the rollback started, rolling back all the node pools, controllers and ETCD for ~2h. It's not the first time something like this has happened; once CloudFormation displayed a message that a new instance had been added, but it didn't show up in the console, so I had to change the ASG manually to avoid a signal timeout.
On our CF stacks we never use DesiredCapacity for ASGs, because they can be resized due to a traffic increase and CF will roll back the update if the size differs (a sketch of such a check follows this comment).

Then, regarding the Nested Stacks, I think ETCD and controllers should be in separate stacks. If the update fails on workers, we should not roll back the ETCD and controllers that were successfully upgraded. Sometimes a rollback will not work for ETCD or controllers with specific versions. For example, on test clusters I was unable to revert from ETCD 3.2.x to ETCD 3.0.x, or from Kubernetes 1.9.x to 1.7.x, and this required repeating the upgrade. Workers are quite safe to roll back.
Also, the rollback process for Nested Stacks takes a very long time for large clusters with multiple node pools. I know that using Nested Stacks is much cleaner, but rolling back everything when we have updates to the ETCD or Kubernetes version requires repeating the upgrade to recover the cluster, causing a very long downtime.
I don't know how we can remove the DesiredCapacity from CF right now, but at least maybe we can move ETCD and controllers out of the Nested Stacks. What do you think?
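To make the DesiredCapacity concern above concrete, here is a hypothetical pre-flight check (function and parameter names invented for the example) that compares the live desired capacity of an ASG with whatever a stack template is about to set; this is essentially the mismatch behind the "SetDesiredCapacity value 20 is below min value 21" error described earlier.

```go
// Hypothetical pre-flight check: warn when a stack update would set a
// DesiredCapacity below what the autoscaler has already scaled the group to.
package preflight

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func checkDesiredCapacity(asgName string, templateDesired int64) error {
	svc := autoscaling.New(session.Must(session.NewSession()))
	out, err := svc.DescribeAutoScalingGroups(&autoscaling.DescribeAutoScalingGroupsInput{
		AutoScalingGroupNames: []*string{aws.String(asgName)},
	})
	if err != nil {
		return err
	}
	if len(out.AutoScalingGroups) == 0 {
		return fmt.Errorf("ASG %s not found", asgName)
	}
	live := aws.Int64Value(out.AutoScalingGroups[0].DesiredCapacity)
	if templateDesired < live {
		return fmt.Errorf("template would set desired capacity %d, but the autoscaler has already grown %s to %d; the CloudFormation update would fight the autoscaler and likely roll back",
			templateDesired, asgName, live)
	}
	return nil
}
```

Not pinning DesiredCapacity in the template at all, as described above, sidesteps the problem entirely; a check like this only matters while the attribute is still present.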