Fix: Check MinSize in fix nodegroup size #6650
base: master
Conversation
Hi @daimaxiaxie. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
This PR fixes the problem for ASGs, but at the same time breaks it for other cloudproviders. We generally don't assume min size is enforced on the provider side (i.e. the expectation is that if CA decides to delete a node, it can do so), and the current behavior of going below min is the desired one. The reason is that there is no point keeping a VM that cannot register; it's not really providing capacity to the kubernetes cluster. Min size is there to guarantee a minimum capacity of the cluster, but a node that cannot register shouldn't really be counted. We need a more nuanced solution to the problem, one that allows CA to delete unregistered nodes below min size while giving providers a chance to handle it. IIRC we discussed extending cloudprovider to support this with @Bryce-Soghigian - do I remember correctly that you have a fix for this that you wanted to upstream?
Actually, a good question is why we have such a check in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/static_autoscaler.go#L785. That feels really inconsistent: why are we ok going below min in deleteCreatedNodesWithErrors, but not in removeOldUnregisteredNodes? Sure, in the case of deleteCreatedNodesWithErrors we're probably deleting some placeholders and not actual VMs, but what exactly those placeholders are differs by provider, and at the end of the day we're changing the target size to be less than min size. If we say that's ok - and I think we should, since as per my comment above the meaningful interpretation of min size is "minimum number of nodes" and a VM that cannot register is not really a node - we should do it in all paths where VMs fail to register as nodes. Thoughts @x13n @towca @BigDarkClown ?
Yep @MaciekPytel, #6658. The idea is that when the number of unregistered nodes is below min count, we won't ever remove them. This draft PR will add a respectMinCount option to the delete nodes interface that will allow us to make min-count-violating deletions for unregistered nodes. I need to sit down and finish it. I wanted to upstream the change so that all cloud providers could benefit from it as well, but that involves touching a lot more code, thus taking longer than I thought it would :)
Another thing to note is that not all cloudproviders enforce min count at the DeleteNodes() level either. Some of them rely on these higher-level checks. (They shouldn't?)
For things like IsClusterHealthy or IsNodegroupHealthy, the opposite is true. We only represent real nodes, and not these partial VMs (VMs that never join the apiserver), in the health signal. IMO if the bootstrap is failing, it's more indicative of a health problem than not-ready nodes.
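To make the idea above concrete, here is a minimal sketch of what threading such an option through the node group deletion interface could look like. This is an assumption for illustration only; the actual API in #6658 may be shaped differently.

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
)

// NodeGroupDeleter is a hypothetical variant of the cloudprovider node group
// deletion method discussed above (illustrative only; #6658 may differ).
type NodeGroupDeleter interface {
	// DeleteNodes removes the given nodes from the node group. When
	// respectMinCount is true, the provider should refuse deletions that would
	// take the group below its MinSize. When false (e.g. when cleaning up VMs
	// that never registered with the apiserver), the deletion proceeds even if
	// it violates MinSize.
	DeleteNodes(nodes []*apiv1.Node, respectMinCount bool) error
}
```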
I very much agree with this view. Thank you all @MaciekPytel @Bryce-Soghigian.
I also have this question; this code comes from @kawych. I think
@daimaxiaxie I think the logic was the same even before @kawych's changes FWIW. @MaciekPytel I have no idea why the behavior is inconsistent between the 2 places; I agree that CA should be able to delete the VMs in both scenarios, even if it means going below the MinSize. @Bryce-Soghigian brought up a good point - if we expect the limit enforcement to happen outside of CP, it should also mean that CPs shouldn't enforce the limits within
One could also argue that having the cloud provider enforce min count, in case some upper-level logic misbehaves, is helpful. It means that no matter what, we can't violate min count or scale beyond max count. That being said, there are cases (like unregistered nodes) where we want to be able to force a deletion.
I agree we should be able to go below min in case of long-unregistered nodes and creation errors. There's no point in keeping such nodes around. However, I'm not convinced complicating CA logic to pass the "ignore min size" flag down to cloud providers is the right thing to do. When CA is managing a node group, it should be the source of truth for determining group size. When CA tries to delete a node, cloud provider code should actuate this decision (or return an error). If we're allowing force delete in certain cases, the additional safeguard on the cloud provider side isn't really going to provide a lot of additional protection, I think? What is the use case for sometimes honoring min size internally in cloud provider code? Is it meant as an extra layer of protection in case CA goes amok, or is there more to it?
Yeah, exactly. It serves as a safeguard if someone adds a new deletion call. Really, if we didn't have that cloudprovider code I could just delete the unregistered nodes outright. But with the cloudprovider safeguards, I cannot just delete the unregistered nodes. Since min count is a fundamental principle and people expect it to never be violated, it makes sense to have that extra safety in the cloudprovider. But I would also like to be able to remove nodes that are bad. Either modifying that interface for DeleteNodes() or adding a new method called UnsafeDeleteNodes() might make sense.
That makes sense, thanks. Btw,
Force makes a lot of sense, let's do that instead.
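For illustration, the "force" alternative could keep the existing MinSize-respecting DeleteNodes and add a dedicated entry point for the unregistered-node cleanup path. The sketch below is an assumption about the shape of that API, not the final upstream interface; the name ForceDeleteNodes is hypothetical here.

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
)

// NodeGroupWithForceDelete sketches the "force" variant discussed above
// (hypothetical; the real interface change may differ).
type NodeGroupWithForceDelete interface {
	// DeleteNodes deletes the given nodes, honoring the group's MinSize.
	DeleteNodes(nodes []*apiv1.Node) error
	// ForceDeleteNodes deletes the given nodes even if that takes the group
	// below MinSize, e.g. for VMs that never registered as Kubernetes nodes.
	ForceDeleteNodes(nodes []*apiv1.Node) error
}
```

Keeping the forced path as a separate method leaves existing MinSize checks in provider implementations untouched and makes MinSize-violating deletions explicit at the call site.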
Thanks, so I think we can close this one then. /close
@x13n: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @x13n, I updated the code and now only respects MinSize in Respecting MinSize is necessary in
/reopen |
@daimaxiaxie: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: daimaxiaxie
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@@ -728,6 +728,17 @@ func fixNodeGroupSize(context *context.AutoscalingContext, clusterStateRegistry
		incorrectSize.ExpectedSize,
		incorrectSize.CurrentSize,
		delta)

	possibleDelta := nodeGroup.MinSize() - incorrectSize.ExpectedSize
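A simplified, self-contained sketch of the clamping idea in this hunk (the helper name and exact shape below are assumptions; the actual PR applies the check inline in fixNodeGroupSize):

```go
package sketch

// clampDeltaToMinSize illustrates the idea in the hunk above (not the exact PR
// code): the negative delta applied to a node group's target size is limited
// so that expectedSize + delta never drops below minSize.
func clampDeltaToMinSize(expectedSize, currentSize, minSize int) int {
	delta := currentSize - expectedSize // negative when the target must shrink
	possibleDelta := minSize - expectedSize
	if delta < possibleDelta {
		// Applying the full delta would take the target below minSize;
		// clamp it to the largest decrease that still respects minSize.
		delta = possibleDelta
	}
	return delta
}
```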
Do we really want this check? MinSize is there to protect Cluster Autoscaler against scaling down below MinSize. If we want to fix the target size to something below minSize, it means that the nodegroup started below minSize, we tried to scale up, but the nodes are not registering correctly. I would expect that adjusting the target size of the group, even to a value below minSize, is something that we want to be able to do; we're not violating the minSize constraint, just reacting to a failed scale-up situation.
From the user's perspective, MinSize is the minimum value of the nodegroup's target size, and it cannot be less than MinSize at any time.
On the other hand, many cloud vendors do not allow you to set a value lower than MinSize.
I don't think this makes sense in CA core. For cloud providers that prevent setting the target size below min, adjusting the size can happen on the cloud provider side. Otherwise it is going to prevent cloud providers that do allow going below min from working correctly. Note that the incorrect size in CSR may stay incorrect if the target size is only adjusted as far as min size, so subsequent loops will run this code again.
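To put hypothetical numbers on that last point: say MinSize=3, ExpectedSize=5, and only 1 instance actually exists because the group is out of capacity. With the clamp, possibleDelta = 3 - 5 = -2, so the target size is only lowered to 3 while the group still has 1 instance; the size therefore remains incorrect in the cluster state registry, and fixNodeGroupSize keeps taking this path on every loop.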
I also encountered this bug
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
What type of PR is this?
/kind bug
What this PR does / why we need it:
When an ASG is out of stock and the instance count is less than MinSize (especially with spot instances), the entire Cluster Autoscaler gets stuck because deleteCreatedNodesWithErrors and fixNodeGroupSize return errors. Cluster Autoscaler cannot handle this situation well.
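To illustrate how the loop gets stuck, a provider that enforces MinSize on its side can keep rejecting the size decrease requested by CA core. The sketch below is a hypothetical provider-side guard, not the actual AWS cloud provider code:

```go
package sketch

import "fmt"

// asgNodeGroup is a hypothetical node group whose provider refuses to set a
// target size below MinSize. If CA core asks for a decrease that would violate
// MinSize, the call errors out on every loop and the incorrect size never gets
// resolved, which is the stuck behavior described above.
type asgNodeGroup struct {
	id      string
	minSize int
	target  int
}

func (g *asgNodeGroup) DecreaseTargetSize(delta int) error {
	newTarget := g.target + delta // delta is negative for a decrease
	if newTarget < g.minSize {
		return fmt.Errorf("node group %s: target size %d would be below MinSize %d", g.id, newTarget, g.minSize)
	}
	g.target = newTarget
	return nil
}
```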
Which issue(s) this PR fixes:
Fixes #6601
Special notes for your reviewer:
Respect MinSize.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: