cherry-pick 7065, 7087 into datadog-master-13.0 (handle failed nodegroups better) #121

Merged


domenicbozzuto

Cherry-picks the following changes: 7065 and 7087.

The former is by the same author and is a precursor for 7087; it makes the cherry-pick cleaner (only one small conflict in the unit tests in static_autoscaler_test, versus several in the main body of static_autoscaler).

This should resolve an issue where the CA stops upscaling for the whole cluster when a single VMSS is stuck in a failed state.

In order to simplify the deleteNodesWithErrors code, return nodeGroupID
as well as nodes with create errors. That way we avoid the additional
node group matching code.

Clean up cluster state after removing failed scale up nodes,
so that the loop can continue. Most importantly, update the
target for the affected node group, so that the deleted nodes
are not considered upcoming.
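
A rough sketch of the idea behind the two commit messages above, for readers who have not looked at the diff. Only the name `deleteNodesWithErrors` and the notion of a node group ID come from the commit messages; the `Node` and `NodeGroup` types, the `nodesWithCreateErrors` helper, and all field names are hypothetical stand-ins rather than the Cluster Autoscaler's real API. The first function returns failed nodes already keyed by node group ID, so no second matching pass is needed. The second deletes them and lowers the group's target size, so the deleted nodes are no longer counted as upcoming.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a cloud instance.
type Node struct {
	Name        string
	NodeGroupID string
	CreateError bool // true if the instance failed to provision
}

// NodeGroup is a hypothetical stand-in for a node group such as a VMSS.
type NodeGroup struct {
	ID         string
	TargetSize int
	Nodes      []string
}

// nodesWithCreateErrors returns the failed nodes keyed by node group ID,
// so the caller does not have to match nodes to groups again.
func nodesWithCreateErrors(nodes []Node) map[string][]string {
	failed := make(map[string][]string)
	for _, n := range nodes {
		if n.CreateError {
			failed[n.NodeGroupID] = append(failed[n.NodeGroupID], n.Name)
		}
	}
	return failed
}

// deleteNodesWithErrors removes the failed nodes from each affected group
// and lowers that group's target size by the number of nodes removed.
// It returns the total number of nodes deleted.
func deleteNodesWithErrors(groups map[string]*NodeGroup, failed map[string][]string) int {
	deleted := 0
	for id, names := range failed {
		group, ok := groups[id]
		if !ok {
			continue // unknown group: nothing to clean up here
		}
		remove := make(map[string]bool, len(names))
		for _, name := range names {
			remove[name] = true
		}
		kept := make([]string, 0, len(group.Nodes))
		for _, name := range group.Nodes {
			if !remove[name] {
				kept = append(kept, name)
			}
		}
		removed := len(group.Nodes) - len(kept)
		group.Nodes = kept
		// Without this step the deleted nodes would still count as upcoming.
		group.TargetSize -= removed
		deleted += removed
	}
	return deleted
}

func main() {
	groups := map[string]*NodeGroup{
		"vmss-a": {ID: "vmss-a", TargetSize: 3, Nodes: []string{"a-0", "a-1", "a-2"}},
		"vmss-b": {ID: "vmss-b", TargetSize: 1, Nodes: []string{"b-0"}},
	}
	nodes := []Node{
		{Name: "a-0", NodeGroupID: "vmss-a"},
		{Name: "a-1", NodeGroupID: "vmss-a", CreateError: true},
		{Name: "a-2", NodeGroupID: "vmss-a", CreateError: true},
		{Name: "b-0", NodeGroupID: "vmss-b"},
	}
	deleted := deleteNodesWithErrors(groups, nodesWithCreateErrors(nodes))
	fmt.Println("deleted:", deleted)
	fmt.Println("vmss-a target:", groups["vmss-a"].TargetSize, "nodes:", groups["vmss-a"].Nodes)
	fmt.Println("vmss-b target:", groups["vmss-b"].TargetSize, "nodes:", groups["vmss-b"].Nodes)
}
```

In this toy run only the group with failed instances has its target lowered; the healthy group is untouched, which is the behavior the PR description is after when a single VMSS is stuck in a failed state.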
domenicbozzuto changed the title from "Cherry pick" to "cherry-pick 7065, 7087 into datadog-master-13.0 (handle failed nodegroups better)" on Sep 23, 2024
domenicbozzuto merged commit 76deaaa into datadog-master-13.0 on Sep 23, 2024
6 checks passed
domenicbozzuto deleted the dom.bozzuto/cherry-pick-provision-changes branch on September 23, 2024 14:30