[VMs pool] Add implementation for VMs pool #6951
base: master
Conversation
Hi @wenxuan0923. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Overall looking good, but I'm afraid I don't have permission to approve (I think).
Also, I don't have much expertise in interactions with the AKS API on AgentPool. Would rely on you to test.
return nil, err

// Use Service Principal
if len(cfg.AADClientID) > 0 && len(cfg.AADClientSecret) > 0 {
Does this support UseWorkloadIdentityExtension?
I'm not really sure...but it should be fairly easy to add it if needed later.
It won't be used for managed, but it is the preferred method of authentication for self-hosted at the moment. I would recommend you add it.
Adding to what Robin said, the workload identity extension gets addressed in this function below: https://github.com/kubernetes/autoscaler/pull/6951/files#diff-e18278caa19fbc44db54dcd3c6d8e77d045fff33e3a7cd0b6a65f5a111543b83R288-R354
So I'd look into reusing this function. We use workload identity for local testing + dev.
Updated the code to support UseWorkloadIdentityExtension.
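Purely as an illustration of the credential choice being discussed (the PR itself reuses its existing token-provider function), here is a minimal Go sketch using the azidentity package; the config struct and newCredential helper are assumptions for the sketch, not the PR's code.

```go
package authsketch

import (
	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
)

// config mirrors only the fields relevant to credential selection.
type config struct {
	UseWorkloadIdentityExtension bool
	TenantID                     string
	AADClientID                  string
	AADClientSecret              string
}

// newCredential prefers workload identity when the extension is enabled and
// falls back to a service principal (client secret) credential otherwise.
func newCredential(cfg *config) (azcore.TokenCredential, error) {
	if cfg.UseWorkloadIdentityExtension {
		// Picks up AZURE_CLIENT_ID, AZURE_TENANT_ID and AZURE_FEDERATED_TOKEN_FILE
		// injected by the workload identity webhook.
		return azidentity.NewWorkloadIdentityCredential(nil)
	}
	return azidentity.NewClientSecretCredential(cfg.TenantID, cfg.AADClientID, cfg.AADClientSecret, nil)
}
```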
 manager *AzureManager
-resourceGroup string
+resourceGroup string // MC_ resource group for nodes
I actually prefer resourceGroup to be nodeResourceGroup. But the current convention is being used in other places as well. I think I will save this idea for later.
Yeah we want to keep this consistent with other places.
}

// scaleUpToCount sets node count for vms agent pool to target value through PUT AP call.
func (agentPool *VMsPool) scaleUpToCount(count int64) error {
I generally prefer always-one-use methods like this to be a part of the code calling them, rather than being a separate method, especially when there is no clear boundary of responsibility. I think the case of IncreaseSize and scaleUpToCount (and more) applies.
In case you have a similar opinion, I don't think you need to be restricted by the pattern in azure_scale_set.go for this case.
I don't have a preference here, just thought it might make it a bit easier for maintainers if it has the same pattern as VMSS. I would like to collect more feedback on this PR and make changes together if needed.
FYI, there is no support for this in core, which is almost definitely required for this nodegroup implementation too. Here's the issue + work exploring this: #6202
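For readers unfamiliar with the "PUT AP call" wording used for scaleUpToCount, here is a hedged sketch of such an update through the track-2 armcontainerservice SDK; the module version, function name, and error handling are assumptions, and the PR routes this through its own AgentPoolsClient interface and cache instead.

```go
package scalesketch

import (
	"context"
	"fmt"

	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4"
)

// setAgentPoolCount fetches the agent pool, sets the desired node count and
// issues BeginCreateOrUpdate (the PUT), waiting for the long-running
// operation to complete. Client construction and retries are elided.
func setAgentPoolCount(ctx context.Context, client *armcontainerservice.AgentPoolsClient,
	resourceGroup, clusterName, poolName string, count int32) error {
	getResp, err := client.Get(ctx, resourceGroup, clusterName, poolName, nil)
	if err != nil {
		return fmt.Errorf("getting agent pool %s: %w", poolName, err)
	}
	ap := getResp.AgentPool
	if ap.Properties == nil {
		return fmt.Errorf("agent pool %s has no properties", poolName)
	}
	ap.Properties.Count = &count

	poller, err := client.BeginCreateOrUpdate(ctx, resourceGroup, clusterName, poolName, ap, nil)
	if err != nil {
		return fmt.Errorf("updating agent pool %s: %w", poolName, err)
	}
	_, err = poller.PollUntilDone(ctx, nil)
	return err
}
```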
for i, providerID := range providerIDs {
	// extract the machine name from the providerID by splitting the providerID by '/' and get the last element
	// The providerID look like this:
	// "azure:///subscriptions/feb5b150-60fe-4441-be73-8c02a524f55a/resourceGroups/mc_wxrg_play-vms_eastus/providers/Microsoft.Compute/virtualMachines/aks-nodes-32301838-vms0"
Do we want to put real subs here since it's open source? Given they can't do anything with the subscription, it's still probably good not to expose it.
// "azure:///subscriptions/feb5b150-60fe-4441-be73-8c02a524f55a/resourceGroups/mc_wxrg_play-vms_eastus/providers/Microsoft.Compute/virtualMachines/aks-nodes-32301838-vms0" | |
// "azure:///subscriptions/0000000-0000-0000-0000-00000000000/resourceGroups/mc_wxrg_play-vms_eastus/providers/Microsoft.Compute/virtualMachines/aks-nodes-32301838-vms0" |
Updated the sub as suggested.
providerIDParts := strings.Split(providerID, "/")
machineNames[i] = &providerIDParts[len(providerIDParts)-1]
There are a couple of places where we get these provider ID parts. I'm wondering if a structure similar to https://github.com/Azure/karpenter-provider-azure/blob/main/pkg/utils/subnet_parser.go#L25 might make sense.
Also, I was wondering if we could have a consistent Validate() function attached to that ProviderIDParser that validates the provider ID shapes. Do we validate the provider ID shape ahead of this function?
We validate node.Spec.ProviderID when we call NodeGroupForNode here:
https://github.com/kubernetes/autoscaler/blob/3914bdebd1e9453396408f1f99efa4cc5eeaf078/cluster-autoscaler/cloudprovider/azure/azure_cloud_provider.go#L105C1-L114C1
If the Node doesn't have a valid ProviderID, CAS won't be able to scale down the corresponding node pool if I understand correctly, so when we reach the code here, we should be able to assume the ID is valid.
Also, I just noticed that there's an existing utility method for name parsing here, and updated the code to use that instead:
func resourceName(ID string) (string, error) {
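To make the parser idea above concrete, here is a rough sketch of a provider ID parser with validation built in; the type and function names are hypothetical, and the PR itself sticks with the existing resourceName helper.

```go
package parsersketch

import (
	"fmt"
	"strings"
)

// vmProviderID holds the pieces of an Azure VM provider ID that the
// autoscaler cares about.
type vmProviderID struct {
	SubscriptionID string
	ResourceGroup  string
	Name           string
}

// parseVMProviderID validates the expected shape
// "azure:///subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<name>"
// and extracts the interesting parts, failing loudly on anything else.
func parseVMProviderID(providerID string) (*vmProviderID, error) {
	const prefix = "azure://"
	if !strings.HasPrefix(providerID, prefix) {
		return nil, fmt.Errorf("provider ID %q does not start with %q", providerID, prefix)
	}
	parts := strings.Split(strings.Trim(strings.TrimPrefix(providerID, prefix), "/"), "/")
	if len(parts) != 8 || !strings.EqualFold(parts[0], "subscriptions") ||
		!strings.EqualFold(parts[2], "resourceGroups") ||
		!strings.EqualFold(parts[6], "virtualMachines") {
		return nil, fmt.Errorf("provider ID %q is not a virtual machine ID", providerID)
	}
	return &vmProviderID{SubscriptionID: parts[1], ResourceGroup: parts[3], Name: parts[7]}, nil
}
```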
}

// DeleteNodes extracts the providerIDs from the node spec and
-// delete or deallocate the nodes from the agent pool based on the scale down policy.
+// delete or deallocate the nodes based on the scale down policy of agentpool.
We don't have a deallocate implementation in the upstream codebase here yet, right?
We don't have it here in CAS upstream, but AKS-RP is able to deallocate/delete nodes based on the agentpool property properties.scaleDownMode. If a VMs pool has scaleDownMode == Deallocate, the scale down triggered by CAS will cause RP to deallocate nodes rather than delete them.
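A small sketch of what checking that property can look like through the armcontainerservice SDK (module version assumed); the deallocation itself happens on the AKS-RP side, so CAS only needs the mode for awareness rather than any extra handling in this PR.

```go
package scaledownsketch

import (
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4"
)

// scaleDownIsDeallocate reports whether a CAS-triggered scale down on this
// agent pool would deallocate VMs instead of deleting them.
func scaleDownIsDeallocate(ap armcontainerservice.AgentPool) bool {
	return ap.Properties != nil &&
		ap.Properties.ScaleDownMode != nil &&
		*ap.Properties.ScaleDownMode == armcontainerservice.ScaleDownModeDeallocate
}

// exampleDeallocatePool builds an agent pool configured for deallocation,
// just to show the shape of the property.
func exampleDeallocatePool() armcontainerservice.AgentPool {
	mode := armcontainerservice.ScaleDownModeDeallocate
	return armcontainerservice.AgentPool{
		Properties: &armcontainerservice.ManagedClusterAgentPoolProfileProperties{
			ScaleDownMode: &mode,
		},
	}
}
```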
@@ -102,28 +120,260 @@ func (agentPool *VMsPool) MaxSize() int {
 // TargetSize returns the current TARGET size of the node group. It is possible that the
 // number is different from the number of nodes registered in Kubernetes.
 func (agentPool *VMsPool) TargetSize() (int, error) {
-	// TODO(wenxuan): Implement this method
-	return -1, cloudprovider.ErrNotImplemented
+	size, err := agentPool.getVMsPoolSize()
Why can't we just implement TargetSize directly rather than deferring to getVMsPoolSize? Can't we just use TargetSize in place of this method wherever we need it?
I actually followed the exact same pattern as VMSS here - thought that would make it easier for maintainers:
size, err := scaleSet.GetScaleSetSize()
if _, err = poller.PollUntilDone(updateCtx, nil); err == nil {
	// success path
	klog.Infof("agentPoolClient.BeginCreateOrUpdate for aks cluster %s agentpool %s succeeded", agentPool.clusterName, agentPool.Name)
Nit: do we always want to log on success? Won't we in most cases log the result of this function, i.e. the error, so we can infer when it's successful that way? Do we log in other nodepool implementations?
We don't do a good job of this now, but really we should take better care with verbosity across the codebase.
Log removed
	return err
}

klog.V(1).Infof("Scaling up vms pool %s to new target count: %d", agentPool.Name, count)
The target size is recorded outside of this IIRC - did you find that the existing logging was not enough here?
Log removed
// if the target size is smaller than the min size, return an error
if int(currentSize) <= agentPool.MinSize() {
	klog.V(3).Infof("min size %d reached, nodes will not be deleted", agentPool.MinSize())
In most places in core, we log the error from DeleteNodes():
err = nodeGroup.DeleteNodes(nodesToDelete)
Should we be logging this twice?
Log removed
// TemplateNodeInfo is not implemented.
func (agentPool *VMsPool) TemplateNodeInfo() (*schedulerframework.NodeInfo, error) {
	// TODO(wenxuan): implement this method when vms pool can fully support GPU nodepool
Not having this, I think, means scaling from zero is not supported (even on non-GPU nodepools).
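For context on the scale-from-zero point: TemplateNodeInfo needs to synthesize a Node describing an instance of the pool's VM size so core can simulate scheduling before any real node exists. A rough, non-authoritative sketch with placeholder capacities (a real implementation would look up the SKU and carry the pool's labels/taints):

```go
package templatesketch

import (
	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// templateNodeInfo builds a synthetic node for an agent pool and wraps it in
// a scheduler NodeInfo. The capacity values below are placeholders.
func templateNodeInfo(poolName string) *schedulerframework.NodeInfo {
	capacity := apiv1.ResourceList{
		apiv1.ResourceCPU:    resource.MustParse("4"),
		apiv1.ResourceMemory: resource.MustParse("16Gi"),
		apiv1.ResourcePods:   resource.MustParse("110"),
	}
	node := &apiv1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   poolName + "-template",
			Labels: map[string]string{"agentpool": poolName},
		},
		Status: apiv1.NodeStatus{
			Capacity:    capacity,
			Allocatable: capacity,
			Conditions:  []apiv1.NodeCondition{{Type: apiv1.NodeReady, Status: apiv1.ConditionTrue}},
		},
	}
	nodeInfo := schedulerframework.NewNodeInfo()
	nodeInfo.SetNode(node)
	return nodeInfo
}
```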
agentPool.manager.invalidateCache()
_, err := agentPool.getVMsPoolSize()
if err != nil {
	klog.Warningf("DecreaseTargetSize: failed with error: %v", err)
}
return nil
A functionally NOOP implementation deserves a comment.
Oops, I'm supposed to return err here; updated the code. Thanks for pointing it out!
Yes, but my point is a little different.
This implementation of DecreaseTargetSize (admittedly, same as the current implementation for VMSS) does not actually do anything (except invalidating the cache), and yet (essentially) never fails. This is a latent problem, as it is used by autoscaler core to adjust the size of a nodepool that appears to need adjusting, and not returning an error makes core think the adjustment was made, which can in turn lead to persistent issues (that have been observed). It is not a problem if DecreaseTargetSize never gets called - which is actually the most common case for a well-behaving nodepool. But a correct implementation would likely a) actually adjust the target size (subject to the constraints in the comments for DecreaseTargetSize) and b) return an error if that is not possible or not successful. Not doing this is a choice (or a bug?) - that's what I meant by "NOOP implementation deserves a comment".
The VMSS version likely needs revisiting as well. It does have an explanation for not doing anything in a comment ("VMSS size should be changed automatically after the Node deletion, hence this operation is not required."), which I don't fully understand/agree with. It feels like it needs to either implement the adjustment properly, or return an error. Just returning ErrNotImplemented is an option, though I have not tried it.
So please give some thought to whether a better implementation of DecreaseTargetSize for the VM node pool (respecting the constraints outlined in the comments) is possible. For expediency, it is probably OK to leave it as is for now (maybe at least with a comment like "just following VMSS implementation"), to revisit later. If you do, it would be good to validate in testing that it actually never gets called. (Or maybe just change it to return ErrNotImplemented?)
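To make the suggestion concrete, here is a rough sketch, written against an assumed simplified interface rather than the PR's actual types, of a DecreaseTargetSize that respects the documented constraints: only lower the target, never remove registered nodes, and surface an error whenever the adjustment cannot be made, so core never believes an adjustment happened when it did not.

```go
package decreasesketch

import "fmt"

// vmsPoolLike stands in for the PR's VMsPool and its helpers (cached size
// lookup, registered-node count, and the PUT-based target update).
type vmsPoolLike interface {
	getVMsPoolSize() (int64, error)
	registeredNodeCount() (int, error)
	setTargetCount(count int64) error
}

// decreaseTargetSize lowers the pool's target size without deleting any
// registered node, returning an error whenever that is not possible.
func decreaseTargetSize(p vmsPoolLike, minSize, delta int) error {
	if delta >= 0 {
		return fmt.Errorf("size decrease must be negative, got %d", delta)
	}
	current, err := p.getVMsPoolSize()
	if err != nil {
		return err
	}
	registered, err := p.registeredNodeCount()
	if err != nil {
		return err
	}
	newSize := int(current) + delta
	if newSize < registered || newSize < minSize {
		return fmt.Errorf("cannot decrease target to %d: %d nodes registered, min size %d",
			newSize, registered, minSize)
	}
	// Propagate the lowered target via the PUT agent pool call; surfacing any
	// error keeps core from assuming the adjustment succeeded.
	return p.setTargetCount(int64(newSize))
}
```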
/lgtm
@@ -148,7 +148,7 @@ func (az *azDeploymentsClient) Delete(ctx context.Context, resourceGroupName, de
 	return future.Response(), err
 }
 
-//go:generate sh -c "mockgen k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure AgentPoolsClient >./agentpool_client.go"
+//go:generate sh -c "mockgen -source=azure_client.go -destination azure_mock_agentpool_client.go -package azure -exclude_interfaces DeploymentsClient -copyright_file copyright.txt"
@rakechill I'm fixing the mockgen command in this PR. Don't know why the mockgen reflect mode is not working; changed it to use source mode instead.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
What type of PR is this?
/kind feature
What this PR does / why we need it:
Add implementation to support VMs pool autoscaling. Right now, only the basic scenario is covered; it does not include GPU pools, which will be handled in later PRs.
To test it with VMs pool:
Create a cluster in prod:
az aks create -n play-vms -g wenxrg --vm-set-type "VirtualMachines" -l eastus
and deploy CAS with the command argument:
--nodes=1:6:nodepool1
and the required environment variables (note: the cluster name and cluster resource group name are required for VMs pool).
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: