Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Support async create VM #781

Merged
merged 1 commit into from
Nov 15, 2024
Merged

Conversation

akutz
Copy link
Collaborator

@akutz akutz commented Oct 28, 2024

What does this PR do, and why is it needed?

This patch introduces an async variant on the provider's CreateOrUpdateVirtualMachine function. This new function, CreateOrUpdateVirtualMachineAsync, returns <-chan error, error, for when a non-blocking create is used.

Please note, support for async create is only enabled when the async signal feature is enabled. This is because the new async create workflow no longer:

  • Falls into a post-create reconfigure, instead allowing async signal to enqueue a new reconcile when the create has completed.
  • Requeues the VM after N time based on the VM's state, ex. when the VM is powered on and does not yet have an IP address. This also now relies on async signal when the feature is enabled.

While a non-blocking create does mean the reconciler threads are no longer consumed by create operations, it does not mean VM Op will allow unbounded, concurrent creates. Because each non-blocking create operation consumes a goroutine, the number of concurrent create operations is still limited. The limit is the same as the number of threads previously allowed to do create operations. The difference is, with async create disabled, if 16 threads are doing creates, that is 16 threads that cannot do anything else. With async create enabled, if 16 goroutines are doing create, there are still 16 reconciler threads available to do other things.

This features requires the WorkloadDomainIsolation feature to be enabled and for the environment variable ASYNC_SIGNAL_DISABLED to be unset or set to a non-truth-y value. Even if those conditions are met, the environment variable ASYNC_CREATE_DISABLED may be set to a truth-y value in order to fall back to the previous behavior.

Which issue(s) is/are addressed by this PR? (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes NA

Are there any special notes for your reviewer:

This is WiP while I:

  • Fix any lingering unit/integration test issues

  • Replace mention of "create threads" with "create goroutines"

  • Validation this on a real testbed

    $ kubectl -n vmware-system-vmop get pods --no-headers | grep manager | awk '{print $1}' | xargs kubectl -n vmware-system-vmop logs -cmanager | grep 'blocking create'
    I1030 12:53:20.335746       1 vmprovider_vm.go:191] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/cyphr-1"
    I1030 12:53:20.441368       1 vmprovider_vm.go:191] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/clear-1"
    I1030 12:53:20.594433       1 vmprovider_vm.go:191] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/cyphr-1"
    I1030 12:53:20.664159       1 vmprovider_vm.go:191] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/clear-1"
    I1030 12:53:20.708884       1 vmprovider_vm.go:191] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/cyphr-1"
    I1030 12:53:20.789895       1 vmprovider_vm.go:191] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/clear-1"

    image

    image

    Per the log, this obviously has the issue of duplicate creates, although it must be stated the only result of this is an error in the task collector that the item already exists. Anyway, the next action item will remove the duplicates.

  • Prevent overlapping creates for the same VM

    There is now a single async create op per VM:

    $ kubectl -n vmware-system-vmop get pods --no-headers | grep manager | awk '{print $1}' | xargs kubectl -n vmware-system-vmop logs -cmanager | grep 'blocking create'
    I1030 14:17:28.160553       1 vmprovider_vm.go:208] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/cyphr-1"
    I1030 14:17:28.174597       1 vmprovider_vm.go:208] "Doing a non-blocking create" logger="vsphere" vmName="my-namespace-1/clear-1"

Please add a release note if necessary:

Support asynchronous creation of VMs with non-blocking call to Content Library DeployOVF.

@github-actions github-actions bot added the size/XL Denotes a PR that changes 500-999 lines. label Oct 28, 2024
@akutz akutz force-pushed the feature/async-create-vm branch 3 times, most recently from 67dbe35 to c8e1590 Compare October 29, 2024 14:17
@github-actions github-actions bot added size/XXL Denotes a PR that changes 1000+ lines. and removed size/XL Denotes a PR that changes 500-999 lines. labels Oct 29, 2024
@akutz akutz changed the title WiP: ✨ Support async create VM ✨ Support async create VM Oct 29, 2024
@akutz akutz force-pushed the feature/async-create-vm branch 7 times, most recently from 10cab32 to 4cf4f6d Compare October 29, 2024 16:12
@akutz akutz requested a review from bryanv October 29, 2024 20:30
@akutz akutz force-pushed the feature/async-create-vm branch 6 times, most recently from 8ac0447 to 52505c0 Compare October 30, 2024 15:18
@akutz akutz requested a review from srm09 November 13, 2024 19:29
pkg/config/config.go Outdated Show resolved Hide resolved
pkg/providers/vsphere/vmprovider_vm.go Show resolved Hide resolved
Copy link
Member

@dilyar85 dilyar85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the typos in config comments.

pkg/config/config.go Outdated Show resolved Hide resolved
pkg/config/config.go Outdated Show resolved Hide resolved
This patch introduces an async variant on the provider's
`CreateOrUpdateVirtualMachine` function. This new function,
`CreateOrUpdateVirtualMachineAsync`, returns `<-chan error, error`,
for when a non-blocking create is used.

Please note, support for async create is only enabled when the
async signal feature is enabled. This is because the new async
create workflow no longer:

- Falls into a post-create reconfigure, instead allowing async
  signal to enqueue a new reconcile when the create has completed.

- Requeues the VM after N time based on the VM's state, ex. when
  the VM is powered on and does not yet have an IP address. This
  also now relies on async signal when the feature is enabled.

While a non-blocking create does mean the reconciler
threads are no longer consumed by create operations, it does not
mean VM Op will allow unbounded, concurrent creates. Because each
non-blocking create operation consumes a goroutine, the number of
concurrent create operations is still limited. The limit is the same
as the number of threads previously allowed to do create operations.
The difference is, with async create disabled, if 16 threads are
doing creates, that is 16 threads that cannot do anything else. With
async create enabled, if 16 goroutines are doing create, there are
still 16 reconciler threads available to do other things.

This features requires the WorkloadDomainIsolation feature to be
enabled and for the environment variable ASYNC_SIGNAL_DISABLED to
be unset or set to a non-truth-y value. Even if those conditions are
met, the environment variable ASYNC_CREATE_DISABLED may be set to a
truth-y value in order to fall back to the previous behavior.
@akutz akutz force-pushed the feature/async-create-vm branch from 52505c0 to 8f51f41 Compare November 15, 2024 21:22
Copy link

Code Coverage

Package Line Rate Health
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/clustercontentlibraryitem 82%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/contentlibraryitem 85%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/utils 97%
github.com/vmware-tanzu/vm-operator/controllers/infra/capability/configmap 86%
github.com/vmware-tanzu/vm-operator/controllers/infra/capability/crd 93%
github.com/vmware-tanzu/vm-operator/controllers/infra/configmap 71%
github.com/vmware-tanzu/vm-operator/controllers/infra/node 77%
github.com/vmware-tanzu/vm-operator/controllers/infra/secret 77%
github.com/vmware-tanzu/vm-operator/controllers/infra/validatingwebhookconfiguration 85%
github.com/vmware-tanzu/vm-operator/controllers/infra/zone 76%
github.com/vmware-tanzu/vm-operator/controllers/storageclass 95%
github.com/vmware-tanzu/vm-operator/controllers/storagepolicyquota 97%
github.com/vmware-tanzu/vm-operator/controllers/util/encoding 73%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/storagepolicyusage 99%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/virtualmachine 87%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/volume 87%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineclass 75%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinepublishrequest 81%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinereplicaset 67%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice 83%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice/providers 92%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinesetresourcepolicy 80%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1 72%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1/conditions 88%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1/patch 78%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha2 72%
github.com/vmware-tanzu/vm-operator/pkg/bitmask 100%
github.com/vmware-tanzu/vm-operator/pkg/builder 95%
github.com/vmware-tanzu/vm-operator/pkg/conditions 88%
github.com/vmware-tanzu/vm-operator/pkg/config 100%
github.com/vmware-tanzu/vm-operator/pkg/config/capabilities 100%
github.com/vmware-tanzu/vm-operator/pkg/config/env 100%
github.com/vmware-tanzu/vm-operator/pkg/context/generic 100%
github.com/vmware-tanzu/vm-operator/pkg/context/operation 100%
github.com/vmware-tanzu/vm-operator/pkg/patch 78%
github.com/vmware-tanzu/vm-operator/pkg/prober 91%
github.com/vmware-tanzu/vm-operator/pkg/prober/probe 90%
github.com/vmware-tanzu/vm-operator/pkg/prober/worker 77%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere 75%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/client 80%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/clustermodules 71%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/config 89%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/contentlibrary 74%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/credentials 100%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/network 80%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/placement 79%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/session 71%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/storage 44%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/sysprep 100%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/vcenter 82%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/virtualmachine 84%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/vmlifecycle 68%
github.com/vmware-tanzu/vm-operator/pkg/record 87%
github.com/vmware-tanzu/vm-operator/pkg/topology 91%
github.com/vmware-tanzu/vm-operator/pkg/util 87%
github.com/vmware-tanzu/vm-operator/pkg/util/annotations 100%
github.com/vmware-tanzu/vm-operator/pkg/util/cloudinit 89%
github.com/vmware-tanzu/vm-operator/pkg/util/cloudinit/validate 91%
github.com/vmware-tanzu/vm-operator/pkg/util/image 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube 89%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/cource 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/internal 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/proxyaddr 75%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/spq 100%
github.com/vmware-tanzu/vm-operator/pkg/util/paused 100%
github.com/vmware-tanzu/vm-operator/pkg/util/ptr 100%
github.com/vmware-tanzu/vm-operator/pkg/util/resize 97%
github.com/vmware-tanzu/vm-operator/pkg/util/vmopv1 92%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/client 64%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/vm 79%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/watcher 86%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig 95%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/crypto 98%
github.com/vmware-tanzu/vm-operator/pkg/webconsolevalidation 100%
github.com/vmware-tanzu/vm-operator/services/vm-watcher 91%
github.com/vmware-tanzu/vm-operator/webhooks/common 100%
github.com/vmware-tanzu/vm-operator/webhooks/persistentvolumeclaim/validation 95%
github.com/vmware-tanzu/vm-operator/webhooks/unifiedstoragequota/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/mutation 87%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/validation 95%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/mutation 62%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinepublishrequest/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinereplicaset/validation 90%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/mutation 67%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesetresourcepolicy/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinewebconsolerequest/v1alpha1/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinewebconsolerequest/v1alpha2/validation 92%
Summary 83% (10623 / 12762)

Minimum allowed line rate is 79%

@akutz akutz merged commit fa8791a into vmware-tanzu:main Nov 15, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-not-required size/XXL Denotes a PR that changes 1000+ lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants