✨ Fast Deploy (Experimental) #823
Open
akutz wants to merge 1 commit into vmware-tanzu:main from akutz:feature/fast-deploy-internal-poc
akutz force-pushed the feature/fast-deploy-internal-poc branch 9 times, most recently from 75d6b44 to db5ac02 (December 16, 2024 18:28)
bryanv approved these changes Dec 19, 2024
akutz force-pushed the feature/fast-deploy-internal-poc branch from db5ac02 to 4f04b1a (December 19, 2024 18:49)
What does this PR do, and why is it needed?
This patch adds support for the Fast Deploy feature, i.e. the ability to quickly provision a VM as a linked clone, as an experimental feature that must be enabled manually. There are many things about this feature that may change prior to it being ready for production.
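The patch notes below are broken down into several sections:
* Goals: what is currently supported
* Non-goals: what is not on the table right now
* Architecture
  * Activation: how to enable this experimental feature
  * Placement: request datastore recommendations
  * Disk cache: per-datastore cache for Content Library item disk(s)
  * Create VM: create a linked clone directly from a cached disk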
Goals
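The following goals are considered in scope for this experimental feature at this time. Anything not listed here may still be added before the feature is made generally available:
* Support all VM images that are OVFs
* Support multiple zones
* Support workload-domain isolation
* Support all datastore types, including host-local and vSAN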
Non-goals
The following is a list of non-goals that are not in scope at this time, although most of them should be revisited prior to this feature graduating to production:
Support VM encryption
Child disks can only be encrypted if their parent disks are encrypted. Users *could* deploy an encrypted VM without Fast Deploy, then publish that VM as an image to serve as the source for provisioning encrypted VMs with Fast Deploy.
However, child disks must also use the same encryption key as their parent disks. This limitation conflicts with the upcoming Bring Your Own Key (BYOK) provider feature.
To accommodate this feature, online disk promotion will be an option once the VM is deployed. VMs will still be deployed as linked clones and benefit from the deploy speed a linked clone affords. However, once the VM is created, even if it is powered on, its disks will be promoted so they no longer point back to their parents. While the VM will no longer benefit from the storage savings a linked clone offers, it will be able to support encryption.
Support VM images that are VM templates (VMTX)
The architecture behind Fast Deploy makes it trivial to support deploying VM images that point to VM templates. While not in scope at this time, this will likely become part of the feature before it graduates to production.
Support for backup/restore
The qualified backup/restore workflows for VM Service VMs have never been validated with linked clones as they have not been supported by VM Service up until this point.
Due to how the linked clones are created in this feature, users should not expect existing backup/restore software to work with VMs provisioned with Fast Deploy at this time.
To accommodate this feature, online disk promotion will be an option once the VM is deployed. VMs will still be deployed as linked clones and benefit from the deploy speed a linked clone affords. However, once the VM is created, even if it is powered on, its disks will be promoted so they no longer point back to their parents. While the VM will no longer benefit from the storage savings a linked clone offers, it will be able to support backup/restore.
Support for site replication
As with backup/restore, site replication workflows may not work with linked clones created from bare disks.
To accommodate this feature, online disk promotion will be an option once the VM is deployed. VMs will still be deployed as linked clones and benefit from the deploy speed a linked clone affords. However, once the VM is created, even if it is powered on, its disks will be promoted so they no longer point back to their parents. While the VM will no longer benefit from the storage savings a linked clone offers, it will be able to support site replication.
Support for datastore maintenance/migration
Existing datastore maintenance/migration workflows may not be aware of, or know how to handle, the top-level `.contentlib-cache` directories created to cache disks from Content Library items on recommended datastores.
To accommodate this feature, the goal is to transition the cached disks to First Class Disks (FCD), but that requires some features not yet available to FCDs, such as the ability to query for the existence of an FCD based on its metadata.
Architecture
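The architecture is broken down into the following sections:
* Activation: how to enable this experimental feature
* Placement: request datastore recommendations
* Disk cache: per-datastore cache for Content Library item disk(s)
* Create VM: create a linked clone directly from a cached disk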
Activation
Enabling the experimental Fast Deploy feature requires setting the environment variable `FSS_WCP_VMSERVICE_FAST_DEPLOY` to `true` in the VM Operator deployment.
Please note, even when the feature is activated, it can be bypassed for an individual VM by specifying the following annotation on that VM: `vmoperator.vmware.com/fast-deploy: "false"`. This annotation is completely ignored unless the feature is already activated via the environment variable described above.
Placement
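The steps in this and the following sections describe the Fast Deploy path, which is only taken when the activation rules above allow it. A minimal sketch of that gate in Go follows. It is an illustration, not VM Operator's code: `useFastDeploy` is a hypothetical helper, and accepting any value that `strconv.ParseBool` treats as true is an assumption (the PR only specifies `true`).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const (
	// Environment variable that activates the experimental feature.
	fastDeployFSS = "FSS_WCP_VMSERVICE_FAST_DEPLOY"
	// Per-VM annotation that opts a VM out when the feature is active.
	fastDeployAnnotation = "vmoperator.vmware.com/fast-deploy"
)

// useFastDeploy is a hypothetical helper that reports whether a VM with the
// given annotations should be deployed with Fast Deploy. The environment
// lookup is injected so the logic can be tested without mutating the
// process environment.
func useFastDeploy(lookupEnv func(string) (string, bool), annotations map[string]string) bool {
	raw, ok := lookupEnv(fastDeployFSS)
	if !ok {
		return false
	}
	// Assumption: any value strconv.ParseBool treats as true activates the feature.
	if enabled, err := strconv.ParseBool(raw); err != nil || !enabled {
		// Feature not activated: the annotation is ignored entirely.
		return false
	}
	// Feature activated: only an explicit "false" annotation bypasses it.
	return annotations[fastDeployAnnotation] != "false"
}

func main() {
	vm := map[string]string{fastDeployAnnotation: "false"}
	// Either the feature is off or the annotation opts this VM out.
	fmt.Println(useFastDeploy(os.LookupEnv, vm)) // false
}
```

Injecting the lookup function keeps the decision a pure function of its inputs, which makes the "annotation is ignored unless the feature is activated" rule easy to check.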
The following steps provide a broad overview of how placement works:
1. The ConfigSpec used to create/place the VM now includes:
   a. The disks, and the controllers used by those disks, from the image. The disks also specify the underlying storage policy ID of the VM spec's storage class.
   b. The image's guest ID, if none was specified by the VM class or VM spec.
   c. A root `VMProfile` that specifies the underlying storage policy ID of the VM spec's storage class.
2. A datastore placement recommendation is always required. It uses the storage policies specified in the ConfigSpec to recommend a compatible datastore.
3. A path is constructed that points to where the VM will be created on the recommended datastore, e.g.:
   `[<DATASTORE>] <KUBE_VM_OBJ_UUID>/<KUBE_VM_NAME>.vmx`
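Steps 1 and 3 above can be sketched in Go as follows. The types are simplified, hypothetical stand-ins named after the vSphere objects the PR mentions (the real ones, for example in govmomi, carry many more fields), controllers are omitted, and `buildPlacementConfigSpec` and `vmxPath` are illustrative helpers rather than VM Operator functions. The placement recommendation itself (step 2) is a vSphere call and is not modeled.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the vSphere types named in the PR.
type ProfileSpec struct {
	ProfileID string // storage policy ID
}

type VirtualDisk struct {
	ControllerKey int32
	Profile       []ProfileSpec
}

type ConfigSpec struct {
	GuestID   string // set from the VM class or VM spec, if specified
	Disks     []VirtualDisk
	VMProfile []ProfileSpec // the root VM profile
}

// buildPlacementConfigSpec adds the image's disks to spec, pointing each disk
// and the root VM profile at the storage class's policy ID, and falls back to
// the image's guest ID only when none was already set.
func buildPlacementConfigSpec(spec ConfigSpec, imageDisks []VirtualDisk, imageGuestID, policyID string) ConfigSpec {
	// A fresh slice per use avoids aliasing between disks and the VM profile.
	policy := func() []ProfileSpec { return []ProfileSpec{{ProfileID: policyID}} }
	for _, d := range imageDisks {
		d.Profile = policy()
		spec.Disks = append(spec.Disks, d)
	}
	if spec.GuestID == "" {
		spec.GuestID = imageGuestID
	}
	spec.VMProfile = policy()
	return spec
}

// vmxPath builds "[<DATASTORE>] <KUBE_VM_OBJ_UUID>/<KUBE_VM_NAME>.vmx".
func vmxPath(datastore, vmUID, vmName string) string {
	return fmt.Sprintf("[%s] %s/%s.vmx", datastore, vmUID, vmName)
}

func main() {
	spec := buildPlacementConfigSpec(ConfigSpec{}, []VirtualDisk{{ControllerKey: 1000}}, "ubuntu64Guest", "policy-1")
	fmt.Println(spec.GuestID, len(spec.Disks), spec.VMProfile[0].ProfileID) // ubuntu64Guest 1 policy-1
	fmt.Println(vmxPath("datastore1", "1234-abcd", "my-vm"))              // [datastore1] 1234-abcd/my-vm.vmx
}
```

Putting the policy ID on every disk as well as on the root profile is what lets a policy-aware placement call consider all of the VM's storage when recommending a datastore.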
Disk cache
The disk(s) from a Content Library item are cached on demand on the recommended datastore:
1. The path(s) to the image's VMDK file(s) from the underlying Content Library item are retrieved.
2. A special, top-level directory named `.contentlib-cache` is created, if it does not exist, at the root of the recommended datastore. Please note, this does support vSAN, so the top-level directory may actually be a UUID that resolves to `.contentlib-cache`.
3. A path is constructed that points to where the disk(s) for the library item are expected to be cached on the recommended datastore, e.g.:
   `[<DATASTORE>] .contentlib-cache/<LIB_ITEM_ID>/<LIB_ITEM_CONTENT_VERSION>`
   If this path does not exist, it is created.
4. The following occurs for each of the library item's VMDK files:
   a. The first 17 characters of a SHA-1 sum of the VMDK file name are used to build the expected path to the VMDK file's cached location on the recommended datastore, e.g.:
      `[<DATASTORE>] .contentlib-cache/<LIB_ITEM_ID>/<LIB_ITEM_CONTENT_VERSION>/<17_CHAR_SHA1_SUM>.vmdk`
   b. If there is no VMDK at the above path, the VMDK file is copied to the above path.
The cached disks and entire cache folder structure are automatically removed once there are no longer any VMs deployed as linked clones using a cached disk.
This will likely change in the future so that a disk does not need to be re-cached simply because no VMs deployed from it remain. Otherwise, disks may need to be cached repeatedly, which reduces the value this feature provides.
Create VM
The `VirtualDisk` devices in the ConfigSpec used to create the VM are updated with `VirtualDiskFlatVer2BackingInfo` backings that specify a parent backing, which refers to the cached base disk from above.
The path to each of the VM's disks is constructed based on the index of the disk, e.g.:
`[<DATASTORE>] <KUBE_VM_OBJ_UUID>/<KUBE_VM_NAME>-<DISK_INDEX>.vmdk`
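The backing and disk-path steps above can be sketched in Go as follows. `FlatVer2Backing` is a hypothetical, simplified stand-in for `VirtualDiskFlatVer2BackingInfo` that models only the file name and the parent backing; the rest of the device change in a real ConfigSpec is omitted, and zero-based disk indices are an assumption.

```go
package main

import "fmt"

// FlatVer2Backing is a hypothetical stand-in for VirtualDiskFlatVer2BackingInfo,
// keeping only the disk's file name and its optional parent backing.
type FlatVer2Backing struct {
	FileName string
	Parent   *FlatVer2Backing
}

// VirtualDisk is a hypothetical stand-in for the vSphere VirtualDisk device.
type VirtualDisk struct {
	Key     int32
	Backing *FlatVer2Backing
}

// childDiskPath builds "[<DATASTORE>] <KUBE_VM_OBJ_UUID>/<KUBE_VM_NAME>-<DISK_INDEX>.vmdk".
func childDiskPath(datastore, vmUID, vmName string, index int) string {
	return fmt.Sprintf("[%s] %s/%s-%d.vmdk", datastore, vmUID, vmName, index)
}

// attachParentBackings gives disk i a new child backing in the VM's folder
// whose parent is cachedPaths[i], the cached base disk. This parent link
// is what makes the resulting VM a linked clone.
func attachParentBackings(disks []VirtualDisk, cachedPaths []string, datastore, vmUID, vmName string) error {
	if len(disks) != len(cachedPaths) {
		return fmt.Errorf("got %d disks but %d cached base disks", len(disks), len(cachedPaths))
	}
	for i := range disks {
		disks[i].Backing = &FlatVer2Backing{
			FileName: childDiskPath(datastore, vmUID, vmName, i), // assumption: zero-based index
			Parent:   &FlatVer2Backing{FileName: cachedPaths[i]},
		}
	}
	return nil
}

func main() {
	disks := []VirtualDisk{{Key: 2000}}
	cached := []string{"[ds1] .contentlib-cache/item-1/v2/0123456789abcdef0.vmdk"}
	if err := attachParentBackings(disks, cached, "ds1", "1234-abcd", "my-vm"); err != nil {
		panic(err)
	}
	fmt.Println(disks[0].Backing.FileName)        // [ds1] 1234-abcd/my-vm-0.vmdk
	fmt.Println(disks[0].Backing.Parent.FileName) // prints the cached base disk path
}
```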
The `CreateVM_Task` VMODL1 API is used to create the VM. Because the VM's disks have parent backings, the new VM is effectively a linked clone.
Which issue(s) is/are addressed by this PR? (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes NA
Are there any special notes for your reviewer:
Please add a release note if necessary: