Enhance `KubeVirtNodeDriver` Compute Driver #1983

cdfmlr · 2024-01-04T08:01:02Z

Enhance `KubeVirtNodeDriver` Compute Driver

Description

This pull request brings several improvements to the create_node method and related functions within the KubeVirtNodeDriver (libcloud/compute/drivers/kubevirt.py).

Features added to KubeVirtNodeDriver.create_node:

Improved compatibility with the base NodeDriver class:
- Added support for the size: NodeSize parameter, while retaining legacy compatibility with ex_cpu and ex_memory.
- Added support for the auth: NodeAuthSSHKey|NodeAuthPassword parameter.
Support for deployment (deploy_node):
- KubeVirtNodeDriver now supports node deployment automatically, benefiting from the above compatibility changes.
Support for general volume (disks) types that KubeVirt supports:
- (==Breaking change==) Modified the content of the ex_disks parameter to align with the related KubeVirt API, making it compatible with any volume types rather than hardcoded support for limited volume or disk types (previously only persistentVolumeClaim was supported).
Added support for the ex_template parameter, allowing users to customize the entire Kubernetes object declaring the virtual machine. This is particularly useful for:
- Supporting advanced configurations not covered by other parameters.
- Facilitating the reuse of existing virtual machine configurations.

Fixes:

_to_node: Improved the logic for parsing memory, eliminating crashes on virtual machines with more than 1 GiB RAM.
create_node: (==Breaking change==) Renamed the parameter from ports to ex_ports.
~~Addressed various bugs in the create_node method, which were eliminated during the refactor and implementation of new features.~~

Other Changes:

libcloud/compute/drivers/kubevirt.py:
- Added a KubeVirtNodeSize function to assist in constructing NodeSize instances for KubeVirtNodeDriver.
- Introduced a KubeVirtNodeImage function to help construct NodeImage instances for KubeVirtNodeDriver.
- Code reorganization: Moved DISK_TYPES out of the create_node and exported it, enabling users to access a list of supported disk types.
- Updated docstrings for new features, adhering to sphinx grammar standards.
libcloud/test/compute/test_kubevirt.py:
- Introduced tests for the create_node method: This method was not tested previously.

Status

done, ready for review

Checklist (tick everything that applies)

Code linting (required, can be done after the PR checks)
Documentation
Tests
ICLA (required for bigger changes)

Kami · 2024-04-18T15:59:02Z

@cdfmlr Thanks for the contribution and good PR description.

I will have a look as soon as I get a chance. In the meantime, would you mind documenting the breaking changes (disks, ports) in docs/upgrades_notes.rst?

cdfmlr · 2024-04-19T02:27:07Z

would you mind documenting breaking changes (disks, ports) in docs/upgrades_notes.rst?

Sure, I will add it soon.

codecov-commenter · 2024-04-27T13:43:55Z

Codecov Report

Attention: Patch coverage is 36.79245%, with 134 lines in your changes missing coverage. Please review.

Project coverage is 83.25%. Comparing base (6f1f83d) to head (0b07480).

Additional details and impacted files

@@            Coverage Diff             @@
##            trunk    #1983      +/-   ##
==========================================
- Coverage   83.26%   83.25%   -0.00%     
==========================================
  Files         353      353              
  Lines       81305    81445     +140     
  Branches     8565     8606      +41     
==========================================
+ Hits        67692    67807     +115     
+ Misses      10823    10814       -9     
- Partials     2790     2824      +34

Files	Coverage Δ
libcloud/test/compute/test_kubevirt.py	`71.43% <80.00%> (+4.76%)`	⬆️
libcloud/compute/drivers/kubevirt.py	`32.81% <32.29%> (+9.85%)`	⬆️

Kami · 2024-04-27T13:47:18Z

libcloud/compute/drivers/kubevirt.py

+        # check if new node is present
+        # But why not just use the resp from the POST request?
+        # Or self.get_node()?
+        # I don't think a for loop over list_nodes is necessary.


Yeah, I agree. Using either list_nodes() or get_node(), which calls list_nodes() underneath, is neither elegant nor efficient.

Kami · 2024-04-27T13:49:45Z

libcloud/compute/drivers/kubevirt.py

@@ -391,63 +772,162 @@ def create_node(
                    )
                    raise KeyError(msg)

+                claim_name = disk["volume_spec"]["claim_name"]
+
+                if claim_name not in self.ex_list_persistent_volume_claims(namespace=namespace):


The method would be a bit more readable if it was refactored into multiple smaller functions (e.g. one for creating a volume, etc.).

Kami · 2024-04-27T13:50:12Z

libcloud/compute/drivers/kubevirt.py

+                        "size" not in disk["volume_spec"]
+                        or "storage_class_name" not in disk["volume_spec"]
+                    ):
+                        msg = (


Do we have test cases which cover this scenario + all other regular + edge cases?

Kami · 2024-04-27T13:51:21Z

libcloud/compute/drivers/kubevirt.py

+            elif isinstance(auth, NodeAuthPassword):
+                password = auth.password
+                cloud_init_config = (
+                    """#cloud-config\n"""


It would be a bit more readable if we stored the cloud-init configs in template files and then loaded and rendered them here.

Kami · 2024-04-27T13:52:14Z

libcloud/compute/drivers/kubevirt.py

@@ -1231,3 +1719,151 @@ def ex_delete_service(self, namespace, service_name):
        except Exception:
            raise
        return result.status in VALID_RESPONSE_CODES
+
+
+def _deep_merge_dict(source: dict, destination: dict) -> dict:


It would be good to add some test cases for this function.
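As a rough illustration of the kind of test cases meant here, the sketch below uses a hypothetical stand-in, `deep_merge_dict`, that only mirrors the signature shown in the diff. Its merge semantics (values from `source` win, nested dicts are merged recursively, `destination` is mutated and returned) are assumptions, not a description of the driver's actual helper.

```python
import copy


def deep_merge_dict(source: dict, destination: dict) -> dict:
    """Recursively merge ``source`` into ``destination`` and return it.

    Hypothetical stand-in: on conflicts the value from ``source`` wins.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(destination.get(key), dict):
            # Both sides hold a dict: descend instead of overwriting.
            deep_merge_dict(value, destination[key])
        else:
            # Copy so later changes to ``source`` do not leak into the result.
            destination[key] = copy.deepcopy(value)
    return destination


def test_nested_keys_are_merged():
    dst = {"spec": {"running": False, "template": {"a": 1}}}
    src = {"spec": {"template": {"b": 2}}}
    assert deep_merge_dict(src, dst) == {
        "spec": {"running": False, "template": {"a": 1, "b": 2}}
    }


def test_source_overrides_scalars():
    assert deep_merge_dict({"a": 2}, {"a": 1}) == {"a": 2}


def test_dict_replaces_non_dict():
    assert deep_merge_dict({"a": {"b": 1}}, {"a": 1}) == {"a": {"b": 1}}
```

The three cases cover the branches most likely to hide bugs: nested merge, scalar override, and a type mismatch between the two sides.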

Kami · 2024-04-27T13:52:35Z

libcloud/compute/drivers/kubevirt.py

+    return destination
+
+
+def _memory_in_MB(memory):  # type: (Union[str, int]) -> int


Same here, some tests for this function would be good.
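For illustration only, a memory-parsing helper and its checks might look like the sketch below. `memory_in_mib` is a hypothetical stand-in, not the driver's `_memory_in_MB`: the supported suffixes, the rounding down to whole MiB, and the treatment of plain integers as already being in MiB are all assumptions.

```python
import re

# Multipliers (in bytes) for a subset of Kubernetes quantity suffixes.
_UNIT_BYTES = {
    "": 1,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
}


def memory_in_mib(memory):
    """Convert a quantity such as "1Gi" or "512Mi" to whole MiB.

    Hypothetical stand-in. Plain integers are assumed to already be in
    MiB and are returned unchanged.
    """
    if isinstance(memory, int):
        return memory
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", memory.strip())
    if match is None or match.group(2) not in _UNIT_BYTES:
        raise ValueError("unsupported memory quantity: {!r}".format(memory))
    number, unit = match.groups()
    # Convert to bytes first, then round down to whole MiB.
    return int(number) * _UNIT_BYTES[unit] // 1024**2


assert memory_in_mib("512Mi") == 512
assert memory_in_mib("1Gi") == 1024
assert memory_in_mib("2Gi") == 2048   # values above 1 GiB must not crash
assert memory_in_mib(256) == 256
```

Tests for the real helper would ideally also cover decimal suffixes and malformed input, since those are the paths a happy-path test tends to miss.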

Kami · 2024-04-27T13:57:42Z

@cdfmlr Thanks.

I had a look again. It mostly looks good, but there are a couple of improvements which can be made:

- The _create_node() functionality could be refactored into multiple smaller methods. Right now the method is very large and complex.
- The test coverage should be improved. A lot of code and conditional branches are not exercised / covered (https://app.codecov.io/gh/apache/libcloud/pull/1983?src=pr&el=tree&filepath=libcloud%2Fcompute%2Fdrivers%2Fkubevirt.py#diff-bGliY2xvdWQvY29tcHV0ZS9kcml2ZXJzL2t1YmV2aXJ0LnB5, Enhance KubeVirtNodeDriver Compute Driver #1983 (comment)). The existing test cases exercise only a small number of "happy" code paths. I know you didn't add all of that code, but since you updated / touched a lot of it, it would be great to also improve the test coverage.

cdfmlr · 2024-04-27T14:41:47Z

@Kami Thank you for the review. Indeed, create_node() looks clumsy. I've actually started refactoring it, but recent higher-priority tasks have delayed its completion.

Also, at the moment, I'm unable to add more test cases because of these priorities. However, there are some existing test cases for _deep_merge_dict and _memory_in_MB that I believe were copied from our other projects. I plan to incorporate these tests soon.

Despite the problems with code readability and test coverage, we've been using this code in a production environment for several months. It has been performing well and handling a considerable number of situations that aren't covered by the test cases.

Kami · 2024-06-17T14:33:52Z

@cdfmlr Thanks.

Do you happen to have an ETA for when you will be able to add some more tests and refactor the code a bit? I'm planning to do a v3.9.0 release this week.

cdfmlr · 2024-06-18T11:45:56Z

@Kami sorry for the delay.

I have added more tests covering the helper functions (_deep_merge_dict and _memory_in_MB) and some typical unhappy code paths. The _create_node() method has been refactored into smaller functions as well. I have also added and tested a useful new feature: setting CPU/memory requests and limits separately.
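To make the requests/limits split concrete, here is a minimal sketch of how four separate values could be mapped onto a Kubernetes-style `resources` mapping. The helper name `build_resources` is hypothetical, and the assumptions that memory is given as an integer number of MiB and rendered with an `Mi` suffix are mine; the parameter names are borrowed from the variables visible in the diff.

```python
def build_resources(
    ex_cpu_request=None,
    ex_cpu_limit=None,
    ex_memory_request=None,
    ex_memory_limit=None,
):
    """Build a ``resources`` mapping, skipping every value left as None.

    Hypothetical helper; memory values are assumed to be MiB integers.
    """
    resources = {}

    def put(section, key, value):
        # Only create the "requests" / "limits" section when it is needed.
        if value is not None:
            resources.setdefault(section, {})[key] = value

    put("requests", "cpu", None if ex_cpu_request is None else str(ex_cpu_request))
    put("limits", "cpu", None if ex_cpu_limit is None else str(ex_cpu_limit))
    put(
        "requests",
        "memory",
        None if ex_memory_request is None else "{}Mi".format(ex_memory_request),
    )
    put(
        "limits",
        "memory",
        None if ex_memory_limit is None else "{}Mi".format(ex_memory_limit),
    )
    return resources


print(build_resources(ex_cpu_request=1, ex_memory_request=512, ex_memory_limit=1024))
```

Leaving a value as None simply omits the corresponding key, so a caller can set only a request, only a limit, or both.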

Kami · 2024-06-18T19:34:51Z

libcloud/compute/drivers/kubevirt.py

+        """
+        # size -> cpu and memory limits / requests
+
+        ex_memory_limit = ex_memory_request = ex_cpu_limit = ex_cpu_request = None


It would be safer to do:

ex_memory_limit, ex_memory_request, ex_cpu_limit, ex_cpu_request = None, None, None, None

Right now the code above works fine since we default to None, but if this ever changed to default to a dictionary or a list, all four names would refer to the same object, which could have unintended side effects.
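A small standalone demonstration of the difference (plain Python, unrelated to the driver code itself):

```python
# Chained assignment binds every name to the SAME object.
a = b = []
a.append(1)
assert a is b
assert b == [1]  # the change made through ``a`` is visible through ``b``

# Tuple unpacking with separate literals creates independent objects.
c, d = [], []
c.append(1)
assert c is not d
assert d == []  # ``d`` is unaffected

# With an immutable default such as None the two forms behave the same,
# which is why the current code is safe as written.
e = f = None
g, h = None, None
assert e is f is g is h
```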

Kami · 2024-06-18T19:35:34Z

libcloud/compute/drivers/kubevirt.py

+    @staticmethod
+    def _create_node_size(
+        vm, size=None, ex_cpu=None, ex_memory=None
+    ):  # type: (dict, NodeSize, int, int) -> None


In the future when you are refactoring code, you can move type annotations from comment directly to the function signature (we don't support Python 2 anymore).
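As a sketch of what that migration looks like, the two forms are shown side by side below. The `NodeSize` class is a placeholder so the snippet runs on its own, the function names are hypothetical, the bodies are elided, and the use of `Optional[...]` for the defaulted parameters is my reading of the `None` defaults rather than something stated in the PR.

```python
from typing import Optional


class NodeSize:
    """Placeholder standing in for libcloud's NodeSize class."""


# Before: Python 2 compatible type comment.
def create_node_size_old(vm, size=None, ex_cpu=None, ex_memory=None):
    # type: (dict, NodeSize, int, int) -> None
    return None


# After: annotations live in the signature, where tools and
# ``__annotations__`` can see them directly.
def create_node_size_new(
    vm: dict,
    size: Optional[NodeSize] = None,
    ex_cpu: Optional[int] = None,
    ex_memory: Optional[int] = None,
) -> None:
    return None


assert create_node_size_old.__annotations__ == {}
assert create_node_size_new.__annotations__["vm"] is dict
```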

Kami · 2024-06-18T19:37:36Z

libcloud/compute/drivers/kubevirt.py

+            public_key = auth.pubkey
+            cloud_init_config = (
+                """#cloud-config\n""" """ssh_authorized_keys:\n""" """  - {}\n"""
+            ).format(public_key)


Could there be any surprises if auth.pubkey contains a leading or a trailing line break? In other words, do we need to call .strip() on the value? (Ideally that would already happen in the base NodeAuth class, but I need to verify that this is indeed the case.)

Kami · 2024-06-18T19:46:06Z

libcloud/compute/drivers/kubevirt.py

+            password = auth.password
+            cloud_init_config = (
+                """#cloud-config\n"""
+                """password: {}\n"""


Same here - do we need to perform any additional cleaning and sanitization of the password value?

To be on the safe side and to prevent possible YAML injections, we should escape / quote those values (password, pubkey).

Since everything except the header looks like YAML, probably the safest way is to define a dictionary for the other fields, call yaml.dump() on it, and append the result to the static header value.

EDIT: I forgot we don't have a dependency on pyyaml yet so we would need to add one which is not that great.

One option which would probably work is to just use json.dumps() on the actual pubkey / password value to take care of the value escaping / quoting, but we would need to verify it works correctly.

In [4]: s = """#cloud-config\n""" """ssh_authorized_keys:\n""" """ - {}\n"""
In [5]: print(yaml.safe_load(s.format(json.dumps("key with \" quotes ' bar"))))
{'ssh_authorized_keys': ['key with " quotes \' bar']}

It looks like it should indeed do the trick, but more unit tests + testing it with the actual cloud init would be better.

cdfmlr · 2024-06-19

Agreed. json.dumps() seems like a great workaround here. I introduced it in 7d7c102 and also added test cases for it.
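A minimal sketch of the quoting approach discussed in this thread, using only the standard library. The function names are hypothetical, the `.strip()` on the public key is an assumption taken from the earlier review question, and this is not a copy of what commit 7d7c102 does.

```python
import json


def render_password_config(password: str) -> str:
    """Render a cloud-config snippet with the password JSON-quoted."""
    return ("#cloud-config\n" "password: {}\n").format(json.dumps(password))


def render_ssh_key_config(pubkey: str) -> str:
    """Render a cloud-config snippet with the public key JSON-quoted."""
    return ("#cloud-config\n" "ssh_authorized_keys:\n" "  - {}\n").format(
        json.dumps(pubkey.strip())
    )


# A value that tries to smuggle in an extra top-level key.
malicious = "secret\nruncmd:\n  - rm -rf /"
config = render_password_config(malicious)

# json.dumps() escapes the line breaks, so the value stays on one line
# and no additional YAML keys appear in the rendered document.
assert config.splitlines() == [
    "#cloud-config",
    'password: "secret\\nruncmd:\\n  - rm -rf /"',
]
```

This relies on a JSON string also being a valid YAML double-quoted scalar for typical input; as noted above, it is still worth verifying against a real YAML parser and an actual cloud-init run.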

Kami · 2024-06-18T19:58:39Z

@cdfmlr Thanks for adding those changes.

I added a couple more comments. They are mostly small things, plus the potential security issue of YAML injection in case the public key or password is supplied by the end user and not sanitized before being passed to the Libcloud code.

Kami · 2024-06-20T16:37:50Z

Merged into trunk. Thanks.

cdfmlr added 7 commits January 3, 2024 15:03

improve KubeVirtNodeDriver.create_node & bugs fix

cb4d20d

fix(KubeVirtNodeDriver): persistentVolumeClaim: disk -> disk['volume_spec']

1fa681f

fix(KubeVirtNodeDriver): put containerDisk (boot disk) as the first disk

13a16a6

No more hangs when using ex_disk. The boot disk should be the first one in disks (and volumes) list (/dev/vda), otherwise the vm will not boot.

docs(compute.drivers.kubevirt): improve docstrs

51312a3

test(KubeVirtNodeDriver): add test_create_node

e0d832b

apply tox run changes: style & doc

a92dde7

support legacy params ex_cpu, ex_memory and the 3-tuple type of ex_network

3aaa67d

cdfmlr marked this pull request as ready for review January 4, 2024 10:41

Kami added api: compute drivers: kubevirt labels Apr 14, 2024

Merge branch 'trunk' into improve-kubevirt-node-driver

37be3ba

cdfmlr and others added 3 commits April 19, 2024 10:33

Merge branch 'apache:trunk' into improve-kubevirt-node-driver

941ab1d

Documenting breaking changes to KubeVirt driver apache#1983

fa62045

Merge branch 'trunk' into improve-kubevirt-node-driver

1874bb7

Kami added this to the v3.9.0 milestone Apr 27, 2024

Merge branch 'trunk' into improve-kubevirt-node-driver

0b07480

Kami reviewed Apr 27, 2024

cdfmlr added 2 commits June 18, 2024 19:26

test(KubeVirtNodeDriver): covere helpers and unhappy paths

f765b42

refactor(KubeVirtNodeDriver): split _create_node() into multiple smaller methods

40ec5ef

cdfmlr added 2 commits June 18, 2024 19:26

feat(KubeVirtNodeDriver): set cpu/mem requests and limits separately

4c622ee

cherry-pick 3a4fe39e8e this feature is required by our internal e2e test.

test(KubeVirtNodeDriver): cpu/mem requests and limits

f78d361

Merge branch 'trunk' into improve-kubevirt-node-driver

44f64ea

Kami reviewed Jun 18, 2024

fix(KubeVirtNodeDriver): prevent possible YAML injections

7d7c102

Kami approved these changes Jun 20, 2024

Kami merged commit 52ad4a2 into apache:trunk Jun 20, 2024
16 of 17 checks passed

asfgit pushed a commit that referenced this pull request Jun 20, 2024

Add changelog entry for #1983.

d51b382
