Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance KubeVirtNodeDriver Compute Driver #1983

Merged
merged 18 commits into from
Jun 20, 2024

Conversation

cdfmlr
Copy link
Contributor

@cdfmlr cdfmlr commented Jan 4, 2024

Enhance KubeVirtNodeDriver Compute Driver

Description

This pull request brings several improvements to the create_node method and related functions within the KubeVirtNodeDriver (libcloud/compute/drivers/kubevirt.py).

Features added to KubeVirtNodeDriver.create_node:

  • Improved compatibility with the base NodeDriver class:
    • Added support for the size: NodeSize parameter, while retaining legacy compatibility with ex_cpu and ex_memory.
    • Added support for the auth: NodeAuthSSHKey|NodeAuthPassword parameter.
  • Support for deployment (deploy_node):
    • KubeVirtNodeDriver now supports node deployment automatically, benefiting from the above compatibility changes.
  • Support for general volume (disks) types that KubeVirt supports:
    • (==Breaking change==) Modified the content of the ex_disks parameter to align with the related KubeVirt API, making it compatible with any volume types rather than hardcoded support for limited volume or disk types (previously only persistentVolumeClaim was supported).
  • Added support for the ex_template parameter, allowing users to customize the entire Kubernetes object declaring the virtual machine. This is particularly useful for:
    • Supporting advanced configurations not covered by other parameters.
    • Facilitating the reuse of existing virtual machine configurations.

Fixes:

  • _to_node: Improved the logic for parsing memory, eliminating crashes on virtual machines with more than 1 GiB RAM.
  • create_node: (==Breaking change==) Renamed the parameter from ports to ex_ports.
  • Addressed various bugs in the create_node method, which were eliminated during the refactor and implementation of new features.

Other Changes:

  • libcloud/compute/drivers/kubevirt.py:
    • Added a KubeVirtNodeSize function to assist in constructing NodeSize instances for KubeVirtNodeDriver.
    • Introduced a KubeVirtNodeImage function to help construct NodeImage instances for KubeVirtNodeDriver.
    • Code reorganization: Moved DISK_TYPES out of the create_node and exported it, enabling users to access a list of supported disk types.
    • Updated docstrings for new features, adhering to sphinx grammar standards.
  • libcloud/test/compute/test_kubevirt.py:
    • Introduced tests for the create_node method: This method was not tested previously.

Status

  • done, ready for review

Checklist (tick everything that applies)

  • Code linting (required, can be done after the PR checks)
  • Documentation
  • Tests
  • ICLA (required for bigger changes)

@cdfmlr cdfmlr marked this pull request as ready for review January 4, 2024 10:41
@Kami
Copy link
Member

Kami commented Apr 18, 2024

@cdfmlr Thanks for the contribution and good PR description.

I will have a look as soon as I get a chance. In the mean time, would you mind documenting breaking changes (disks, ports) in docs/upgrades_notes.rst?

@cdfmlr
Copy link
Contributor Author

cdfmlr commented Apr 19, 2024

would you mind documenting breaking changes (disks, ports) in docs/upgrades_notes.rst?

Sure, I will add it recently.

@Kami Kami added this to the v3.9.0 milestone Apr 27, 2024
@codecov-commenter
Copy link

codecov-commenter commented Apr 27, 2024

Codecov Report

Attention: Patch coverage is 36.79245% with 134 lines in your changes are missing coverage. Please review.

Project coverage is 83.25%. Comparing base (6f1f83d) to head (0b07480).

Additional details and impacted files
@@            Coverage Diff             @@
##            trunk    #1983      +/-   ##
==========================================
- Coverage   83.26%   83.25%   -0.00%     
==========================================
  Files         353      353              
  Lines       81305    81445     +140     
  Branches     8565     8606      +41     
==========================================
+ Hits        67692    67807     +115     
+ Misses      10823    10814       -9     
- Partials     2790     2824      +34     
Files Coverage Δ
libcloud/test/compute/test_kubevirt.py 71.43% <80.00%> (+4.76%) ⬆️
libcloud/compute/drivers/kubevirt.py 32.81% <32.29%> (+9.85%) ⬆️

# check if new node is present
# But why not just use the resp from the POST request?
# Or self.get_node()?
# I don't think a for loop over list_nodes is necessary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree, either using list_nodes() or get_node() which calls list_nodes() underneath is not really great and efficient...

@@ -391,63 +772,162 @@ def create_node(
)
raise KeyError(msg)

claim_name = disk["volume_spec"]["claim_name"]

if claim_name not in self.ex_list_persistent_volume_claims(namespace=namespace):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method would be a bit more readable if it was refactored into multiple smaller function (e.g. one for creating volume, etc.).

"size" not in disk["volume_spec"]
or "storage_class_name" not in disk["volume_spec"]
):
msg = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have test cases which cover this scenario + all other regular + edge cases?

elif isinstance(auth, NodeAuthPassword):
password = auth.password
cloud_init_config = (
"""#cloud-config\n"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a bit more readable if we stored cloud init configs in template files and then load those + render them here.

@@ -1231,3 +1719,151 @@ def ex_delete_service(self, namespace, service_name):
except Exception:
raise
return result.status in VALID_RESPONSE_CODES


def _deep_merge_dict(source: dict, destination: dict) -> dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add some test cases for this function.

return destination


def _memory_in_MB(memory): # type: (Union[str, int]) -> int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, some tests would this function would be good.

@Kami
Copy link
Member

Kami commented Apr 27, 2024

@cdfmlr Thanks.

I had a look again. It mostly looks good, but there are a couple of improvements which can be made:

@cdfmlr
Copy link
Contributor Author

cdfmlr commented Apr 27, 2024

@Kami Thank you for the review. Indeed, the create_node() looks clumsy. I've actually started refactoring it, but recent higher priority tasks have delayed its completion.

Also, at the moment, I'm unable to add more test cases due to these priority. However, there are some existing test cases for _deep_merge_dict and _memory_in_MB that I believe were copied from our other projects. I plan to incorporate these tests soon.

Despite the problems with code readability and test coverage, we've been using this code in a production environment for several months. It has been performing well and handling a considerable number of situations that aren't covered by the test cases.

@Kami
Copy link
Member

Kami commented Jun 17, 2024

@cdfmlr Thanks.

Do you happen to have any ETA when you will be to add some more tests + refactor the code a bit? Since I'm planning to do a v3.9.0 release this week.

@cdfmlr
Copy link
Contributor Author

cdfmlr commented Jun 18, 2024

@Kami sorry for my delay.

I have added more tests covering the helper functions (_deep_merge_dict and _memory_in_MB) and some typical unhappy code paths. The _create_node() method has been refactored into smaller functions as well. Meanwhile, a useful new feature to set cpu/mem requests and limits separately is introduced and tested.

"""
# size -> cpu and memory limits / requests

ex_memory_limit = ex_memory_request = ex_cpu_limit = ex_cpu_request = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be safer to do:

ex_memory_limit, ex_memory_request, ex_cpu_limit, ex_cpu_request = None, None, None, None

Right now the code above works fine since we are defaulting to None, but if this ever changed to default to dictionary or a list this could have unintended side affects.

@staticmethod
def _create_node_size(
vm, size=None, ex_cpu=None, ex_memory=None
): # type: (dict, NodeSize, int, int) -> None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future when you are refactoring code, you can move type annotations from comment directly to the function signature (we don't support Python 2 anymore).

public_key = auth.pubkey
cloud_init_config = (
"""#cloud-config\n""" """ssh_authorized_keys:\n""" """ - {}\n"""
).format(public_key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be any surprises if auth.pubkey contains a leading or a trailing line break? Aka do we need to call .strip() on the value (ideally that would already happen in the base NodeAuth class, but I need to verify that is indeed the case).

password = auth.password
cloud_init_config = (
"""#cloud-config\n"""
"""password: {}\n"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - do we need to perform any additional cleaning and sanitization of the password value?

To be on the safe side and to prevent possible YAML injections, we should escape / quote those values (password, pubkey).

Since everything except the header looks like yaml, probably the safest way is to define a dictionary for other fields and then calling yaml.dump() on it and appending it to the static header value.

Copy link
Member

@Kami Kami Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: I forgot we don't have a dependency on pyyaml yet so we would need to add one which is not that great.

One option which would probably work is to just use json.dumps() on the actual pubkey / password value to take care of the value escaping / quoting, but we would need to verify it works correctly.

In [4]: s = """#cloud-config\n""" """ssh_authorized_keys:\n""" """  - {}\n"""

In [5]: print(yaml.safe_load(s.format(json.dumps("key with \" quotes ' bar"))))
{'ssh_authorized_keys': ['key with " quotes \' bar']}

It looks like it should indeed do the trick, but more unit tests + testing it with the actual cloud init would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. json.dumps seems a great workaround here. introduced in 7d7c102. also added test cases to it.

@Kami
Copy link
Member

Kami commented Jun 18, 2024

@cdfmlr Thanks for adding those changes.

I added a couple of more comments. It's mostly a couple of small things, plus the potential security issue with possible YAML injection in case the pub key or password is supplied by the end user and not sanitized before being passed to the Libcloud code.

@Kami Kami merged commit 52ad4a2 into apache:trunk Jun 20, 2024
16 of 17 checks passed
@Kami
Copy link
Member

Kami commented Jun 20, 2024

Merged into trunk. Thanks.

asfgit pushed a commit that referenced this pull request Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants