
netkvm: Enhancing Host Throughput by Combining Virtio Header and Data in a Single Memory Block for NetKVM #1078

Open
zjmletang opened this issue Apr 3, 2024 · 10 comments

Comments

@zjmletang
Contributor

Is your feature request related to a problem? Please describe.
In the current reception logic of netkvm, the virtio protocol header and the packet data are placed in two separate memory blocks, so at least two memory blocks are needed for one descriptor. From the host's perspective, this means the network card (a hardware implementation) requires two DMA operations to retrieve a single packet, consuming more PCIe bandwidth. When PCIe bandwidth is strained, this becomes a performance bottleneck, because the CPU retrieves descriptors from the Windows virtio frontend driver more slowly.
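To illustrate the point, here is a minimal, self-contained sketch in plain C (hypothetical names and sizes, not the NetKVM data structures) of the split layout, where each received packet costs two descriptor elements and therefore two DMA operations:

```c
/*
 * Illustrative sketch only (not NetKVM code; names and sizes are hypothetical):
 * the current RX layout, where the virtio-net header and the packet payload
 * live in two separate memory blocks, so one descriptor chain carries two
 * elements and the device needs two DMA operations per packet.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct sg_entry {            /* one element of a descriptor chain */
    uint64_t addr;           /* stands in for the guest-physical address */
    uint32_t len;            /* length of this memory block in bytes */
};

#define VNET_HDR_LEN 12u     /* sizeof(struct virtio_net_hdr_mrg_rxbuf) */

int main(void)
{
    void *hdr_block  = malloc(VNET_HDR_LEN);  /* block 1: virtio header only */
    void *data_block = malloc(4096);          /* block 2: packet data        */

    struct sg_entry chain[2] = {
        { (uint64_t)(uintptr_t)hdr_block,  VNET_HDR_LEN }, /* DMA op #1 */
        { (uint64_t)(uintptr_t)data_block, 4096u        }, /* DMA op #2 */
    };

    printf("descriptor chain has %zu elements -> two DMA operations per packet\n",
           sizeof chain / sizeof chain[0]);
    free(hdr_block);
    free(data_block);
    return 0;
}
```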


Describe the solution you'd like
I'd like to discuss with you whether this can be done: the virtio header and the packet data could be combined into the same memory block, so that in most scenarios a descriptor needs to contain only one memory block.
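For contrast, a sketch of the proposed layout under the same hypothetical sg_entry structure: the header is placed at the start of the same memory block as the data, so a typical descriptor needs only one element:

```c
/*
 * Illustrative sketch only (hypothetical names, not NetKVM code): the proposed
 * RX layout, where one contiguous block holds [ virtio header | packet data ],
 * so the descriptor needs a single element in the common case.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct sg_entry { uint64_t addr; uint32_t len; };

#define VNET_HDR_LEN 12u     /* sizeof(struct virtio_net_hdr_mrg_rxbuf) */
#define RX_DATA_LEN  4096u   /* illustrative payload buffer size */

int main(void)
{
    /* one allocation: header immediately followed by the packet data */
    uint8_t *block = malloc(VNET_HDR_LEN + RX_DATA_LEN);

    struct sg_entry chain[1] = {
        { (uint64_t)(uintptr_t)block, VNET_HDR_LEN + RX_DATA_LEN },
    };

    printf("descriptor chain has %zu element of %u bytes -> one DMA operation\n",
           sizeof chain / sizeof chain[0], (unsigned)chain[0].len);
    free(block);
    return 0;
}
```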

I tested this approach on our own cloud platform, and the results show that it is fairly effective at increasing hardware bandwidth utilization, with overall network card throughput increasing by about 10%.

[image: throughput test results table]

@YanVugenfirer
Collaborator

Related: https://issues.redhat.com/browse/RHEL-20220

@zjmletang
Contributor Author

I have been exploring the details around VIRTIO_NET_F_MRG_RXBUF and am a bit puzzled about how the merging of the virtio header and the packet described in this issue relates to enabling VIRTIO_NET_F_MRG_RXBUF. It seems to me that merging the virtio header with the packet is feasible even when VIRTIO_NET_F_MRG_RXBUF is not used. I would appreciate your insight into how these two aspects are connected.

@ybendito
Collaborator

@zjmletang
VIRTIO_NET_F_MRG_RXBUF is automatically enabled when the device indicates VIRTIO_F_VERSION_1; this is part of the virtio spec (network part). Merely indicating merged buffers has no effect by itself. When RSC is enabled, the existing driver allocates buffers of ~64K (which may be up to 17 pages) for each RX packet, so the device has no motivation even to try using merged buffers. Without RSC, the driver uses one or more 4K buffers depending on the Jumbo setting.
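As a rough sketch of that sizing behavior (illustrative constants and names only; the driver derives the actual sizes from RSC, MTU/Jumbo, and header overhead in its own way):

```c
/*
 * Rough sketch of the RX buffer sizing described above; constants and names
 * are illustrative approximations, not the NetKVM implementation.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

static unsigned rx_buffer_bytes(bool rsc_enabled, unsigned mtu)
{
    if (rsc_enabled)
        return 17u * PAGE_SIZE;  /* ~64K coalesced packet plus header room */
    /* without RSC: one or more 4K pages, depending on the Jumbo/MTU setting
       (Ethernet and virtio header overhead omitted here for brevity) */
    return ((mtu + PAGE_SIZE - 1u) / PAGE_SIZE) * PAGE_SIZE;
}

int main(void)
{
    printf("RSC on           : %u bytes per RX buffer\n", rx_buffer_bytes(true, 1500));
    printf("RSC off, MTU 1500: %u bytes per RX buffer\n", rx_buffer_bytes(false, 1500));
    printf("RSC off, MTU 9000: %u bytes per RX buffer\n", rx_buffer_bytes(false, 9000));
    return 0;
}
```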

  1. Your table above indicates some improvement as a result of some modification. I suggest first of all that you share the modified code with us (just to understand the idea of the modification).
  2. For discussion it is important to understand:
  • which virtio features the device supports and which does not. The list of features of interest is here
  • how many queues the device supports/uses in typical performance scenarios
  • what are typical use cases you want to focus on? Large coalesced RX packets, mid-size RX packets, other?
  3. The issue Yan mentioned above does not focus on any specific modification. It is more about: "The Linux driver does use this feature. We (under Windows) do not use it at the moment. Is there something good we can have if we do, and in which scenarios, if any?"

So, I suggest starting by answering questions 1 and 2, and then we can make progress with this discussion.

@zjmletang
Contributor Author

@ybendito
thanks

  1. Sure, I will submit the code as a patch in a few days, just for discussion purposes.

  • which virtio features the device supports and which does not. The list of features of interest is here

host features 0x000001035867ffe3, guest features 0x00000003186799a3

  • how many queues the device supports/uses in typical performance scenarios

16

  • what are typical use cases you want to focus on? Large coalesced RX packets, mid-size RX packets, other?

We are more concerned with how this change would enhance the overall PCIe bandwidth on the host side.
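For reference, the feature masks quoted above can be decoded bit by bit. A small standalone sketch (bit numbers per the virtio spec; not driver code) shows, for example, that VIRTIO_F_VERSION_1, VIRTIO_F_ANY_LAYOUT, and VIRTIO_NET_F_MRG_RXBUF are present in both masks, while VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is not:

```c
/* Decode a few virtio feature bits from the masks quoted above.
 * Bit numbers follow the virtio spec; this is a standalone helper, not driver code. */
#include <stdint.h>
#include <stdio.h>

static int has_feature(uint64_t mask, unsigned bit)
{
    return (int)((mask >> bit) & 1u);
}

int main(void)
{
    const uint64_t host  = 0x000001035867ffe3ULL;  /* from the comment above */
    const uint64_t guest = 0x00000003186799a3ULL;

    const struct { const char *name; unsigned bit; } features[] = {
        { "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS", 2u  },
        { "VIRTIO_NET_F_MRG_RXBUF",           15u },
        { "VIRTIO_F_ANY_LAYOUT",              27u },
        { "VIRTIO_F_VERSION_1",               32u },
    };

    for (size_t i = 0; i < sizeof features / sizeof features[0]; i++)
        printf("%-36s host=%d guest=%d\n", features[i].name,
               has_feature(host, features[i].bit),
               has_feature(guest, features[i].bit));
    return 0;
}
```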

zjmletang added a commit to zjmletang/kvm-guest-drivers-windows that referenced this issue Apr 17, 2024
… in a single memory block

for the problem description
please visit virtio-win#1078
@zjmletang
Contributor Author

zjmletang commented Apr 17, 2024

@ybendito
Based on your suggestion, I've uploaded the code. It still needs optimization and is currently for discussion purposes only; please see #1089.

@ybendito
Collaborator

@zjmletang

  1. As an idea this seems suitable: it should reduce the number of DMA transactions for a physical device and shouldn't have a negative impact on a paravirtualized one.
  2. Some time ago I worked on a more flexible memory layout scheme; there is no PR yet, but the preview is here.
  3. Applying your suggestion upstream at the moment seems problematic to me: the change is very sensitive, and the existing driver supports not only V1 devices but also legacy devices that must have the header in a separate descriptor (see virtio spec 1.3, 5.1.6.6), so the driver must be tested well in several non-default device configurations. I'm not sure we have enough resources for such coverage at the moment.
  4. Not related to this, but to features: it seems strange that the feature ctrl_guest_offloads (2) is not included. It is not complicated to implement, but without it some flows might not work correctly.

@zjmletang
Contributor Author

@zjmletang
  3. Applying your suggestion upstream at the moment seems problematic to me: the change is very sensitive, and the existing driver supports not only V1 devices but also legacy devices that must have the header in a separate descriptor (see virtio spec 1.3, 5.1.6.6), so the driver must be tested well in several non-default device configurations. I'm not sure we have enough resources for such coverage at the moment.

@ybendito,
Indeed, it is very sensitive. Would it be possible to make the distinction based on whether the VIRTIO_F_ANY_LAYOUT feature bit has been negotiated? If it has been negotiated, we could consider merging; if not, we could keep the separation into two segments.
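To make the proposal concrete, a minimal sketch of that gating, assuming the choice is made once from the negotiated feature bits; the names are hypothetical and do not correspond to actual NetKVM functions:

```c
/*
 * Hypothetical sketch of gating the RX layout on VIRTIO_F_ANY_LAYOUT;
 * names do not correspond to actual NetKVM identifiers.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VIRTIO_F_ANY_LAYOUT 27u   /* feature bit number per the virtio spec */

struct rx_layout {
    bool     merged;     /* header shares one memory block with the data */
    unsigned sg_elems;   /* descriptor elements for a typical RX packet  */
};

static struct rx_layout choose_rx_layout(uint64_t negotiated_features)
{
    struct rx_layout l;
    if ((negotiated_features >> VIRTIO_F_ANY_LAYOUT) & 1u) {
        l.merged   = true;   /* ANY_LAYOUT negotiated: prepend header to data */
        l.sg_elems = 1u;
    } else {
        l.merged   = false;  /* legacy device: header stays in its own element
                                (virtio spec 1.3, 5.1.6.6) */
        l.sg_elems = 2u;
    }
    return l;
}

int main(void)
{
    struct rx_layout modern = choose_rx_layout(1ULL << VIRTIO_F_ANY_LAYOUT);
    struct rx_layout legacy = choose_rx_layout(0u);

    printf("ANY_LAYOUT negotiated: merged=%d sg_elems=%u\n", modern.merged, modern.sg_elems);
    printf("legacy device        : merged=%d sg_elems=%u\n", legacy.merged, legacy.sg_elems);
    return 0;
}
```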

@JonKohler
Contributor

cc @sb-ntnx fyi - this would be neat for us as well.

@zjmletang
Contributor Author

Hi @ybendito,
I am writing to follow up on this issue. I proposed making the distinction based on whether the VIRTIO_F_ANY_LAYOUT feature bit has been negotiated (if this approach is incorrect, please let me know as well). This should mitigate the risk while addressing the needs of both V1 and legacy devices (virtio spec 1.3, section 5.1.6.6).

Could you please let me know whether there are plans for further modifications, or additional concerns that need to be addressed? Your feedback will help us decide our next steps: whether to wait for further changes or to start making adjustments based on the current state of the community code.

Thank you very much for your time and assistance.

Best regards,


@heywji

heywji commented Dec 19, 2024

Opened the downstream Jira issue for investment evaluation:

RHEL-71813 netkvm: evaluate merging header and data to optimize host throughput
