viostor: causes qemu virtio_error when an uncompleted request times out #907

Open · wangyan0507 opened this issue Apr 25, 2023 · 21 comments

Describe the bug
I hit a virtio_error() in QEMU while it was handling a virtio block request from a Windows 11 guest.

Two uncompleted blk requests (req1 and reqN) have the same indirect descriptor table address. The length of the indirect descriptor table in req1 is 48, but the length in reqN is 160, so QEMU raises an error when it handles req1.

The stack trace:

(gdb) bt
#0  virtio_error (vdev=0xaaaaad604420, fmt=0xaaaaabb54ad0 "Desc next is %u") at ../hw/virtio/virtio.c:3568
#1  0x0000aaaaab642868 in virtqueue_split_read_next_desc (vdev=0xaaaaad604420, desc=0xffffefff7d00, desc_cache=0xffffefff7d10, max=3, next=0xffffefff7cb4) at ../hw/virtio/virtio.c:1057
#2  0x0000aaaaab643c8c in virtqueue_split_pop (vq=0xaaaaad60f740, sz=240) at ../hw/virtio/virtio.c:1591
#3  0x0000aaaaab644584 in virtqueue_pop (vq=0xaaaaad60f740, sz=240) at ../hw/virtio/virtio.c:1768
#4  0x0000aaaaab5d8d3c in virtio_blk_get_request (s=0xaaaaad604420, vq=0xaaaaad60f740) at ../hw/block/virtio-blk.c:238
#5  0x0000aaaaab5da1fc in virtio_blk_handle_vq (s=0xaaaaad604420, vq=0xaaaaad60f740) at ../hw/block/virtio-blk.c:772
#6  0x0000aaaaab5da304 in virtio_blk_handle_output (vdev=0xaaaaad604420, vq=0xaaaaad60f740) at ../hw/block/virtio-blk.c:806
#7  0x0000aaaaab645b00 in virtio_queue_notify_vq (vq=0xaaaaad60f740) at ../hw/virtio/virtio.c:2249
#8  0x0000aaaaab648fb4 in virtio_queue_host_notifier_read (n=0xaaaaad60f7b4) at ../hw/virtio/virtio.c:3529
#9  0x0000aaaaab951690 in aio_dispatch_handler (ctx=0xaaaaac82f9a0, node=0xfffdb0002ea0) at ../util/aio-posix.c:369
#10 0x0000aaaaab951794 in aio_dispatch_ready_handlers (ctx=0xaaaaac82f9a0, ready_list=0xffffefffe040) at ../util/aio-posix.c:399
#11 0x0000aaaaab9522d0 in aio_poll (ctx=0xaaaaac82f9a0, blocking=true) at ../util/aio-posix.c:722
#12 0x0000aaaaab7a158c in iothread_run (opaque=0xaaaaac495e30) at ../iothread.c:67
#13 0x0000aaaaab956ae0 in qemu_thread_start (args=0xaaaaac82ff40) at ../util/qemu-thread-posix.c:505
#14 0x0000fffff654ef60 in  () at /usr/lib64/libc.so.6
#15 0x0000fffff65b489c in  () at /usr/lib64/libc.so.6
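
To make the failure mode concrete, here is a small standalone C sketch (not QEMU source code) of the split-ring walk implied by the trace: the chain length limit is derived from the indirect table's length field, so walking reqN's 10-entry chain under req1's stale 48-byte (3-descriptor) limit trips the "Desc next is %u" check seen in frame #0.

/* Standalone sketch (not QEMU code): shows why a stale indirect-table length
 * of 48 bytes trips "Desc next is %u" when the table in guest memory actually
 * holds reqN's longer chain. */
#include <stdint.h>
#include <stdio.h>

#define VRING_DESC_F_NEXT 1

struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Walk an indirect descriptor table, bounding the chain the way QEMU does:
 * max = table_len / sizeof(struct vring_desc). */
static int walk_indirect(const struct vring_desc *table, uint32_t table_len)
{
    unsigned int max = table_len / sizeof(struct vring_desc);
    unsigned int i = 0;

    for (;;) {
        if (!(table[i].flags & VRING_DESC_F_NEXT)) {
            return 0;                       /* chain ended cleanly */
        }
        i = table[i].next;
        if (i >= max) {                     /* req1: max = 48 / 16 = 3 */
            printf("Desc next is %u\n", i); /* the virtio_error() in frame #0 */
            return -1;
        }
    }
}

int main(void)
{
    /* The shared table in guest memory holds reqN's 10-entry chain (len 160). */
    struct vring_desc table[10] = {0};
    for (int i = 0; i < 9; i++) {
        table[i].flags = VRING_DESC_F_NEXT;
        table[i].next  = (uint16_t)(i + 1);
    }
    walk_indirect(table, 160);  /* reqN's view: walks all 10 descriptors */
    walk_indirect(table, 48);   /* req1's stale view: prints "Desc next is 3" */
    return 0;
}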

To Reproduce

I attached gdb to the QEMU process and stepped through virtqueue_pop() for a virtio block request. Stepping with the gdb commands 'next'/'continue'/'print' stalls processing long enough for the request to time out on the Windows (Storport) side.

Expected behavior

QEMU should not hit virtio_error().


Host:

  • Distro: [e.g. Fedora, Ubuntu, Proxmox]
  • Kernel version: 5.10.0
  • QEMU version: v7.2.0
  • QEMU command line: ./build/qemu-system-aarch64 -m 8G -object memory-backend-memfd,id=mem1,size=8G,share=on -numa node,memdev=mem1 -machine virt,virtualization=off,its=off,gic-version=host -accel kvm -cpu host -smp 8 -bios ../QEMU_EFI.fd -device nec-usb-xhci -device usb-kbd -device usb-tablet -object iothread,id=disk-iothread -device virtio-blk-pci,drive=win11,iothread=disk-iothread,disable-legacy=on -drive if=none,id=win11,format=raw,file=/home/win11.raw,cache=unsafe -device ramfb -vnc :2 -serial stdio -monitor telnet:localhost:4444,server,nowait -qmp unix:/home/qemu.sock,server,nowait
  • libvirt version
  • libvirt XML file

VM:

  • Windows version: windows11
  • Which driver has a problem: viostor
  • Driver version or commit hash that was used to build the driver:
    virtio-win-0.1.229.iso, virtio-win-0.1.215.iso, and viostor compiled from the latest kvm-guest-drivers-windows.

Additional context

I used dbgview to capture the debug log and found the following RESET entry before virtqueue_get_buf_split() was called.

00004734    102.45079803     RESET (FFFFA60432D88740) Function 20 Cnt 0 InQueue 1

The Windows driver documentation "Handling SRB_FUNCTION_RESET_DEVICE" says that "the port driver requests a device reset when an uncompleted request times out":
https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/handling-srb-function-reset-device
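
That RESET line is printed by viostor's SRB_FUNCTION_RESET_* handling in VirtIoStartIo(). A rough sketch of the current handler (paraphrased from the diff Atchess posts later in this thread, so treat details as approximate) shows why it matters here: all pending requests are completed back to Storport even though their buffers may still be referenced by descriptors sitting in the virtqueue.

/* Rough sketch of viostor's current RESET handling in VirtIoStartIo()
 * (viostor/virtio_stor.c), paraphrased from the diff quoted further down. */
case SRB_FUNCTION_RESET_BUS:
case SRB_FUNCTION_RESET_DEVICE:
case SRB_FUNCTION_RESET_LOGICAL_UNIT: {
    CompletePendingRequests(DeviceExtension);      /* outstanding SRBs completed... */
    CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_SUCCESS);
#ifdef DBG
    RhelDbgPrint(TRACE_LEVEL_INFORMATION, " RESET (%p) Function %x Cnt %d InQueue %d\n",
                 Srb, SRB_FUNCTION(Srb), adaptExt->srb_cnt, adaptExt->inqueue_cnt);
#endif
    break;                                         /* ...while the vq still references their buffers */
}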

@vrozenfe (Collaborator)

@wangyan0507

Thanks for reporting this issue. By any chance, do you know if the problem is reproducible on x64 platforms?
What is the Windows 11 origin and build version? Would you be able to share the Windows system log, or at least the disk
and file system related events?

Thanks,
Vadim.

wangyan0507 (Author) commented Apr 25, 2023

@vrozenfe

I'm sorry, but the Windows system log and the disk/file system related events cannot be uploaded.
I can try on an x64 platform, though.
The guest edition affected by this bug is Windows 11 Enterprise, version 21H2.

Thanks,
Yan Wang.

wangyan0507 (Author) commented Apr 26, 2023

@asadpanda

Is the "Complete SRB1" operation correct while req1 is still being processed?
[screenshot]

Thanks,
Yan Wang.

@vrozenfe (Collaborator)

Technically, the following scenario is possible:

  • Windows/viostor processes SRB1 by sending it to the vq
  • SRB1 has not been completed after 60 seconds (the default SRB completion time)
  • Windows initiates a BUS (LUN) reset cycle
  • viostor completes the outstanding (uncompleted) requests, including SRB1
  • Windows sends SRBn, and viostor reuses buffers that were previously used by SRB1
  • the request initiated with SRB1 then gets completed by the back end, corrupting SRBn

If that is the case, try increasing the IoTimeOutValue to see if it makes any difference:
https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/registry-entries-for-storport-miniport-drivers
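
For example, a registry snippet along these lines (illustrative only; the key path follows the <service>\Parameters pattern from the linked page, and 0x78 = 120 seconds is an arbitrary example value) would raise the per-request IO timeout for viostor:

Windows Registry Editor Version 5.00

; Illustrative example only: set the Storport per-request IO timeout for viostor to 120 seconds.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\viostor\Parameters]
"IoTimeOutValue"=dword:00000078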

wangyan0507 (Author) commented Apr 26, 2023

@vrozenfe

The original test with the Windows 11 guest did not set TimeoutValue or IoTimeOutValue.
New test result:

  1. Not set TimeoutValue and IoTimeOutValue -> virtio_error
  2. Set TimeoutValue(60s) but not set IoTimeOutValue -> virtio_error
  3. Set TimeoutValue(60s) and IoTimeOutValue(60s) -> it's hard to cause virtio_error

So does TimeoutValue only work for virtio-scsi but not for virtio-blk, and IoTimeOutValue is the one for virtio-blk?

And even if we set IoTimeOutValue, the virtio_error problem can technically still occur.
What should we do to solve the underlying problem thoroughly? For example, not complete SRB1 and just return an error instead.

Thanks,
Yan Wang.

@vrozenfe (Collaborator)

If I'm not mistaken, TimeOutValue is for SCSI Port miniport drivers (not related to the viostor and vioscsi drivers, which are both
Storport miniport drivers). So we need to use IoTimeOutValue for both viostor and vioscsi.

Keep in mind that 60 seconds is a huge time period; we can expect a request to be completed much sooner, even for a VM working with network storage. If it is not, we can suspect that something is wrong with vCPU thread scheduling or with the storage back end.

Regarding the solution, I think the most appropriate way to make this work properly is to implement proper
virtio queue and back-end reset logic. In that case the miniport driver needs to not only complete all outstanding SRBs but also request that QEMU reset the LUN/HBA on the RESET request. We have discussed this possibility internally, but I'm not sure whether anything has been done in this direction recently.

Best,
Vadim.

@wangyan0507 (Author)

@vrozenfe

Do you have a plan to solve the underlying problem thoroughly in the future?

Thanks,
Yan Wang.

vrozenfe (Collaborator) commented Nov 1, 2023

@wangyan0507

Yes, even though the LUN reset logic implemented in #683 and #684 helps to deal with lost/uncompleted request issues,
proper virtio-queue reset logic needs to be implemented in the Windows drivers sooner or later. Honestly, I don't have any specific timeline for implementing this feature at the moment, but it is definitely in our future plans.

Best regards,
Vadim.

@wangyan0507 (Author)

@vrozenfe

Thanks, I look forward to this feature being implemented soon.

I have another question: for viostor's "IoTimeOutValue", is the registry entry path "HKLM\System\CurrentControlSet\Services\viostor\Parameters"?

Thanks,
Yan Wang.

vrozenfe (Collaborator) commented Nov 2, 2023

Yes, [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\viostor\Parameters] is the right path for the
"IoTimeOutValue" parameter.

Best,
Vadim.

wangyan0507 (Author) commented Nov 2, 2023

@vrozenfe

I set IoTimeOutValue to 200s in the Windows guest, and added a 60s sleep in QEMU before it pops available IO from the vring; after the 60s, QEMU pops all the available IO in the vring.
The sleep ran about 6 times, and after that QEMU still hit virtio_error, for the same reason as in this issue.
So I would like to know whether 200s is really the effective IO timeout value for Windows, because in this test 200s was never reached.

Thanks,
Yan Wang.

@benyamin-codez (Contributor)

@vrozenfe and @wangyan0507,

I have backported the fix for vioscsi into viostor. I think the removal of the unnecessary VQ unlocks will fix this issue.

There also appears to be a slight performance increase: I was able to get 4.4 GB/s on the boot+system disk and 6.1-6.3 GB/s on additional disks.

Perhaps you could test this for us, Yan Wang? You will need to build against master and enable TESTSIGNING.

--- ./a/viostor/virtio_stor_hw_helper.c 2024-08-06 05:42:24.000000000 +1000
+++ ./b/viostor/virtio_stor_hw_helper.c 2024-08-15 21:33:31.283522952 +1000
@@ -160,6 +160,7 @@
 
     ULONG               QueueNumber = 0;
     ULONG               MessageId = 1;
+    int                 res = 0;
     ULONG               OldIrql = 0;
     BOOLEAN             result = FALSE;
     bool                notify = FALSE;
@@ -194,30 +195,29 @@
     vq = adaptExt->vq[QueueNumber];
     RhelDbgPrint(TRACE_LEVEL_VERBOSE, " QueueNumber 0x%x vq = %p\n", QueueNumber, vq);
 
-    element = &adaptExt->processing_srbs[QueueNumber];
     VioStorVQLock(DeviceExtension, MessageId, &LockHandle, FALSE);
-    if (virtqueue_add_buf(vq,
-                     &srbExt->sg[0],
-                     srbExt->out, srbExt->in,
-                     &srbExt->vbr, va, pa) >= 0) {
+    res = virtqueue_add_buf(vq,
+        &srbExt->sg[0],
+        srbExt->out, srbExt->in,
+        &srbExt->vbr, va, pa);
+
+    if (res >= 0) {
         notify = virtqueue_kick_prepare(vq);
+        element = &adaptExt->processing_srbs[QueueNumber];
         InsertTailList(&element->srb_list, &srbExt->vbr.list_entry);
         element->srb_cnt++;
-        VioStorVQUnlock(DeviceExtension, MessageId, &LockHandle, FALSE);
 #ifdef DBG
         InterlockedIncrement((LONG volatile*)&adaptExt->inqueue_cnt);
 #endif
         result = TRUE;
-    }
-    else {
-        VioStorVQUnlock(DeviceExtension, MessageId, &LockHandle, FALSE);
+    } else {
         RhelDbgPrint(TRACE_LEVEL_ERROR, " Can not add packet to queue %d.\n", QueueNumber);
         StorPortBusy(DeviceExtension, 2);
     }
-    if (notify) {
+    VioStorVQUnlock(DeviceExtension, MessageId, &LockHandle, FALSE);
+    if (notify){
         virtqueue_notify(vq);
     }
-
     if (adaptExt->num_queues > 1) {
         if (CHECKFLAG(adaptExt->perfFlags, STOR_PERF_OPTIMIZE_FOR_COMPLETION_DURING_STARTIO)) {
            VioStorCompleteRequest(DeviceExtension, MessageId, FALSE);

Atchess commented Sep 20, 2024

@vrozenfe
If we don't have plans to implement SRB_FUNCTION_RESET_LOGICAL_UNIT in the short term, can we return SRB_STATUS_INVALID_REQUEST to indicate that we don't support the SRB, instead of completing all pending requests?
This way, it would prevent abnormal IO errors and only cause Windows to keep waiting for the IO.

@vrozenfe (Collaborator)

@Atchess

Do you want to add an extra case to ACTION_ON_RESET https://github.com/virtio-win/kvm-guest-drivers-windows/blob/master/vioscsi/vioscsi.h#L281C3-L281C17 and handle it accordingly?

Thanks,
Vadim.

Atchess commented Sep 25, 2024

@vrozenfe
No, I only want to remove CompletePendingRequests from viostor and will not modify the code in vioscsi.

--- a/viostor/virtio_stor.c
+++ b/viostor/virtio_stor.c
@@ -958,8 +958,8 @@ VirtIoStartIo(
         case SRB_FUNCTION_RESET_BUS:
         case SRB_FUNCTION_RESET_DEVICE:
         case SRB_FUNCTION_RESET_LOGICAL_UNIT: {
-            CompletePendingRequests(DeviceExtension);
-            CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_SUCCESS);
+            CompleteRequestWithStatus(DeviceExtension, (PSRB_TYPE)Srb, SRB_STATUS_INVALID_REQUEST);
 #ifdef DBG
             RhelDbgPrint(TRACE_LEVEL_INFORMATION, " RESET (%p) Function %x Cnt %d InQueue %d\n",
                         Srb, SRB_FUNCTION(Srb), adaptExt->srb_cnt, adaptExt->inqueue_cnt);

@vrozenfe (Collaborator)

@Atchess

Ah, sorry. I don't know why, but I thought you were asking about the virtio-scsi driver.
Who is going to complete the pending requests then? Does your HBA have some sort of logic to deal with uncompleted/abandoned requests?

Best,
Vadim.

Atchess commented Sep 25, 2024

@vrozenfe

We will not complete the pending request, so Windows will not recycle it. If the driver completes a pending request without stopping the device or resetting the virtio queues, Windows will recycle the memory associated with that request. However, the device may still access this memory because the request remains in the virtio queues, potentially leading to file system corruption. If the driver does not complete the pending request, Windows will simply keep waiting for its completion, resulting in an IO hang, similar to an IO hang on Linux.

Best,
Qi.

@vrozenfe (Collaborator)

@Atchess

Yes, the class driver will not reuse an abandoned SRB and, IIRC, will be able to allocate a new block if needed (balancing between the minimal and maximal working set of packets).
That will probably work fine as long as we don't have a massive packet-loss condition, which can happen with network storage (at least such a scenario can easily be emulated).
Apart from that, if I am not mistaken, a Storport miniport driver is required to complete any uncompleted requests when it receives an SRB_FUNCTION_RESET_LOGICAL_UNIT or SRB_FUNCTION_RESET_DEVICE request.

In any case, if you really need to complete SRB_FUNCTION_RESET_* requests with SRB_STATUS_INVALID_REQUEST and
you are going to submit your patch, I would suggest introducing a new registry key so that the current behaviour can be changed to fulfil your needs (just as it is done in vioscsi) while keeping the default execution flow as is.
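
For illustration, a minimal sketch of that idea (every identifier below is an invented placeholder, not copied from vioscsi.h; only the two completion calls already appear in the diff above, and the default branch keeps today's behaviour):

/* Hypothetical sketch only: gate SRB_FUNCTION_RESET_* handling on a value
 * read from the driver's Parameters registry key. Names are illustrative. */
typedef enum _VIOSTOR_ACTION_ON_RESET {
    ViostorResetCompleteRequests = 0,   /* default: current "complete everything" flow */
    ViostorResetInvalidRequest          /* behaviour proposed above by Atchess */
} VIOSTOR_ACTION_ON_RESET;

static VOID HandleResetSrb(PVOID DeviceExtension, PSRB_TYPE Srb, VIOSTOR_ACTION_ON_RESET action)
{
    switch (action) {                   /* assumed to be populated from the registry at init */
    case ViostorResetInvalidRequest:
        /* Leave outstanding requests in flight; just refuse the reset SRB. */
        CompleteRequestWithStatus(DeviceExtension, Srb, SRB_STATUS_INVALID_REQUEST);
        break;
    case ViostorResetCompleteRequests:
    default:
        CompletePendingRequests(DeviceExtension);
        CompleteRequestWithStatus(DeviceExtension, Srb, SRB_STATUS_SUCCESS);
        break;
    }
}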

But as a long-term target, a proper implementation of the multi-tier reset provided by Storport will definitely be the best solution.

Best,
Vadim.

@wangyan0507 (Author)

(Quoting @benyamin-codez's earlier backport comment and patch, reproduced in full above.)

Sorry for the long delay in replying. I don't quite see why this patch would fix the problem or improve performance.

YanVugenfirer reopened this Dec 3, 2024
@benyamin-codez (Contributor)

@wangyan0507

My earlier comment you quoted was at a time when my focus was on spinlocks. The issue in vioscsi turned out to be due to doing cb_notifications whilst under a spinlock. This is not the case in viostor.

FYI, the changes in #1174 were mostly refactoring. There is a minor performance increase due to removing the virtqueue struct vq in favour of adaptExt->vq[QueueNumber]. The performance gains reported earlier were due to me not using appropriately random data in my test files.

The issue you reported could be due to spinlocks elsewhere or perhaps out-of-order DPCs, or maybe even CPU affinity... It's also possible that there might have been some reset logic implemented in the last year that has resolved this issue. I'm happy to help track it down, but do you think you could first try to reproduce the problem again using mm286/b266, i.e. virtio-win-0.1.266, and / or using drivers built from master, on top of a more recent kernel and QEMU...?

@wangyan0507 (Author)

@benyamin-codez

Yeah, I can use the new viostor and QEMU to try to reproduce the problem. It may take some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants