Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NO-SDV [vioscsi] Reduce spinlock management complexity #1175

Closed

Conversation

benyamin-codez
Copy link
Contributor

Spinlocks are now primarily of the DpcLock type, without which the driver will not properly operate. Fallback to an InterruptLock type spinlock is limited, perhaps even cosmetic.

This commit removes the VioScsiVQLock() and VioScsiVQUnlock() wrapper routines to use the native functions instead, i.e. StorPortAcquireSpinLock() and StorPortReleaseSpinLock(). VioScsiVQLock() contained a superfluous MSI-X and QueueNumber check and VioScsiVQUnlock() did not release non-DpcLock type spinlocks. Removing the wrapper routines also resolves these two issues.

Refactoring includes:

  • Renaming of ProcessQueue() to ProcessBuffer(). Whilst technically more accurate, this also provides space for a new ProcessQueue() routine, via which the HW_MESSAGE_SIGNALLED_INTERRUPT_ROUTINE and HW_INTERRUPT routines can then be consolidated.
  • Where necessary, changing BOOLEAN isr to STOR_SPINLOCK LockMode
  • CamelCase / conformity correction of MessageID to MessageId
  • Conformity correction of QueuNum to QueneNumber
  • Removal of unnecessary use of MsgID and MessageID or MessageId in area of focus (11 instances of MessageID remain in other vioscsi.c routines)
  • Mnemonic rename of:
    • CompletePendingRequests() to CompletePendingRequestsOnReset()
    • index to vq_req_idx
  • Syntactic correction of if statement (added braces)

Spinlocks are now primarily of the DpcLock type, without which the
driver will not properly operate. Fallback to an InterruptLock type
spinlock is limited, perhaps even cosmetic.

This commit removes the VioScsiVQLock() and VioScsiVQUnlock()
wrapper routines to use the native functions instead, i.e.
StorPortAcquireSpinLock() and StorPortReleaseSpinLock().
VioScsiVQLock() contained a superfluous MSI-X and QueueNumber check
and VioScsiVQUnlock() did not release non-DpcLock type spinlocks.
Removing the wrapper routines also resolves these two issues.

Refactoring includes:
* Renaming of ProcessQueue() to ProcessBuffer(). Whilst technically more
  accurate, this also provides space for a new ProcessQueue() routine,
  via which the HW_MESSAGE_SIGNALLED_INTERRUPT_ROUTINE and HW_INTERRUPT
  routines can then be consolidated.
* Where necessary, changing BOOLEAN isr to STOR_SPINLOCK LockMode
* CamelCase / conformity correction of MessageID to MessageId
* Conformity correction of QueuNum to QueneNumber
* Removal of unnecessary use of MsgID and MessageID or MessageId in area
  of focus (11 instances of MessageID remain in other vioscsi.c routines)
* Mnemonic rename of:
  * CompletePendingRequests() to CompletePendingRequestsOnReset()
  * index to vq_req_idx
* Syntactic correction of if statement (added braces)

Signed-off-by: benyamin-codez <[email protected]>
Copy link
Contributor

@JonKohler JonKohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at first blush, will defer to Vadim for a deeper review

@benyamin-codez
Copy link
Contributor Author

After exploring various blends of different spinlock types, I was minded to remove the spinlock wrappers entirely.

This approach also has a minor performance improvement.

For reference:

The full spinlock manager I used: https://github.com/benyamin-codez/kvm-guest-drivers-windows/blob/bc85387690c04baaf21adad52b2ee827c3f740f3/vioscsi/helper.c#L853-L956
An essential-only version: https://github.com/benyamin-codez/kvm-guest-drivers-windows/blob/bc85387690c04baaf21adad52b2ee827c3f740f3/vioscsi/helper.c#L806-L851

@vrozenfe
Copy link
Collaborator

@benyamin-codez

Thanks a lot.
Can you please check it briefly on the following configurations:

  • num_queues=1 (from the qemu command line)
  • and MSISupported Registry key set to "0"
    Both these configurations are quite rare nowadays and mostly mean the HW malfunction. But we still need to
    make sure that the driver is functional if the underling virtio HBA is a kind of "broken".

Best regards,
Vadim.

@benyamin-codez
Copy link
Contributor Author

@vrozenfe

Can you please check it briefly on the following configurations:

* num_queues=1 (from the qemu command line)
* and  MSISupported Registry key set to "0"

Thanks Vadim.

I am pleased to confirm this commit works both with and without MSI-X using the registry control method via the MSISupported value in the ENUM\PCI registry tree.

I have been explicitly testing num_queues=1 by using the vCPU=1 method. I am pleased to confirm this method also works whether MSI-X is enabled or disabled. I trust this is sufficient.

I also tested MSISupported=0 with vCPU>1 which results in num_queues=1 and successful operation.

Regarding my comments about calls to InterruptLock type spinlocks being largely cosmetic, I should note I'm not sure how this commit or master would work when !adaptExt->dpc_ok... My understanding is that it will fail at HW_INITIALIZE, per:

if (!adaptExt->dpc_ok && !StorPortEnablePassiveInitialization(DeviceExtension, VioScsiPassiveInitializeRoutine)) {
RhelDbgPrint(TRACE_LEVEL_FATAL, " StorPortEnablePassiveInitialization FAILED\n");
return FALSE;
As such, should we remove the calls to the remaining InterruptLock type spinlocks?

If so, I would propose doing so in a different PR, perhaps following consolidation of the HW_INTERRUPT and HW_MESSAGE_SIGNALLED_INTERRUPT_ROUTINE codepaths (which I intended on raising after this PR was merged). The reason for this is that VioScsiInterrupt() currently contains a branch to call an InterruptLock type spinlock. The consolidation then relies on the sole remaining InterruptLock type spinlock in DispatchQueue().

The natural progression of such refactoring would be to remove conditional checks for adaptExt->dpc_ok. Perhaps this could form part of an additional PR to remove the remaining InterruptLock type spinlock...?

Please let me know your thoughts on this and if you have any other concerns.

Best regards,
Ben

cc: @YanVugenfirer

@vrozenfe
Copy link
Collaborator

I am pleased to confirm this commit works both with and without MSI-X using the registry control method via the MSISupported value in the ENUM\PCI registry tree.

I have been explicitly testing num_queues=1 by using the vCPU=1 method. I am pleased to confirm this method also works whether MSI-X is enabled or disabled. I trust this is sufficient.

Thanks,
we also need to test vCPU > 1 and num_queues = 1 configuration because in this case we can expect ISR and DPC to be executed on different CPUs simultaneously.

I also tested MSISupported=0 with vCPU>1 which results in num_queues=1 and successful operation.

Regarding my comments about calls to InterruptLock type spinlocks being largely cosmetic, I should note I'm not sure how this commit or master would work when !adaptExt->dpc_ok... My understanding is that it will fail at HW_INITIALIZE, per:

if (!adaptExt->dpc_ok && !StorPortEnablePassiveInitialization(DeviceExtension, VioScsiPassiveInitializeRoutine)) {
RhelDbgPrint(TRACE_LEVEL_FATAL, " StorPortEnablePassiveInitialization FAILED\n");
return FALSE;

that !adaptExt->dpc_ok condition is a kind of tricky one. IIRC, we hit this case when resuming from Suspend and/or Hibernation

Best,
Vadim.

@benyamin-codez
Copy link
Contributor Author

@vrozenfe

Thanks, we also need to test vCPU > 1 and num_queues = 1 configuration because in this case we can expect ISR and DPC to be executed on different CPUs simultaneously.

We know that when MSISupported=0 that num_queues=1 via:

if (adaptExt->dump_mode || !adaptExt->msix_enabled)
{
adaptExt->num_queues = 1;
}
So using MSISupported=1 and num_queues=1 via QEMU command line with vCPU>1 results in VioScsiInterrupt() only issuing DPCs whilst adaptExt->dpc_ok=TRUE via:
if (!adaptExt->dump_mode && adaptExt->dpc_ok)
{
StorPortIssueDpc(DeviceExtension,
&adaptExt->dpc[0],
ULongToPtr(QUEUE_TO_MESSAGE(VIRTIO_SCSI_REQUEST_QUEUE_0)),
ULongToPtr(QUEUE_TO_MESSAGE(VIRTIO_SCSI_REQUEST_QUEUE_0)));
}
else
{
ProcessQueue(DeviceExtension, QUEUE_TO_MESSAGE(VIRTIO_SCSI_REQUEST_QUEUE_0), TRUE);
}
The InterruptLock type spinlock (which I have not tested) would only be used in the else branch, i.e. when IsCrashDumpMode=TRUE or when adaptExt->dpc_ok=FALSE (or !adaptExt->dpc_ok).

In any case - to answer your question - this configuration also tests ok with this commit.

that !adaptExt->dpc_ok condition is a kind of tricky one. IIRC, we hit this case when resuming from Suspend and/or Hibernation

Very tricky indeed from what I can tell..! 8^d

In fact, I cannot see how this would work when all other spinlocks are the DpcLock type. In my previous testing, issuing InterruptLock type spinlocks when adaptExt->dpc_ok=TRUE usually clobbered in-flight DPCs.

I think this is worth exploring, but in a future PR as previously suggested.

Best regards,
Ben

cc: @YanVugenfirer

@kostyanf14 kostyanf14 changed the title [vioscsi] Reduce spinlock management complexity NO-SDV [vioscsi] Reduce spinlock management complexity Nov 3, 2024
@kostyanf14
Copy link
Member

rerun tests

@benyamin-codez
Copy link
Contributor Author

@vrozenfe, @YanVugenfirer

Thanks, we also need to test vCPU > 1 and num_queues = 1 configuration because in this case we can expect ISR and DPC to be executed on different CPUs simultaneously.

In any case - to answer your question - this configuration also tests ok with this commit.

If I'm not mistaken, this one should be good to go too, per #1181 (comment).

Perhaps it's worthwhile running one more CI retest?

Best regards,
Ben

cc: @kostyanf14

@benyamin-codez
Copy link
Contributor Author

@kostyanf14

Perhaps it's worthwhile running one more CI retest?

What do you think? Or is this one right to go too?

@kostyanf14
Copy link
Member

rerun tests

@benyamin-codez
Copy link
Contributor Author

@kostyanf14

Thanks for re-running the checks.

Apart from the NO-SDV / Static Tools Logo Test fails, it looks like the same 0x80070002 errors.

Are we getting BSODs on Server2022+...?!?

@kostyanf14
Copy link
Member

@benyamin-codez

NO-SDV means we skip DVL.XML generation, so Static Tools Logo Test will fails. This is expected.
This was done just to reduce the build time.

@benyamin-codez
Copy link
Contributor Author

@kostyanf14

NO-SDV means we skip DVL.XML generation, so Static Tools Logo Test will fails. This is expected.
This was done just to reduce the build time.

Yep, that was pretty clear from the Jenkins config you shared the other day.

I was more concerned with the Flush Test and Disk Verification (LOGO) on Server2022 and the latter on Server2025.
I could only find the same 0x80070002 errors in the logs, as found in #1181 (comment).

In that PR, you stated it was safe to proceed. However, this PR is much more consequential, and having those failures is a concern. Are the failures a known issue which can be disregarded? Or are you able to provide some insight as to why they are occurring, e.g. are there BSODs?

With kind regards,
Ben

@YanVugenfirer
Copy link
Collaborator

@vrozenfe is it OK to merge?

Copy link
Collaborator

@vrozenfe vrozenfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Thank you.
Vadim.

@YanVugenfirer
Copy link
Collaborator

@benyamin-codez sorry, can you rebase please?

@YanVugenfirer
Copy link
Collaborator

@benyamin-codez Sorry, we don't want to have merge commits in the repo. Can you please resolve the conflicts of rebase?

@benyamin-codez
Copy link
Contributor Author

@YanVugenfirer

Apologies. I hope that reset and forced push was ok... 8^d

I'm not seeing any rebase conflicts...

@kostyanf14
Copy link
Member

@benyamin-codez

GitHub still think that PR is not good

This branch cannot be rebased safely
Rebasing the commits of this branch on top of the base branch cannot be performed automatically as this would create a different result than a regular merge. 

@kostyanf14
Copy link
Member

@benyamin-codez I stopped CI, as PR is blocked from merging by GitHub anyway.

When the user stops the viogpusc service either through SCM or
console command line, a timeout error is thrown out as the
following,

From the console,
The service did not respond to the start or control request
in a timely fashion.

From SCM
A timeout was reached while waiting for a transaction response
from the viogpusc service.

This error is due to the incomplete initialization of the
Session creating viogpuap process. This initialization is
blocked until the process is terminated. When SCM or console
command tries to stop the service, it always fails because the
Session is waiting for the process to be terminated. Actually,
the process is waiting for the service to trigger an event to
terminate itself. This is a dead loop.

To fix this issue, the Session initialization should be completed,
and the Session Manager should monitor the termination of the
process and release the resources after the process is terminated.

Signed-off-by: Annie Li <[email protected]>
ybendito and others added 10 commits November 19, 2024 11:41
Includes:
a) New RUN_UNCHECKED and RUN_MIN_CHECKED definitions for compilation conditional tracing
b) DBG definition fix (remove superfluous "define")
c) Use keywords for WPP FLAGS
d) New INLINE function WPP definitions
e) WPP Optimisation
f) ETW Prettification
g) Suggestion to cull code for InitializeDebugPrints() when EVENT_TRACING

Signed-off-by: benyamin-codez <[email protected]>
Includes:
a) Indentation fixes upon review (Credit to the eagle eye of @JonKohler..!)
b) Removes unnecessary and orphaned code
c) Culls unused code for InitializeDebugPrints() when EVENT_TRACING
d) Added comments to show necessary and necessary ETW inclusions

Signed-off-by: benyamin-codez <[email protected]>
SDV defects in vioscsi.c
a) HwStorPortProhibitedDDIs Must-Fix defect in VioScsiInterrupt() IRQL for trace
b) NullCheck defect in CompleteRequest() OpCode for trace

Signed-off-by: benyamin-codez <[email protected]>
Backports fixes and improvements from vioscsi PRs virtio-win#1150 and virtio-win#1162

virtqueue struct vq was also removed in favour of adaptExt->vq[QueueNumber],
which results in a minor performance increase.

Signed-off-by: benyamin-codez <[email protected]>
Signed-off-by: Yuri Benditovich <[email protected]>
This fix prevents viostor from reporting requests with MessageNumber of zero as failed even in case they succeed.

Signed-Off-By: Martin Drab <[email protected]>
@benyamin-codez
Copy link
Contributor Author

@YanVugenfirer @kostyanf14

Will that work, or have I cooked it..?

It certainly seems rebased now, but I see it still says This branch cannot be rebased safely...

FYI, I used:

git checkout master 
git pull
git checkout SpinLocks-minimalist 
git rebase master
git pull --rebase
git push

Is that right, or should I have used another technique?

Let me know if you need a fresh PR instead.

@benyamin-codez
Copy link
Contributor Author

benyamin-codez commented Nov 19, 2024

@YanVugenfirer @kostyanf14

I see it still says This branch cannot be rebased safely...

This doesn't look right to me at all...
Please advise alternate technique or should I just do a fresh PR against master?

EDIT: Added quote for context.

https://issues.redhat.com/browse/RHEL-66228

Initially disable the DMA remapping. Will be enabled
according to the needs in future.

Signed-off-by: Yuri Benditovich <[email protected]>
benyamin-codez added a commit to benyamin-codez/kvm-guest-drivers-windows that referenced this pull request Nov 20, 2024
Spinlocks are now primarily of the DpcLock type, without which the
driver will not properly operate. Fallback to an InterruptLock type
spinlock is limited, perhaps even cosmetic.

This commit removes the VioScsiVQLock() and VioScsiVQUnlock()
wrapper routines to use the native functions instead, i.e.
StorPortAcquireSpinLock() and StorPortReleaseSpinLock().
VioScsiVQLock() contained a superfluous MSI-X and QueueNumber check
and VioScsiVQUnlock() did not release non-DpcLock type spinlocks.
Removing the wrapper routines also resolves these two issues.

Refactoring includes:
* Renaming of ProcessQueue() to ProcessBuffer(). Whilst technically more
  accurate, this also provides space for a new ProcessQueue() routine,
  via which the HW_MESSAGE_SIGNALLED_INTERRUPT_ROUTINE and HW_INTERRUPT
  routines can then be consolidated.
* Where necessary, changing BOOLEAN isr to STOR_SPINLOCK LockMode
* CamelCase / conformity correction of MessageID to MessageId
* Conformity correction of QueuNum to QueneNumber
* Removal of unnecessary use of MsgID and MessageID or MessageId in area
  of focus (11 instances of MessageID remain in other vioscsi.c routines)
* Mnemonic rename of:
  * CompletePendingRequests() to CompletePendingRequestsOnReset()
  * index to vq_req_idx
* Syntactic correction of if statement (added braces)

DUPLICATE of PR virtio-win#1175 to produce clean rebased version.

Signed-off-by: benyamin-codez <[email protected]>
@benyamin-codez
Copy link
Contributor Author

@vrozenfe @YanVugenfirer @kostyanf14

I'm still getting This branch cannot be rebased safely following another rebase...

Here is a fresh PR against master: #1196

@benyamin-codez
Copy link
Contributor Author

Binary equivalent PR #1196 is reporting This branch has no conflicts with the base branch when rebasing...
Therefore, I will close this PR...

YanVugenfirer pushed a commit that referenced this pull request Nov 20, 2024
Spinlocks are now primarily of the DpcLock type, without which the
driver will not properly operate. Fallback to an InterruptLock type
spinlock is limited, perhaps even cosmetic.

This commit removes the VioScsiVQLock() and VioScsiVQUnlock()
wrapper routines to use the native functions instead, i.e.
StorPortAcquireSpinLock() and StorPortReleaseSpinLock().
VioScsiVQLock() contained a superfluous MSI-X and QueueNumber check
and VioScsiVQUnlock() did not release non-DpcLock type spinlocks.
Removing the wrapper routines also resolves these two issues.

Refactoring includes:
* Renaming of ProcessQueue() to ProcessBuffer(). Whilst technically more
  accurate, this also provides space for a new ProcessQueue() routine,
  via which the HW_MESSAGE_SIGNALLED_INTERRUPT_ROUTINE and HW_INTERRUPT
  routines can then be consolidated.
* Where necessary, changing BOOLEAN isr to STOR_SPINLOCK LockMode
* CamelCase / conformity correction of MessageID to MessageId
* Conformity correction of QueuNum to QueneNumber
* Removal of unnecessary use of MsgID and MessageID or MessageId in area
  of focus (11 instances of MessageID remain in other vioscsi.c routines)
* Mnemonic rename of:
  * CompletePendingRequests() to CompletePendingRequestsOnReset()
  * index to vq_req_idx
* Syntactic correction of if statement (added braces)

DUPLICATE of PR #1175 to produce clean rebased version.

Signed-off-by: benyamin-codez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants