|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC PATCH] Call EvtchnUnmask within the BlkifRing lock
On 27/03/2026 15:01, Owen Smith wrote:
>
> BlkifRingPoll already checks that BlkifRing->Enabled is TRUE, but in this
> case the return value would be FALSE, which would need another check
> to protect against Unmask being called on a Channel that has been closed.
>
> Owen
Do you mean something like this?
diff --git a/src/xenvbd/ring.c b/src/xenvbd/ring.c
index 50f8d58..ddf12fc 100644
--- a/src/xenvbd/ring.c
+++ b/src/xenvbd/ring.c
@@ -1211,7 +1211,8 @@ __BlkifRingCompleteResponse(
static FORCEINLINE BOOLEAN
BlkifRingPoll(
- IN PXENVBD_BLKIF_RING BlkifRing
+ IN PXENVBD_BLKIF_RING BlkifRing,
+ IN PBOOLEAN Enabled
)
{
#define XENVBD_BATCH(_Ring) (RING_SIZE(&(_Ring)->Front) / 4)
@@ -1222,8 +1223,12 @@ BlkifRingPoll(
Ring = BlkifRing->Ring;
Retry = FALSE;
- if (!BlkifRing->Enabled)
+ if (BlkifRing->Enabled) {
+ *Enabled = TRUE;
+ } else {
+ *Enabled = FALSE;
goto done;
+ }
for (;;) {
RING_IDX rsp_prod;
@@ -1607,23 +1612,26 @@ BlkifRingDpc(
for (;;) {
BOOLEAN Retry;
+ BOOLEAN Enabled;
KIRQL Irql;
KeRaiseIrql(DISPATCH_LEVEL, &Irql);
__BlkifRingAcquireLock(BlkifRing);
- Retry = BlkifRingPoll(BlkifRing);
+ Retry = BlkifRingPoll(BlkifRing, &Enabled);
+
+ if (Enabled)
+ (VOID) XENBUS_EVTCHN(Unmask,
+ &Ring->EvtchnInterface,
+ BlkifRing->Channel,
+ FALSE,
+ TRUE);
+
__BlkifRingReleaseLock(BlkifRing);
KeLowerIrql(Irql);
if (!Retry)
break;
}
-
- XENBUS_EVTCHN(Unmask,
- &Ring->EvtchnInterface,
- BlkifRing->Channel,
- FALSE,
- TRUE);
}
#define TIME_US(_us) ((_us) * 10)
--
>
> ________________________________________
> From: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
> Sent: 27 March 2026 9:33 AM
> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Tu Dinh; Owen Smith
> Subject: [RFC PATCH] Call EvtchnUnmask within the BlkifRing lock
>
> The call to EvtchnUnmask accesses Channel outside of the blkif ring
> lock. Therefore, it can access a stale channel if the DPC is still
> running after the channel has been closed in BlkifRingDisconnect. Since
> BlkifRingDisconnect runs at DISPATCH_LEVEL, we cannot use
> KeFlushQueuedDpcs and have to guard against the event channel's closure
> via the Enabled flag instead.
>
> Signed-off-by: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
> ---
>
> Notes:
> This issue is reliably reproducible by repeatedly suspending/resuming or
> migrating a VM, which at some point will cause it to lock up when trying
> to acquire a lock on the event channel. The failure will typically
> happen within 100 resume cycles. Note the corrupted event channel below:
>
> nt!KeBugCheckEx:
> fffff803`e3f3ad20 48894c2408 mov qword ptr [rsp+8],rcx
> ss:0018:fffff803`79b9eb50=0000000000000080
> 0: kd> kP
> # Child-SP RetAddr Call Site
> 00 fffff803`79b9eb48 fffff803`e3f832e2 nt!KeBugCheckEx
> 01 fffff803`79b9eb50 fffff803`e3f7d8d9 nt!HalpNMIHalt+0x2e
> 02 fffff803`79b9eb90 fffff803`750a129b nt!HalBugCheckSystem+0x69
> 03 fffff803`79b9ebd0 fffff803`e411f632
> PSHED!PshedBugCheckSystem+0xb
> 04 fffff803`79b9ec00 fffff803`e3f830bb
> nt!WheaReportHwError+0x2b5752
> 05 fffff803`79b9ecc0 fffff803`e3ff6daf nt!HalHandleNMI+0x14b
> 06 fffff803`79b9ecf0 fffff803`e40eeac2 nt!KiProcessNMI+0xff
> 07 fffff803`79b9ed30 fffff803`e40ee82e nt!KxNmiInterrupt+0x82
> 08 fffff803`79b9ee70 fffff803`e3c94e6c nt!KiNmiInterrupt+0x26e
> 09 ffff9600`a38a7640 fffff803`e3c94d5e
> nt!KxWaitForSpinLockAndAcquire+0x1c
> 0a ffff9600`a38a7670 fffff803`e4064a62
> nt!KeAcquireSpinLockRaiseToDpc+0x5e
> 0b ffff9600`a38a76a0 fffff803`e45df368
> nt!DifKeAcquireSpinLockRaiseToDpcWrapper+0xd2
> 0c ffff9600`a38a76f0 fffff803`761f59ea
> nt!VerifierKeAcquireSpinLockRaiseToDpc+0x28
> 0d ffff9600`a38a7720 fffff803`78a1dcd0 xenbus!EvtchnUnmask(
> struct _INTERFACE * Interface = <Value unavailable
> error>,
> struct _XENBUS_EVTCHN_CHANNEL * Channel =
> 0xffffbe01`a45fcf80,
> unsigned char InUpcall = 0x00 '',
> unsigned char Force = 0x01 '')+0x3a
> [xenbus\src\xenbus\evtchn.c @ 824]
> 0e ffff9600`a38a7760 fffff803`e3c95b42 xenvbd!BlkifRingDpc(
> struct _KDPC * Dpc = 0xfffff803`74b29a00,
> void * Context = 0xffffbe01`a4116e50,
> void * Argument1 = 0x00000000`00000000,
> void * Argument2 = 0x00000000`00000000)+0x3c0
> [xenvbd\src\xenvbd\ring.c @ 1627]
> 0f ffff9600`a38a77d0 fffff803`e4006592 nt!KiExecuteAllDpcs+0x692
> 10 ffff9600`a38a7a10 fffff803`e3ec7dca nt!KiExecuteDpc+0xc2
> 11 ffff9600`a38a7bf0 fffff803`e40e3c74
> nt!PspSystemThreadStartup+0x5a
> 12 ffff9600`a38a7c40 00000000`00000000
> nt!KiStartSystemThread+0x34
> 0: kd> dx -id 0,0,ffffbe019f6f5040 -r1
> ((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80)
> ((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80)
> : 0xffffbe01a45fcf80 [Type: _XENBUS_EVTCHN_CHANNEL *]
> [+0x000] Magic : 0x28 [Type: unsigned long]
> [+0x008] Lock : 0xffffffff [Type: unsigned __int64]
> [+0x010] ListEntry [Type: _LIST_ENTRY]
> [+0x020] PendingListEntry [Type: _LIST_ENTRY]
> [+0x030] Pending : -1542361520 [Type: long]
> [+0x038] Caller : 0xffffbe01a4cb9010 [Type: void *]
> [+0x040] Callback : 0xffffbe01a4cb9010 :
> 0xffffbe01a4cb9010 [Type: unsigned char (__cdecl*)(_KINTERRUPT *,void *)]
> [+0x048] Argument : 0xffffbe01a4cb9010 [Type: void *]
> [+0x050] Active : 0x65 [Type: unsigned char]
> [+0x054] Count : 0x0 [Type: unsigned long]
> [+0x058] Type : -1527206080 [Type:
> _XENBUS_EVTCHN_TYPE]
> [+0x05c] Parameters [Type: _XENBUS_EVTCHN_PARAMETERS]
> [+0x064] Mask : 0x1 [Type: unsigned char]
> [+0x068] LocalPort : 0x1 [Type: unsigned long]
> [+0x06c] Cpu : 0x0 [Type: unsigned long]
> [+0x070] Closed : 0x2f [Type: unsigned char]
>
> Xencons may benefit from the same change.
>
> This patch will probably hurt performance in some way, but calling
> EvtchnUnmask within the lock is needed to ensure that the channel is
> alive.
>
> Xencons tries to fix this problem by skipping the unmask call when the
> ring is not enabled, but Enabled could have gone stale by the time it
> has reached the unmask, and so KeFlushQueuedDpcs is still needed to
> chase out any running DPCs before closing the channel.
>
> It might be possible to fix this by signaling the end of each DPC with
> an atomic, then synchronize with that in RingDisconnect, but I haven't
> investigated whether that'd help performance.
>
> src/xenvbd/ring.c | 22 +++++++++++++++-------
> 1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/src/xenvbd/ring.c b/src/xenvbd/ring.c
> index 50f8d58..409d6ab 100644
> --- a/src/xenvbd/ring.c
> +++ b/src/xenvbd/ring.c
> @@ -1611,19 +1611,27 @@ BlkifRingDpc(
>
> KeRaiseIrql(DISPATCH_LEVEL, &Irql);
> __BlkifRingAcquireLock(BlkifRing);
> - Retry = BlkifRingPoll(BlkifRing);
> +
> + if (BlkifRing->Enabled) {
> + Retry = BlkifRingPoll(BlkifRing);
> +
> + if (!Retry) {
> + (VOID) XENBUS_EVTCHN(Unmask,
> + &Ring->EvtchnInterface,
> + BlkifRing->Channel,
> + FALSE,
> + TRUE);
> + }
> + } else {
> + Retry = FALSE;
> + }
> +
> __BlkifRingReleaseLock(BlkifRing);
> KeLowerIrql(Irql);
>
> if (!Retry)
> break;
> }
> -
> - XENBUS_EVTCHN(Unmask,
> - &Ring->EvtchnInterface,
> - BlkifRing->Channel,
> - FALSE,
> - TRUE);
> }
>
> #define TIME_US(_us) ((_us) * 10)
> --
> 2.53.0.windows.2
>
>
>
> --
> Ngoc Tu Dinh | Vates XCP-ng Developer
>
> XCP-ng & Xen Orchestra - Vates solutions
>
> web: https://vates.tech
>
--
Ngoc Tu Dinh | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |