Xen project Mailing List

[RFC PATCH] Call EvtchnUnmask within the BlkifRing lock

From: "Tu Dinh" <ngoc-tu.dinh@xxxxxxxxxx>

Date: Fri, 27 Mar 2026 09:33:43 +0000

Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=mte1 header.d=mandrillapp.com header.i="@mandrillapp.com" header.h="From:Subject:To:Cc:Message-Id:Feedback-ID:Date:MIME-Version:Content-Type:Content-Transfer-Encoding"; dkim=pass header.s=mte1 header.d=vates.tech header.i="ngoc-tu.dinh@xxxxxxxxxx" header.h="From:Subject:To:Cc:Message-Id:Feedback-ID:Date:MIME-Version:Content-Type:Content-Transfer-Encoding"

Cc: "Tu Dinh" <ngoc-tu.dinh@xxxxxxxxxx>, "Owen Smith" <owen.smith@xxxxxxxxxx>

Delivery-date: Fri, 27 Mar 2026 09:33:49 +0000

Feedback-id: 30504962:30504962.20260327:md

List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>

The call to EvtchnUnmask accesses Channel outside of the blkif ring lock. Therefore, it can access a stale channel if the DPC is still running after the channel has been closed in BlkifRingDisconnect. Since BlkifRingDisconnect runs at DISPATCH_LEVEL, we cannot use KeFlushQueuedDpcs and have to guard against the event channel's closure via the Enabled flag instead. Signed-off-by: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx> --- Notes: This issue is reliably reproducible by repeatedly suspending/resuming or migrating a VM, which at some point will cause it to lock up when trying to acquire a lock on the event channel. The failure will typically happen within 100 resume cycles. Note the corrupted event channel below: nt!KeBugCheckEx: fffff803`e3f3ad20 48894c2408 mov qword ptr [rsp+8],rcx ss:0018:fffff803`79b9eb50=0000000000000080 0: kd> kP # Child-SP RetAddr Call Site 00 fffff803`79b9eb48 fffff803`e3f832e2 nt!KeBugCheckEx 01 fffff803`79b9eb50 fffff803`e3f7d8d9 nt!HalpNMIHalt+0x2e 02 fffff803`79b9eb90 fffff803`750a129b nt!HalBugCheckSystem+0x69 03 fffff803`79b9ebd0 fffff803`e411f632 PSHED!PshedBugCheckSystem+0xb 04 fffff803`79b9ec00 fffff803`e3f830bb nt!WheaReportHwError+0x2b5752 05 fffff803`79b9ecc0 fffff803`e3ff6daf nt!HalHandleNMI+0x14b 06 fffff803`79b9ecf0 fffff803`e40eeac2 nt!KiProcessNMI+0xff 07 fffff803`79b9ed30 fffff803`e40ee82e nt!KxNmiInterrupt+0x82 08 fffff803`79b9ee70 fffff803`e3c94e6c nt!KiNmiInterrupt+0x26e 09 ffff9600`a38a7640 fffff803`e3c94d5e nt!KxWaitForSpinLockAndAcquire+0x1c 0a ffff9600`a38a7670 fffff803`e4064a62 nt!KeAcquireSpinLockRaiseToDpc+0x5e 0b ffff9600`a38a76a0 fffff803`e45df368 nt!DifKeAcquireSpinLockRaiseToDpcWrapper+0xd2 0c ffff9600`a38a76f0 fffff803`761f59ea nt!VerifierKeAcquireSpinLockRaiseToDpc+0x28 0d ffff9600`a38a7720 fffff803`78a1dcd0 xenbus!EvtchnUnmask( struct _INTERFACE * Interface = <Value unavailable error>, struct _XENBUS_EVTCHN_CHANNEL * Channel = 0xffffbe01`a45fcf80, unsigned char InUpcall = 0x00 '', unsigned char Force = 0x01 '')+0x3a [xenbus\src\xenbus\evtchn.c @ 824] 0e ffff9600`a38a7760 fffff803`e3c95b42 xenvbd!BlkifRingDpc( struct _KDPC * Dpc = 0xfffff803`74b29a00, void * Context = 0xffffbe01`a4116e50, void * Argument1 = 0x00000000`00000000, void * Argument2 = 0x00000000`00000000)+0x3c0 [xenvbd\src\xenvbd\ring.c @ 1627] 0f ffff9600`a38a77d0 fffff803`e4006592 nt!KiExecuteAllDpcs+0x692 10 ffff9600`a38a7a10 fffff803`e3ec7dca nt!KiExecuteDpc+0xc2 11 ffff9600`a38a7bf0 fffff803`e40e3c74 nt!PspSystemThreadStartup+0x5a 12 ffff9600`a38a7c40 00000000`00000000 nt!KiStartSystemThread+0x34 0: kd> dx -id 0,0,ffffbe019f6f5040 -r1 ((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80) ((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80) : 0xffffbe01a45fcf80 [Type: _XENBUS_EVTCHN_CHANNEL *] [+0x000] Magic : 0x28 [Type: unsigned long] [+0x008] Lock : 0xffffffff [Type: unsigned __int64] [+0x010] ListEntry [Type: _LIST_ENTRY] [+0x020] PendingListEntry [Type: _LIST_ENTRY] [+0x030] Pending : -1542361520 [Type: long] [+0x038] Caller : 0xffffbe01a4cb9010 [Type: void *] [+0x040] Callback : 0xffffbe01a4cb9010 : 0xffffbe01a4cb9010 [Type: unsigned char (__cdecl*)(_KINTERRUPT *,void *)] [+0x048] Argument : 0xffffbe01a4cb9010 [Type: void *] [+0x050] Active : 0x65 [Type: unsigned char] [+0x054] Count : 0x0 [Type: unsigned long] [+0x058] Type : -1527206080 [Type: _XENBUS_EVTCHN_TYPE] [+0x05c] Parameters [Type: _XENBUS_EVTCHN_PARAMETERS] [+0x064] Mask : 0x1 [Type: unsigned char] [+0x068] LocalPort : 0x1 [Type: unsigned long] [+0x06c] Cpu : 0x0 [Type: unsigned long] [+0x070] Closed : 0x2f [Type: unsigned char] Xencons may benefit from the same change. This patch will probably hurt performance in some way, but calling EvtchnUnmask within the lock is needed to ensure that the channel is alive. Xencons tries to fix this problem by skipping the unmask call when the ring is not enabled, but Enabled could have gone stale by the time it has reached the unmask, and so KeFlushQueuedDpcs is still needed to chase out any running DPCs before closing the channel. It might be possible to fix this by signaling the end of each DPC with an atomic, then synchronize with that in RingDisconnect, but I haven't investigated whether that'd help performance. src/xenvbd/ring.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/src/xenvbd/ring.c b/src/xenvbd/ring.c index 50f8d58..409d6ab 100644 --- a/src/xenvbd/ring.c +++ b/src/xenvbd/ring.c @@ -1611,19 +1611,27 @@ BlkifRingDpc( KeRaiseIrql(DISPATCH_LEVEL, &Irql); __BlkifRingAcquireLock(BlkifRing); - Retry = BlkifRingPoll(BlkifRing); + + if (BlkifRing->Enabled) { + Retry = BlkifRingPoll(BlkifRing); + + if (!Retry) { + (VOID) XENBUS_EVTCHN(Unmask, + &Ring->EvtchnInterface, + BlkifRing->Channel, + FALSE, + TRUE); + } + } else { + Retry = FALSE; + } + __BlkifRingReleaseLock(BlkifRing); KeLowerIrql(Irql); if (!Retry) break; } - - XENBUS_EVTCHN(Unmask, - &Ring->EvtchnInterface, - BlkifRing->Channel, - FALSE, - TRUE); } #define TIME_US(_us) ((_us) * 10) -- 2.53.0.windows.2 -- Ngoc Tu Dinh | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.