[RFC PATCH] Call EvtchnUnmask within the BlkifRing lock
The call to EvtchnUnmask accesses Channel outside the blkif ring lock,
so it can touch a stale channel if the DPC is still running after the
channel has been closed in BlkifRingDisconnect. Because
BlkifRingDisconnect runs at DISPATCH_LEVEL, we cannot use
KeFlushQueuedDpcs and must instead guard against the event channel's
closure via the Enabled flag.
Signed-off-by: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
---
Notes:
This issue is reliably reproducible by repeatedly suspending/resuming or
migrating a VM, which at some point will cause it to lock up when trying
to acquire a lock on the event channel. The failure will typically
happen within 100 resume cycles. Note the corrupted event channel below:
nt!KeBugCheckEx:
fffff803`e3f3ad20 48894c2408 mov qword ptr [rsp+8],rcx ss:0018:fffff803`79b9eb50=0000000000000080
0: kd> kP
 # Child-SP          RetAddr           Call Site
00 fffff803`79b9eb48 fffff803`e3f832e2 nt!KeBugCheckEx
01 fffff803`79b9eb50 fffff803`e3f7d8d9 nt!HalpNMIHalt+0x2e
02 fffff803`79b9eb90 fffff803`750a129b nt!HalBugCheckSystem+0x69
03 fffff803`79b9ebd0 fffff803`e411f632 PSHED!PshedBugCheckSystem+0xb
04 fffff803`79b9ec00 fffff803`e3f830bb nt!WheaReportHwError+0x2b5752
05 fffff803`79b9ecc0 fffff803`e3ff6daf nt!HalHandleNMI+0x14b
06 fffff803`79b9ecf0 fffff803`e40eeac2 nt!KiProcessNMI+0xff
07 fffff803`79b9ed30 fffff803`e40ee82e nt!KxNmiInterrupt+0x82
08 fffff803`79b9ee70 fffff803`e3c94e6c nt!KiNmiInterrupt+0x26e
09 ffff9600`a38a7640 fffff803`e3c94d5e nt!KxWaitForSpinLockAndAcquire+0x1c
0a ffff9600`a38a7670 fffff803`e4064a62 nt!KeAcquireSpinLockRaiseToDpc+0x5e
0b ffff9600`a38a76a0 fffff803`e45df368 nt!DifKeAcquireSpinLockRaiseToDpcWrapper+0xd2
0c ffff9600`a38a76f0 fffff803`761f59ea nt!VerifierKeAcquireSpinLockRaiseToDpc+0x28
0d ffff9600`a38a7720 fffff803`78a1dcd0 xenbus!EvtchnUnmask(
        struct _INTERFACE * Interface = <Value unavailable error>,
        struct _XENBUS_EVTCHN_CHANNEL * Channel = 0xffffbe01`a45fcf80,
        unsigned char InUpcall = 0x00 '',
        unsigned char Force = 0x01 '')+0x3a [xenbus\src\xenbus\evtchn.c @ 824]
0e ffff9600`a38a7760 fffff803`e3c95b42 xenvbd!BlkifRingDpc(
        struct _KDPC * Dpc = 0xfffff803`74b29a00,
        void * Context = 0xffffbe01`a4116e50,
        void * Argument1 = 0x00000000`00000000,
        void * Argument2 = 0x00000000`00000000)+0x3c0 [xenvbd\src\xenvbd\ring.c @ 1627]
0f ffff9600`a38a77d0 fffff803`e4006592 nt!KiExecuteAllDpcs+0x692
10 ffff9600`a38a7a10 fffff803`e3ec7dca nt!KiExecuteDpc+0xc2
11 ffff9600`a38a7bf0 fffff803`e40e3c74 nt!PspSystemThreadStartup+0x5a
12 ffff9600`a38a7c40 00000000`00000000 nt!KiStartSystemThread+0x34
0: kd> dx -id 0,0,ffffbe019f6f5040 -r1 ((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80)
((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80) : 0xffffbe01a45fcf80 [Type: _XENBUS_EVTCHN_CHANNEL *]
[+0x000] Magic : 0x28 [Type: unsigned long]
[+0x008] Lock : 0xffffffff [Type: unsigned __int64]
[+0x010] ListEntry [Type: _LIST_ENTRY]
[+0x020] PendingListEntry [Type: _LIST_ENTRY]
[+0x030] Pending : -1542361520 [Type: long]
[+0x038] Caller : 0xffffbe01a4cb9010 [Type: void *]
    [+0x040] Callback         : 0xffffbe01a4cb9010 : 0xffffbe01a4cb9010 [Type: unsigned char (__cdecl*)(_KINTERRUPT *,void *)]
[+0x048] Argument : 0xffffbe01a4cb9010 [Type: void *]
[+0x050] Active : 0x65 [Type: unsigned char]
[+0x054] Count : 0x0 [Type: unsigned long]
[+0x058] Type : -1527206080 [Type: _XENBUS_EVTCHN_TYPE]
[+0x05c] Parameters [Type: _XENBUS_EVTCHN_PARAMETERS]
[+0x064] Mask : 0x1 [Type: unsigned char]
[+0x068] LocalPort : 0x1 [Type: unsigned long]
[+0x06c] Cpu : 0x0 [Type: unsigned long]
[+0x070] Closed : 0x2f [Type: unsigned char]
Xencons may benefit from the same change.
This patch may hurt performance by lengthening the time the ring lock
is held, but calling EvtchnUnmask within the lock is needed to ensure
that the channel is still alive when it is touched.
Xencons tries to avoid this problem by skipping the unmask call when
the ring is not enabled, but Enabled may already be stale by the time
the unmask is reached, so KeFlushQueuedDpcs is still needed to chase
out any running DPCs before closing the channel.
It might be possible to fix this instead by signaling the end of each
DPC with an atomic and synchronizing with it in RingDisconnect, but I
haven't investigated how that would affect performance.
src/xenvbd/ring.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/src/xenvbd/ring.c b/src/xenvbd/ring.c
index 50f8d58..409d6ab 100644
--- a/src/xenvbd/ring.c
+++ b/src/xenvbd/ring.c
@@ -1611,19 +1611,27 @@ BlkifRingDpc(
         KeRaiseIrql(DISPATCH_LEVEL, &Irql);
         __BlkifRingAcquireLock(BlkifRing);
-        Retry = BlkifRingPoll(BlkifRing);
+
+        if (BlkifRing->Enabled) {
+            Retry = BlkifRingPoll(BlkifRing);
+
+            if (!Retry) {
+                (VOID) XENBUS_EVTCHN(Unmask,
+                                     &Ring->EvtchnInterface,
+                                     BlkifRing->Channel,
+                                     FALSE,
+                                     TRUE);
+            }
+        } else {
+            Retry = FALSE;
+        }
+
         __BlkifRingReleaseLock(BlkifRing);
         KeLowerIrql(Irql);

         if (!Retry)
             break;
     }
-
-    XENBUS_EVTCHN(Unmask,
-                  &Ring->EvtchnInterface,
-                  BlkifRing->Channel,
-                  FALSE,
-                  TRUE);
 }
#define TIME_US(_us) ((_us) * 10)
--
2.53.0.windows.2
--
Ngoc Tu Dinh | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech