[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] Call EvtchnUnmask within the BlkifRing lock


  • To: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>, "win-pv-devel@xxxxxxxxxxxxxxxxxxxx" <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Owen Smith <owen.smith@xxxxxxxxxx>
  • Date: Fri, 27 Mar 2026 15:26:11 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=bOOcPzZ5T/zjQNQsVawxWY1VvPc0ioW92i2L4g+IbVc=; b=YTn8K98R/GfS2nL1rDLYGeYJDvxBv6RGlJ3SZbiRyGGkCftZatfCgWFjH7ktyxnFYJWtyD43QDRRKbEmJnq6Wt5e9Im5yeY1H238rBDh61gDRexU8dPluPBK2yRww0ZAm953thakSZXuJuGuOjdSoKSkn1CF5ndcIM71PBANsuyqAWAxk+KjbKleXoIlI6nNj8s4C/0Ta9PjqtwC3dtQJa9Pc0eVBJKXFsMLBhQwWk3Y3PnembXwbXAmVGdgHg0ng9YejDBHLB9SV6w+DODQJbvs+zBUKYbDjeNK7Dc/a0XCHj97bohahqlX5OCsNrjKCobDqc4Ux17tdm06hf5P2Q==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=Mi2vpg3xmtL0UEKmgXD1NmemT3FvjY2VMwKeiB6OB9egnCZzVtuYx5XVfaUc95FslVIJk7ixfp1ZW9oNoTX3pGiypp6E27seVjhd5Z/KopAJCjfDjJOz+dbOEFYPOo6e9/ggblwZKWE9ZA7VXRQ3RHGLU5WSFUJHWKAZ8S8b9b0cMtV8Oyta9WxlGYqAEWX4KoTMXUPCytF1Cm/0iMxRS8CvIQ5Lv4aM7i+2/+tawTFvCXY0LKV7vl5Wgc7Hi/H64zeaUAlzGrDjU/RaOfrKT37aq/73v87OXIdjDxzg/EwX3ofpgH9lb1jVngEoJxWGIm1IAk8FbYLw+thoXhPq6Q==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=selector1 header.d=citrix.com header.i="@citrix.com" header.h="From:Date:Subject:Message-ID:Content-Type:MIME-Version:x-ms-exchange-senderadcheck"
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Delivery-date: Fri, 27 Mar 2026 15:26:20 +0000
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>
  • Msip_labels:
  • Thread-index: AQHcvczPQtA3BJsBtk6irODhNC6JUbXCZuLbgAAUcoCAAAFMNoAAAnWAgAABPoQ=
  • Thread-topic: [RFC PATCH] Call EvtchnUnmask within the BlkifRing lock

Yes, looking back at the code again,

        Retry = BlkifRingPoll(BlkifRing);
        if (!Retry && BlkifRing->Enabled) {
                (VOID) XENBUS_EVTCHN(Unmask,
                                     &Ring->EvtchnInterface,
                                     BlkifRing->Channel,
                                     FALSE,
                                     TRUE);
        }

would be needed


Owen

________________________________________
From: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
Sent: 27 March 2026 3:21 PM
To: Owen Smith; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [RFC PATCH] Call EvtchnUnmask within the BlkifRing lock

On 27/03/2026 16:14, Owen Smith wrote:
> It may not need changes to BlkifRingPoll
>
> Retry = BlkifRingPoll(BlkifRing);
> if (BlkifRing->Enabled)
>      (VOID) XENBUS_EVTCHN(Unmask,.....)
>

Oh, as in flipping the condition. It should still require `if (!Retry &&
BlkifRing->Enabled)` to avoid unmasking in the middle of a retry cycle
right?

>
> Owen
>
> ________________________________________
> From: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
> Sent: Friday, March 27, 2026 3:07 PM
> To: Owen Smith <owen.smith@xxxxxxxxxx>; win-pv-devel@xxxxxxxxxxxxxxxxxxxx 
> <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [RFC PATCH] Call EvtchnUnmask within the BlkifRing lock
>
>
> On 27/03/2026 15:01, Owen Smith wrote:
>>
>> BlkifRingPoll already checks that BlkifRing->Enabled is TRUE, but in this
>> case the return value would be FALSE, which would need another check
>> to protect against Unmask being called on a Channel that has been closed.
>>
>> Owen
>
> Do you mean something like this?
>
> diff --git a/src/xenvbd/ring.c b/src/xenvbd/ring.c
> index 50f8d58..ddf12fc 100644
> --- a/src/xenvbd/ring.c
> +++ b/src/xenvbd/ring.c
> @@ -1211,7 +1211,8 @@ __BlkifRingCompleteResponse(
>
>    static FORCEINLINE BOOLEAN
>    BlkifRingPoll(
> -    IN  PXENVBD_BLKIF_RING  BlkifRing
> +    IN  PXENVBD_BLKIF_RING  BlkifRing,
> +    IN  PBOOLEAN            Enabled
>        )
>    {
>    #define XENVBD_BATCH(_Ring) (RING_SIZE(&(_Ring)->Front) / 4)
> @@ -1222,8 +1223,12 @@ BlkifRingPoll(
>        Ring = BlkifRing->Ring;
>        Retry = FALSE;
>
> -    if (!BlkifRing->Enabled)
> +    if (BlkifRing->Enabled) {
> +        *Enabled = TRUE;
> +    } else {
> +        *Enabled = FALSE;
>            goto done;
> +    }
>
>        for (;;) {
>            RING_IDX            rsp_prod;
> @@ -1607,23 +1612,26 @@ BlkifRingDpc(
>
>        for (;;) {
>            BOOLEAN         Retry;
> +        BOOLEAN         Enabled;
>            KIRQL           Irql;
>
>            KeRaiseIrql(DISPATCH_LEVEL, &Irql);
>            __BlkifRingAcquireLock(BlkifRing);
> -        Retry = BlkifRingPoll(BlkifRing);
> +        Retry = BlkifRingPoll(BlkifRing, &Enabled);
> +
> +        if (Enabled)
> +            (VOID) XENBUS_EVTCHN(Unmask,
> +                                 &Ring->EvtchnInterface,
> +                                 BlkifRing->Channel,
> +                                 FALSE,
> +                                 TRUE);
> +
>            __BlkifRingReleaseLock(BlkifRing);
>            KeLowerIrql(Irql);
>
>            if (!Retry)
>                break;
>        }
> -
> -    XENBUS_EVTCHN(Unmask,
> -                  &Ring->EvtchnInterface,
> -                  BlkifRing->Channel,
> -                  FALSE,
> -                  TRUE);
>    }
>
>    #define TIME_US(_us)        ((_us) * 10)
> --
>
>>
>> ________________________________________
>> From: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
>> Sent: 27 March 2026 9:33 AM
>> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
>> Cc: Tu Dinh; Owen Smith
>> Subject: [RFC PATCH] Call EvtchnUnmask within the BlkifRing lock
>>
>> The call to EvtchnUnmask accesses Channel outside of the blkif ring
>> lock. Therefore, it can access a stale channel if the DPC is still
>> running after the channel has been closed in BlkifRingDisconnect. Since
>> BlkifRingDisconnect runs at DISPATCH_LEVEL, we cannot use
>> KeFlushQueuedDpcs and have to guard against the event channel's closure
>> via the Enabled flag instead.
>>
>> Signed-off-by: Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx>
>> ---
>>
>> Notes:
>>        This issue is reliably reproducible by repeatedly suspending/resuming 
>> or
>>        migrating a VM, which at some point will cause it to lock up when 
>> trying
>>        to acquire a lock on the event channel. The failure will typically
>>        happen within 100 resume cycles. Note the corrupted event channel 
>> below:
>>
>>            nt!KeBugCheckEx:
>>            fffff803`e3f3ad20 48894c2408      mov     qword ptr [rsp+8],rcx 
>> ss:0018:fffff803`79b9eb50=0000000000000080
>>            0: kd> kP
>>             # Child-SP          RetAddr               Call Site
>>            00 fffff803`79b9eb48 fffff803`e3f832e2     nt!KeBugCheckEx
>>            01 fffff803`79b9eb50 fffff803`e3f7d8d9     nt!HalpNMIHalt+0x2e
>>            02 fffff803`79b9eb90 fffff803`750a129b     
>> nt!HalBugCheckSystem+0x69
>>            03 fffff803`79b9ebd0 fffff803`e411f632     
>> PSHED!PshedBugCheckSystem+0xb
>>            04 fffff803`79b9ec00 fffff803`e3f830bb     
>> nt!WheaReportHwError+0x2b5752
>>            05 fffff803`79b9ecc0 fffff803`e3ff6daf     nt!HalHandleNMI+0x14b
>>            06 fffff803`79b9ecf0 fffff803`e40eeac2     nt!KiProcessNMI+0xff
>>            07 fffff803`79b9ed30 fffff803`e40ee82e     nt!KxNmiInterrupt+0x82
>>            08 fffff803`79b9ee70 fffff803`e3c94e6c     nt!KiNmiInterrupt+0x26e
>>            09 ffff9600`a38a7640 fffff803`e3c94d5e     
>> nt!KxWaitForSpinLockAndAcquire+0x1c
>>            0a ffff9600`a38a7670 fffff803`e4064a62     
>> nt!KeAcquireSpinLockRaiseToDpc+0x5e
>>            0b ffff9600`a38a76a0 fffff803`e45df368     
>> nt!DifKeAcquireSpinLockRaiseToDpcWrapper+0xd2
>>            0c ffff9600`a38a76f0 fffff803`761f59ea     
>> nt!VerifierKeAcquireSpinLockRaiseToDpc+0x28
>>            0d ffff9600`a38a7720 fffff803`78a1dcd0     xenbus!EvtchnUnmask(
>>                        struct _INTERFACE * Interface = <Value unavailable 
>> error>,
>>                        struct _XENBUS_EVTCHN_CHANNEL * Channel = 
>> 0xffffbe01`a45fcf80,
>>                        unsigned char InUpcall = 0x00 '',
>>                        unsigned char Force = 0x01 '')+0x3a 
>> [xenbus\src\xenbus\evtchn.c @ 824]
>>            0e ffff9600`a38a7760 fffff803`e3c95b42     xenvbd!BlkifRingDpc(
>>                        struct _KDPC * Dpc = 0xfffff803`74b29a00,
>>                        void * Context = 0xffffbe01`a4116e50,
>>                        void * Argument1 = 0x00000000`00000000,
>>                        void * Argument2 = 0x00000000`00000000)+0x3c0 
>> [xenvbd\src\xenvbd\ring.c @ 1627]
>>            0f ffff9600`a38a77d0 fffff803`e4006592     
>> nt!KiExecuteAllDpcs+0x692
>>            10 ffff9600`a38a7a10 fffff803`e3ec7dca     nt!KiExecuteDpc+0xc2
>>            11 ffff9600`a38a7bf0 fffff803`e40e3c74     
>> nt!PspSystemThreadStartup+0x5a
>>            12 ffff9600`a38a7c40 00000000`00000000     
>> nt!KiStartSystemThread+0x34
>>            0: kd> dx -id 0,0,ffffbe019f6f5040 -r1 
>> ((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80)
>>            ((xenbus!_XENBUS_EVTCHN_CHANNEL *)0xffffbe01a45fcf80)             
>>     : 0xffffbe01a45fcf80 [Type: _XENBUS_EVTCHN_CHANNEL *]
>>                [+0x000] Magic            : 0x28 [Type: unsigned long]
>>                [+0x008] Lock             : 0xffffffff [Type: unsigned 
>> __int64]
>>                [+0x010] ListEntry        [Type: _LIST_ENTRY]
>>                [+0x020] PendingListEntry [Type: _LIST_ENTRY]
>>                [+0x030] Pending          : -1542361520 [Type: long]
>>                [+0x038] Caller           : 0xffffbe01a4cb9010 [Type: void *]
>>                [+0x040] Callback         : 0xffffbe01a4cb9010 : 
>> 0xffffbe01a4cb9010 [Type: unsigned char (__cdecl*)(_KINTERRUPT *,void *)]
>>                [+0x048] Argument         : 0xffffbe01a4cb9010 [Type: void *]
>>                [+0x050] Active           : 0x65 [Type: unsigned char]
>>                [+0x054] Count            : 0x0 [Type: unsigned long]
>>                [+0x058] Type             : -1527206080 [Type: 
>> _XENBUS_EVTCHN_TYPE]
>>                [+0x05c] Parameters       [Type: _XENBUS_EVTCHN_PARAMETERS]
>>                [+0x064] Mask             : 0x1 [Type: unsigned char]
>>                [+0x068] LocalPort        : 0x1 [Type: unsigned long]
>>                [+0x06c] Cpu              : 0x0 [Type: unsigned long]
>>                [+0x070] Closed           : 0x2f [Type: unsigned char]
>>
>>        Xencons may benefit from the same change.
>>
>>        This patch will probably hurt performance in some way, but calling
>>        EvtchnUnmask within the lock is needed to ensure that the channel is
>>        alive.
>>
>>        Xencons tries to fix this problem by skipping the unmask call when the
>>        ring is not enabled, but Enabled could have gone stale by the time it
>>        has reached the unmask, and so KeFlushQueuedDpcs is still needed to
>>        chase out any running DPCs before closing the channel.
>>
>>        It might be possible to fix this by signaling the end of each DPC with
>>        an atomic, then synchronize with that in RingDisconnect, but I haven't
>>        investigated whether that'd help performance.
>>
>>     src/xenvbd/ring.c | 22 +++++++++++++++-------
>>     1 file changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/xenvbd/ring.c b/src/xenvbd/ring.c
>> index 50f8d58..409d6ab 100644
>> --- a/src/xenvbd/ring.c
>> +++ b/src/xenvbd/ring.c
>> @@ -1611,19 +1611,27 @@ BlkifRingDpc(
>>
>>             KeRaiseIrql(DISPATCH_LEVEL, &Irql);
>>             __BlkifRingAcquireLock(BlkifRing);
>> -        Retry = BlkifRingPoll(BlkifRing);
>> +
>> +        if (BlkifRing->Enabled) {
>> +            Retry = BlkifRingPoll(BlkifRing);
>> +
>> +            if (!Retry) {
>> +                (VOID) XENBUS_EVTCHN(Unmask,
>> +                                     &Ring->EvtchnInterface,
>> +                                     BlkifRing->Channel,
>> +                                     FALSE,
>> +                                     TRUE);
>> +            }
>> +        } else {
>> +            Retry = FALSE;
>> +        }
>> +
>>             __BlkifRingReleaseLock(BlkifRing);
>>             KeLowerIrql(Irql);
>>
>>             if (!Retry)
>>                 break;
>>         }
>> -
>> -    XENBUS_EVTCHN(Unmask,
>> -                  &Ring->EvtchnInterface,
>> -                  BlkifRing->Channel,
>> -                  FALSE,
>> -                  TRUE);
>>     }
>>
>>     #define TIME_US(_us)        ((_us) * 10)
>> --
>> 2.53.0.windows.2
>>
>>
>>
>> --
>> Ngoc Tu Dinh | Vates XCP-ng Developer
>>
>> XCP-ng & Xen Orchestra - Vates solutions
>>
>> web: https://vates.tech
>>
>
>
>
> --
> Ngoc Tu Dinh | Vates XCP-ng Developer
>
> XCP-ng & Xen Orchestra - Vates solutions
>
> web: https://vates.tech
>
>
>



--
Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.