[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/4] xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()

Hello Julien,

This patch is very interesting to me.

On 23.10.18 21:17, Julien Grall wrote:
> Devices that expose their interrupt status registers via system
> registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
> vgic (although unused by Linux), ...)
I guess under vgic you mean GICH registers, is it correct?
I would speculate that GIC v2 registers are memory mapped, not exposed 
via system registers.

>   rely on a context synchronising
> operation on the CPU to ensure that the updated status register is
> visible to the CPU when handling the interrupt. This usually happens as
> a result of taking the IRQ exception in the first place, but there are
> two race scenarios where this isn't the case.
> For example, let's say we have two peripherals (X and Y), where Y uses a
> system register for its interrupt status.
> Case 1:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. Y then raises its interrupt line, but the update to its system
>     register is not yet visible to the CPU
> 3. The GIC decides to expose Y's interrupt number first in the Ack
>     register
> 4. The CPU runs the IRQ handler for Y, but the status register is stale
But this scenario somehow explains a strange thing I saw during IRQ 
latency investigation (optimization attempts) during last weeks.
The strange sequence is as following:
     1. CPU takes an IRQ exception from the guest context
     2. In `enter_hypervisor_head()` function (i.e. in `gic_clear_lrs()` 
as for mainline) from some LR registers interrupt statuses are read as 
     3. Performing further code, without returning to the guest (i.e. 
inside vgic_vcpu_inject_irq()), it happens that we read status INVALID 
from the LR we read PENDING before, in step 2.

Please note that we are tailoring xen based on RELEASE-4.10.0.

> Case 2:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. CPU reads the interrupt number for X from the Ack register and runs
>     its IRQ handler
> 3. Y raises its interrupt line and the Ack register is updated, but
>     again, the update to its system register is not yet visible to the
>     CPU.
> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt
>     number and run its handler without a context synchronisation
>     operation, therefore seeing the stale register value.
> In either case, we run the risk of missing an IRQ. This patch solves the
> problem by ensuring that we execute an ISB in the GIC drivers prior
> to invoking the interrupt handler.
> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
> Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
> ---
>      This patch is a candidate for backporting up to Xen 4.9.
> ---
>   xen/arch/arm/gic.c | 2 ++
>   1 file changed, 2 insertions(+)
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 8d7e491060..305fbd66dd 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int 
> is_fiq)
>           if ( likely(irq >= 16 && irq < 1020) )
>           {
>               local_irq_enable();
> +            isb();
Taking in account that the first GICH accesses are from 
`gic_clear_lrs()`, called from `enter_hypervisor_head`, I would suggest 
moving isb() there.

>               do_IRQ(regs, irq, is_fiq);
>               local_irq_disable();
>           }
>           else if ( is_lpi(irq) )
>           {
>               local_irq_enable();
> +            isb();
>               gic_hw_ops->do_LPI(irq);
>               local_irq_disable();
>           }


*Andrii Anisov*

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.