
Re: [Xen-devel] [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend



On Mon, 26 Mar 2018, Stefano Stabellini wrote:
> On Thu, 22 Mar 2018, Andre Przywara wrote:
> > Processing maintenance interrupts and accessing the list registers
> > are dependent on the host's GIC version.
> > Introduce vgic-v2.c to contain GICv2 specific functions.
> > Implement the GICv2 specific code for syncing the emulation state
> > into the VGIC registers.
> > This also adds the hook to let Xen setup the host GIC addresses.
> > 
> > This is based on Linux commit 140b086dd197, written by Marc Zyngier.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@xxxxxxxxxx>
> > ---
> > Changelog v3 ... v3a:
> > - take hardware IRQ lock in vgic_v2_fold_lr_state()
> > - fix last remaining u32 usage
> > - print message when using new VGIC
> > - add TODO about racy _IRQ_INPROGRESS setting
> > 
> > Changelog v2 ... v3:
> > - remove no longer needed asm/io.h header
> > - replace 0/1 with false/true for bool's
> > - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
> > - fix indentation and w/s issues
> > 
> > Changelog v1 ... v2:
> > - remove v2 specific underflow function (now generic)
> > - re-add Linux code to properly handle acked level IRQs
> > 
> >  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
> >  xen/arch/arm/vgic/vgic.c    |   6 +
> >  xen/arch/arm/vgic/vgic.h    |   9 ++
> >  3 files changed, 274 insertions(+)
> >  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
> > 
> > diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
> > new file mode 100644
> > index 0000000000..1773503cfb
> > --- /dev/null
> > +++ b/xen/arch/arm/vgic/vgic-v2.c
> > @@ -0,0 +1,259 @@
> > +/*
> > + * Copyright (C) 2015, 2016 ARM Ltd.
> > + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include <asm/new_vgic.h>
> > +#include <asm/bug.h>
> > +#include <asm/gic.h>
> > +#include <xen/sched.h>
> > +#include <xen/sizes.h>
> > +
> > +#include "vgic.h"
> > +
> > +static struct {
> > +    bool enabled;
> > +    paddr_t dbase;          /* Distributor interface address */
> > +    paddr_t cbase;          /* CPU interface address & size */
> > +    paddr_t csize;
> > +    paddr_t vbase;          /* Virtual CPU interface address */
> > +
> > +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
> > +    uint32_t aliased_offset;
> > +} gic_v2_hw_data;
> > +
> > +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
> > +                      paddr_t vbase, uint32_t aliased_offset)
> > +{
> > +    gic_v2_hw_data.enabled = true;
> > +    gic_v2_hw_data.dbase = dbase;
> > +    gic_v2_hw_data.cbase = cbase;
> > +    gic_v2_hw_data.csize = csize;
> > +    gic_v2_hw_data.vbase = vbase;
> > +    gic_v2_hw_data.aliased_offset = aliased_offset;
> > +
> > +    printk("Using the new VGIC implementation.\n");
> > +}
> > +
> > +/*
> > + * transfer the content of the LRs back into the corresponding ap_list:
> > + * - active bit is transferred as is
> > + * - pending bit is
> > + *   - transferred as is in case of edge sensitive IRQs
> > + *   - set to the line-level (resample time) for level sensitive IRQs
> > + */
> > +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
> > +{
> > +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
> > +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
> > +    unsigned long flags;
> > +    unsigned int lr;
> > +
> > +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
> > +        return;
> > +
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
> > +
> > +    for ( lr = 0; lr < used_lrs; lr++ )
> > +    {
> > +        struct gic_lr lr_val;
> > +        uint32_t intid;
> > +        struct vgic_irq *irq;
> > +        struct irq_desc *desc = NULL;
> > +        bool have_desc_lock = false;
> > +
> > +        gic_hw_ops->read_lr(lr, &lr_val);
> > +
> > +        /*
> > +         * TODO: Possible optimization to avoid reading LRs:
> > +         * Read the ELRSR to find out which of our LRs have been cleared
> > +         * by the guest. We just need to know the IRQ number for those, which
> > +         * we could save in an array when populating the LRs.
> > +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
> > +         * but requires some more code to save the IRQ number and to handle
> > +         * those finished IRQs according to the algorithm below.
> > +         * We need some numbers to justify this: chances are that we don't
> > +         * have many LRs in use most of the time, so we might not save much.
> > +         */
> > +        gic_hw_ops->clear_lr(lr);
> > +
> > +        intid = lr_val.virq;
> > +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
> > +
> > +        local_irq_save(flags);
> 
> Shouldn't we disable interrupts earlier, maybe at the beginning of the
> function? Is it not a problem if we take an interrupt a couple of lines
> above, while we are still doing the read_lr and clear_lr?
> 
> 
> > +        spin_lock(&irq->irq_lock);
> > +
> > +        /* The locking order forces us to drop and re-take the locks here. */
> > +        if ( irq->hw )
> > +        {
> > +            spin_unlock(&irq->irq_lock);
> > +
> > +            desc = irq_to_desc(irq->hwintid);
> > +            spin_lock(&desc->lock);
> > +            spin_lock(&irq->irq_lock);
> > +
> > +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
> > +            ASSERT(irq->hw && desc->irq == irq->hwintid);
> > +
> > +            have_desc_lock = true;
> > +        }
> 
> I agree with Julien that this looks very fragile. Instead, I think it
> would be best to always take the desc lock (if irq->hw) before the
> irq_lock earlier in this function. That way, we don't have to deal with
> this business of unlocking and relocking. Do you see any problems with
> it? We don't change irq->hw at run time, so it looks OK to me.
> 
> 
> > +        /*
> > +         * If a hardware mapped IRQ has been handled for good, we need to
> > +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
> > +         *
> > +         * TODO: This is probably racy, but is so already in the existing
> > +         * VGIC. A fix does not seem to be trivial.
> > +         */
> > +        if ( irq->hw && !lr_val.active && !lr_val.pending )
> > +            clear_bit(_IRQ_INPROGRESS, &desc->status);
> 
> I'll reply here to Julien's comment:
> 
> > I realize the current vGIC is doing exactly the same thing. But this is 
> > racy.
> > 
> > Imagine the interrupt is firing on another pCPU (I wasn't able to rule out 
> > this even when the interrupt is following the vCPU), that pCPU may set 
> > _IRQ_INPROGRESS before this
> > is cleared here.
> 
> The assumption in the old vgic was that this scenario was not possible.
> vgic_migrate_irq would avoid changing physical interrupt affinity if a
> virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
> Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
> time of clearing the LR we would change the physical irq affinity (see
> xen/arch/arm/gic-vgic.c:L240).
> 
> I think we would need a similar mechanism here to protect ourselves from
> races. Is there something equivalent in the new vgic?

After reading the following patches, I am thinking that to fix the race
we need to make sure that irq->vcpu is set to NULL here, and we
need to set it to NULL with the desc->lock held.

Let's supposed that the irq is INPROGRESS on cpu0/vcpu0, and we get the
same interrupt on cpu1/vcpu1 because ITARGETSR has been changed.
vgic_queue_irq_unlock simply drops the interrupt because irq->vcpu !=
NULL (also keep in mind that at that point desc->lock is held). Maybe it
isn't nice but it shouldn't be racy.

When cpu0/vcpu0 clears INPROGRESS in vgic_v2_fold_lr_state, it is not a
problem because the interrupt was never injected in cpu1/vcpu1.

Vice versa, if cpu0/vcpu0 clears INPROGRESS and sets irq->vcpu = NULL in
vgic_v2_fold_lr_state before the new interrupt is delivered to
cpu1/vcpu1, then the interrupt will be injected to cpu1/vcpu1 and
INPROGRESS is set again correctly.

In other words, as long as irq->vcpu != NULL and INPROGRESS are kept in
sync, the race is avoided. With this patch, the race exists because
INPROGRESS could be set on cpu1/vcpu1 while cpu0/vcpu0 clears
INPROGRESS and sets irq->vcpu to NULL.

Does it make sense? Do you agree with this analysis?

The fix, although it looks pretty small, could be sent separately after
the code freeze.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

