
Re: [Xen-devel] [PATCH] xen/arm: fix rank/vgic locks inversion bug



On Mon, 19 Dec 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 17/12/2016 02:13, Stefano Stabellini wrote:
> > On Fri, 16 Dec 2016, Stefano Stabellini wrote:
> > > On Fri, 16 Dec 2016, Julien Grall wrote:
> > > > Hi Stefano,
> > > > 
> > > > On 16/12/16 00:08, Stefano Stabellini wrote:
> > > > > On Thu, 15 Dec 2016, Julien Grall wrote:
> > > > > > On 15/12/2016 01:04, Stefano Stabellini wrote:
> > > > > > > The locking order is: first rank lock, then vgic lock.
> > > > > > > The order is respected everywhere, except for
> > > > > > > gic_update_one_lr.
> > > > > > > 
> > > > > > > gic_update_one_lr is called with the vgic lock held, but
> > > > > > > it calls vgic_get_target_vcpu, which tries to obtain the
> > > > > > > rank lock. This can cause deadlocks.
> > > > > > > 
> > > > > > > We already have a version of vgic_get_target_vcpu that
> > > > > > > doesn't take the rank lock: __vgic_get_target_vcpu. Also,
> > > > > > > the only routines that modify the target vcpu are
> > > > > > > vgic_store_itargetsr and vgic_store_irouter. They call
> > > > > > > vgic_migrate_irq, which already takes the vgic lock.
> > > > > > > 
> > > > > > > Solve the lock inversion problem by not taking the rank
> > > > > > > lock in gic_update_one_lr (calling __vgic_get_target_vcpu
> > > > > > > instead of vgic_get_target_vcpu).
> > > > > > 
> > > > > > If I look at the callers of gic_update_one_lr, the function
> > > > > > gic_clear_lrs will not take the rank lock. So from my
> > > > > > understanding, nobody will take the rank lock here.
> > > > > > 
> > > > > > However, __vgic_get_target_vcpu has an ASSERT to check
> > > > > > whether the rank lock has been taken. So who is taking the
> > > > > > lock for gic_update_one_lr now?
> > > > > 
> > > > > I should have removed the ASSERT - nobody is taking the rank
> > > > > lock now on the gic_update_one_lr code path.
> > > > 
> > > > Please either keep this ASSERT or update it, but do not drop
> > > > it: we need to prevent anyone from calling this function
> > > > without any lock taken, at least in debug builds.
> > > > 
> > > > The current callers (see vgic_{disable,enable}_irqs) are
> > > > calling the function with the rank lock taken, so we need to
> > > > keep this behavior at least for them.
> > > > 
> > > > > > > We make this safe by placing modifications to rank->vcpu
> > > > > > > within regions protected by the vgic lock.
> > > > > > 
> > > > > > This looks unsafe to me. The vgic lock is per-vcpu, but
> > > > > > rank->vcpu could be written/read by any vCPU, so you will
> > > > > > never protect rank->vcpu with this lock. Did I miss
> > > > > > anything?
> > > > > 
> > > > > The vgic lock is per-vcpu, but it is always the vgic lock of
> > > > > the old vcpu (old in the sense of before the interrupt is
> > > > > migrated) that is taken.
> > > > > 
> > > > > On one hand, any vcpu can read/write ITARGETSR; those
> > > > > operations are protected by the rank lock. However,
> > > > > vgic_migrate_irq and writes to rank->vcpu are also protected
> > > > > by the vgic lock of the old vcpu (first the rank lock is
> > > > > taken, then the vgic lock of the old vcpu).
> > > > 
> > > > I don't think this is true. Any vCPU read/write of ITARGETSR
> > > > will be protected by the vgic lock of the vCPU in
> > > > rank->vcpu[offset].
> > > > 
> > > > For an inflight IRQ, the migration will happen when the guest
> > > > has EOIed the IRQ. Until that happens, the guest may have
> > > > written multiple times to ITARGETSR.
> > > > 
> > > > For instance, let's say the guest requests to migrate the IRQ
> > > > from vCPU A to vCPU B. Vgic lock A will be taken in
> > > > vgic_store_itargetsr; if the interrupt is inflight,
> > > > vgic_migrate_irq will set a flag. Now the guest could decide to
> > > > migrate the interrupt to vCPU C before the interrupt has been
> > > > EOIed. In this case, vgic lock B will be taken.
> > > > 
> > > > So we have a mismatch between the lock taken in
> > > > gic_update_one_lr and vgic_migrate_irq.
> > > > 
> > > > I think the only safe way to fully protect rank->vcpu is taking
> > > > the rank lock. It will never change and is accessible from
> > > > anywhere.
> > > 
> > > When I submitted this series, I considered this scenario, and I
> > > thought that the existing GIC_IRQ_GUEST_MIGRATING flag should be
> > > enough to protect it, because when GIC_IRQ_GUEST_MIGRATING is
> > > set, vgic_migrate_irq actually returns immediately, without doing
> > > anything. Either GIC_IRQ_GUEST_MIGRATING is not set, and it's the
> > > first irq migration, or GIC_IRQ_GUEST_MIGRATING is set, and
> > > vgic_migrate_irq just returns.
> > > 
> > > However, as I was writing this explanation to you in more detail,
> > > I realized that it is not actually safe. Concurrent accesses to
> > > rank->vcpu, and concurrent calls to irq_set_affinity, can result
> > > in the physical affinity actually being set to the intermediate
> > > pcpu rather than the final one. As a matter of fact, one of these
> > > races affects the code today, even without this patch series.
> 
> Because irq_set_affinity is not called protected by the same lock
> (e.g. rank or vgic), right?

Yes. This is an example of a failure that affects the code as it is today:

  CPU0                                  CPU1
  <GIC_IRQ_GUEST_MIGRATING is set>      <GIC_IRQ_GUEST_MIGRATING is set>
  ----------------------------------------------------------------------
                                        vgic_migrate_irq (does nothing)
  clear GIC_IRQ_GUEST_MIGRATING
  read rank->vcpu (intermediate)
                                        set rank->vcpu (final)
  irq_set_affinity (intermediate)


Result: the irq physical affinity is the one of the intermediate vcpu,
not the final vcpu.



> [...]
> 
> > > 2) We run gic_update_one_lr and vgic_store_itargetsr in parallel
> > > safely and locklessly. There might be a way to do it, but it is
> > > not easy and I haven't found it yet.
> 
> Correct me if I am wrong: there are no restrictions on writing to
> IROUTER/ITARGETSR while an IRQ is pending. So irq_set_affinity could
> be called once at the beginning of vgic_migrate_irq.
> 
> We may receive the interrupt on the wrong physical CPU at the
> beginning, but it would be the same when writing into
> IROUTER/ITARGETSR.
> 
> This would remove the need to get the rank lock in gic_update_one_lr.
> 
> Did I miss anything?

I am not sure if we can do that: the virtual interrupt might not be
EOI'd yet at that time. The guest EOI is used to deactivate the
corresponding physical interrupt. Migrating the physical IRQ at that
time could have unintended consequences. I am not sure what the spec
says about this, but even if it is correct on paper, I would prefer not
to put it to the test: I bet that not many interrupt controllers have
been heavily tested in this scenario. What do you think?

Thinking outside the box, another way to solve this problem is to
reject any interrupt affinity changes while an interrupt migration is
still in progress. In fact, that scenario is very uncommon: as far as I
know, operating systems deactivate interrupts before migrating them.

Otherwise, I think it is very reasonable to store the vcpu id (or the
pcpu id) in struct pending_irq: we already store the lr register number
there. The current code can tell which lr register an interrupt has
been written to, but it cannot tell which cpu that lr register belongs
to. It's a paradox. Once we know the vcpu id for any inflight irq, we
can make sure to take the right vcpu.vgic lock from vgic_migrate_irq.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
