
Re: [Xen-devel] [PATCH v3] xen/arm: fix rank/vgic lock inversion bug



On Wed, 8 Feb 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 02/02/17 22:56, Stefano Stabellini wrote:
> > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > On 01/02/17 23:23, Stefano Stabellini wrote:
> > > > On Wed, 1 Feb 2017, Julien Grall wrote:
> > > > > On 31/01/2017 23:49, Stefano Stabellini wrote:
> > > > > > On Fri, 27 Jan 2017, Julien Grall wrote:
> > > > > > > On 03/01/17 23:29, Stefano Stabellini wrote:
> > > > > For LPIs, there is no active state. So as soon as they are EOIed,
> > > > > they might come up again. Depending on how we will handle IRQ
> > > > > migration, your scenario may become true. I am not sure whether we
> > > > > should take LPIs into account right now.
> > > > > 
> > > > > To be honest, I don't much like the idea of kicking the other
> > > > > vCPU. But I don't have a better idea for clearing the LRs.
> > 
> > What if we skip the interrupt if it's an LPI, and kick the other vCPU
> > and wait if it's an SPI? Software should be more tolerant of lost
> > interrupts in the case of LPIs. We are also considering rate-limiting
> > them anyway, which implies the possibility of skipping some LPIs at
> > times.
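> > 
> > Roughly, something like this (untested sketch; LPI_BASE and v_old are
> > hypothetical placeholders, the rest follows the vgic code as I read it):
> > 
> >     /* On migration of an interrupt still inflight on the old vCPU: */
> >     if ( p->irq >= LPI_BASE )
> >         /* LPIs: tolerate the occasional lost interrupt, just skip it. */
> >         return;
> > 
> >     /* SPIs: kick the old vCPU so it clears its LRs, then wait. */
> >     vcpu_kick(v_old);
> >     while ( test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> >         cpu_relax();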
> 
> I will skip the answer here, as your suggestion to solve the lock
> inversion sounds better.

OK


> > > > Me neither, that's why I was proposing a different solution instead. We
> > > > still have the option to take the right lock in vgic_migrate_irq:
> > > > 
> > > > http://marc.info/?l=xen-devel&m=148237289620471
> > > > 
> > > > The code is more complex, but I think it's safe in all cases.
> > > 
> > > It is not only complex but also really confusing, as we would have a
> > > variable protected by two locks that do not need to be taken at the
> > > same time.
> > 
> > Yes, but there is a large in-code comment about it :-)
> > 
> > 
> > > I may have an idea to completely avoid the lock in
> > > vgic_get_target_vcpu. The lock is only there to read the target vCPU
> > > in the rank; the rest does not need a lock, right? So could we not
> > > read the target vCPU atomically instead?
> > 
> > Yes, I think that could solve the lock inversion bug:
> > 
> > - remove the spin_lock in vgic_get_target_vcpu and replace it with an
> >   atomic read
> > - replace rank->vcpu writes with atomic writes
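> > 
> > Untested sketch of what I mean, assuming we store one target vCPU id
> > per interrupt in the rank (say a uint8_t vcpu[32] array, 31 being the
> > mask for a 32-interrupt rank):
> > 
> >     struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
> >     {
> >         struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
> > 
> >         /* Atomic read of the target vCPU id, no rank lock taken. */
> >         unsigned int target = read_atomic(&rank->vcpu[virq & 31]);
> > 
> >         return v->domain->vcpu[target];
> >     }
> > 
> > with the writer side (the ITARGETSR/IROUTER emulation) switched to
> > write_atomic() on the same field:
> > 
> >     write_atomic(&rank->vcpu[virq & 31], new_target);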
> > 
> > However, it would not solve the other issue affecting the current code:
> > http://marc.info/?l=xen-devel&m=148218667104072, which is related to the
> > problem you mentioned about irq_set_affinity and list_del_init in
> > gic_update_one_lr not being separated by a barrier. Arguably, that bug
> > could be solved separately. It would be easier to solve with one of
> > these approaches:
> > 
> > 1) use the same (vgic) lock in gic_update_one_lr and vgic_migrate_irq
> > 2) remove the irq_set_affinity call from gic_update_one_lr
> > 
> > where 1) is the approach taken by v2 of this series and 2) is the
> > approach taken by this patch.
> 
> I think there is an easier solution. You want to make sure that any writes
> (particularly list_del_init) done before routing the IRQ are visible to the
> other processors. So a simple barrier on both sides should be enough here.

You are right
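
Something along these lines should be enough (untested sketch; names follow
my reading of gic_update_one_lr, the reader side being wherever we check
p->inflight before migrating, e.g. vgic_migrate_irq):

    /* Writer side, in gic_update_one_lr: */
    list_del_init(&p->inflight);
    /*
     * Make sure the removal from the inflight list is visible to
     * other processors before the IRQ is routed to (and can fire
     * on) the new pCPU.
     */
    smp_wmb();
    irq_set_affinity(p->desc, cpumask_of(v_target->processor));

    /* Reader side, before acting on p->inflight: */
    smp_rmb();
    if ( !list_empty(&p->inflight) )
        /* the IRQ is still inflight on the old vCPU ... */;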
