
Re: [Xen-devel] [PATCH v3] xen/arm: fix rank/vgic lock inversion bug



On Wed, 8 Feb 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 02/02/17 22:56, Stefano Stabellini wrote:
> > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > On 01/02/17 23:23, Stefano Stabellini wrote:
> > > > On Wed, 1 Feb 2017, Julien Grall wrote:
> > > > > On 31/01/2017 23:49, Stefano Stabellini wrote:
> > > > > > On Fri, 27 Jan 2017, Julien Grall wrote:
> > > > > > > On 03/01/17 23:29, Stefano Stabellini wrote:
> > > > > For LPIs, there is no active state. So as soon as they are EOIed,
> > > > > they might come up again. Depending on how we will handle IRQ
> > > > > migration, your scenario may become true. I am not sure whether we
> > > > > should take LPIs into account right now.
> > > > > 
> > > > > To be honest, I don't much like the idea of kicking the other
> > > > > vCPU. But I don't have a better idea for clearing the LRs.
> > 
> > What if we skip the interrupt if it's an LPI, and kick the other vCPU
> > and wait if it's an SPI? Software should be more tolerant of lost
> > interrupts in the case of LPIs. We are also considering rate-limiting
> > them anyway, which implies the possibility of skipping some LPIs at
> > times.
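> > 
> > Roughly, something like this (untested sketch; LPI_BASE and v_old are
> > hypothetical placeholders, the rest follows the vgic code as I read it):
> > 
> >     /* On migration of an interrupt still inflight on the old vCPU: */
> >     if ( p->irq >= LPI_BASE )
> >         /* LPIs: tolerate the occasional lost interrupt, just skip it. */
> >         return;
> > 
> >     /* SPIs: kick the old vCPU so it clears its LRs, then wait. */
> >     vcpu_kick(v_old);
> >     while ( test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> >         cpu_relax();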
> 
> I will skip the answer here, as your suggestion to solve the lock
> inversion sounds better.

OK


> > > > Me neither, that's why I was proposing a different solution instead. We
> > > > still have the option to take the right lock in vgic_migrate_irq:
> > > > 
> > > > http://marc.info/?l=xen-devel&m=148237289620471
> > > > 
> > > > The code is more complex, but I think it's safe in all cases.
> > > 
> > > It is not only complex but also really confusing, as we would have a
> > > variable protected by two locks that do not need to be taken at the
> > > same time.
> > 
> > Yes, but there is a large in-code comment about it :-)
> > 
> > 
> > > I may have an idea to completely avoid the lock in
> > > vgic_get_target_vcpu. The lock is only there to read the target vCPU
> > > in the rank; the rest does not need a lock, right? So could we not
> > > read the target vCPU atomically instead?
> > 
> > Yes, I think that could solve the lock inversion bug:
> > 
> > - remove the spin_lock in vgic_get_target_vcpu and replace it with an
> >   atomic read
> > - replace rank->vcpu writes with atomic writes
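> > 
> > Untested sketch of what I mean, assuming we store one target vCPU id
> > per interrupt in the rank (say a uint8_t vcpu[32] array, 31 being the
> > mask for a 32-interrupt rank):
> > 
> >     struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
> >     {
> >         struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
> > 
> >         /* Atomic read of the target vCPU id, no rank lock taken. */
> >         unsigned int target = read_atomic(&rank->vcpu[virq & 31]);
> > 
> >         return v->domain->vcpu[target];
> >     }
> > 
> > with the writer side (the ITARGETSR/IROUTER emulation) switched to
> > write_atomic() on the same field:
> > 
> >     write_atomic(&rank->vcpu[virq & 31], new_target);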
> > 
> > However, it would not solve the other issue affecting the current code:
> > http://marc.info/?l=xen-devel&m=148218667104072, which is related to the
> > problem you mentioned about irq_set_affinity and list_del_init in
> > gic_update_one_lr not being separated by a barrier. Arguably, that bug
> > could be solved separately. It would be easier to solve with one of
> > these approaches:
> > 
> > 1) use the same (vgic) lock in gic_update_one_lr and vgic_migrate_irq
> > 2) remove the irq_set_affinity call from gic_update_one_lr
> > 
> > where 1) is the approach taken by v2 of this series and 2) is the
> > approach taken by this patch.
> 
> I think there is an easier solution. You want to make sure that any writes
> (particularly list_del_init) done before routing the IRQ are visible to the
> other processors. So a simple barrier on both sides should be enough here.

You are right
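
Something along these lines should be enough (untested sketch; names follow
my reading of gic_update_one_lr, the reader side being wherever we check
p->inflight before migrating, e.g. vgic_migrate_irq):

    /* Writer side, in gic_update_one_lr: */
    list_del_init(&p->inflight);
    /*
     * Make sure the removal from the inflight list is visible to
     * other processors before the IRQ is routed to (and can fire
     * on) the new pCPU.
     */
    smp_wmb();
    irq_set_affinity(p->desc, cpumask_of(v_target->processor));

    /* Reader side, before acting on p->inflight: */
    smp_rmb();
    if ( !list_empty(&p->inflight) )
        /* the IRQ is still inflight on the old vCPU ... */;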
