Xen project Mailing List

Re: [Xen-devel] [PATCH] x86: fix ordering of operations in destroy_irq()

To: "xen-devel" <xen-devel@xxxxxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Wed, 29 May 2013 08:23:33 +0100

Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>

Delivery-date: Wed, 29 May 2013 07:24:03 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 29.05.13 at 08:58, "Jan Beulich" <JBeulich@xxxxxxxx> wrote: > The fix for XSA-36, switching the default of vector map management to > be per-device, exposed more readily a problem with the cleanup of these > vector maps: dynamic_irq_cleanup() clearing desc->arch.used_vectors > keeps the subsequently invoked clear_irq_vector() from clearing the > bits for both the in-use and a possibly still outstanding old vector. This resulted from an Andrew bringing the issue up on security@, and there are more problems here which partly he pointed out and partly we came to realize during the discussion there. One issue is that the vector map management is not really taking effect for devices using multiple interrupts (at present MSI-X only) because the initial vector assignment in create_irq() can't possibly take into account the vector map, as that can be (and is being, in map_domain_pirq()) set only after that function returns. Andrew suggests to immediately re-assign the vector in map_domain_pirq(), but that would be cumbersome as a "normal" re-assignment only takes effect the next time an IRQ actually occurs. I therefore think that we should rather suppress the initial assignment of a vector, deferring it until after the vector map got set. We figured that this is no security relevant as the respective assertion is a debug build only thing (and debug build only issues were decided to not be handled via issuing advisories some time ago), and the effect this guards against (improperly set up IRTEs on AMD IOMMU) is only affecting the guest itself (using - until the multi-vector MSI series goes in - solely vector based indexing): A second CPU getting the same vector allocated as is in use for another IRQ of the same device on another CPU, this would simply break the IRQs raised to that guest. > Once at it, also defer resetting of desc->handler until after the loop > around smp_mb() checking for IRQ_INPROGRESS to be clear, fixing a > (mostly theoretical) issue with the intercation with do_IRQ(): If we > don't defer the pointer reset, do_IRQ() could, for non-guest IRQs, call > ->ack() and ->end() with different ->handler pointers, potentially > leading to an IRQ remaining un-acked. The issue is mostly theoretical > because non-guest IRQs are subject to destroy_irq() only on (boot time) > error paths. While this change also reduces the chances of an IRQ getting raised along with destroy_irq() getting invoked and, due to do_IRQ() losing the race for desc->lock, calling ack_bad_irq() due to ->handler already having got set to &no_irq_type, it doesn't entirely close the window. While the issue is cosmetic only (the IRQ raised here has no meaning anymore as it ie being torn down, and hence the message issued from ack_bad_irq() is merely a false positive), I think that this could be fixed by transitioning desc->arch.used to IRQ_RESERVED (instead of IRQ_UNUSED) in __clear_irq_vector(), and deferring both the resetting of desc->handler and the setting of desc->arch.used to IRQ_UNUSED to an RCU callback. > As to the changed locking: Invoking clear_irq_vector() with desc->lock > held is okay because vector_lock already nests inside desc->lock (proven > by set_desc_affinity(), which takes vector_lock and gets called from > various desc->handler->ack implementations, getting invoked with > desc->lock held). The locking around vector management appears to be a wider issue: Changes to IRQ descriptors generally are to be protected by holding desc->lock, yet all of the vector management is only guarded by vector_lock. The patch here addresses this for destroy_irq()'s invocation of clear_irq_vector(), but this apparently would need taking care of on all other affected code paths. Andrew - please double check I didn't forget any of the aspects we discussed yesterday. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.