Xen project Mailing List

Re: [Xen-devel] [PATCH] x86: fix ordering of operations in destroy_irq()

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Wed, 29 May 2013 23:17:40 +0100

Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, "Keir \(Xen.org\)" <keir@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Wed, 29 May 2013 22:17:52 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 29/05/2013 08:23, Jan Beulich wrote: >>>> On 29.05.13 at 08:58, "Jan Beulich" <JBeulich@xxxxxxxx> wrote: >> The fix for XSA-36, switching the default of vector map management to >> be per-device, exposed more readily a problem with the cleanup of these >> vector maps: dynamic_irq_cleanup() clearing desc->arch.used_vectors >> keeps the subsequently invoked clear_irq_vector() from clearing the >> bits for both the in-use and a possibly still outstanding old vector. > This resulted from an Andrew bringing the issue up on security@, > and there are more problems here which partly he pointed out and > partly we came to realize during the discussion there. > > One issue is that the vector map management is not really taking > effect for devices using multiple interrupts (at present MSI-X only) > because the initial vector assignment in create_irq() can't possibly > take into account the vector map, as that can be (and is being, in > map_domain_pirq()) set only after that function returns. > > Andrew suggests to immediately re-assign the vector in > map_domain_pirq(), but that would be cumbersome as a "normal" > re-assignment only takes effect the next time an IRQ actually > occurs. I therefore think that we should rather suppress the initial > assignment of a vector, deferring it until after the vector map got > set. > > We figured that this is no security relevant as the respective > assertion is a debug build only thing (and debug build only issues > were decided to not be handled via issuing advisories some time > ago), and the effect this guards against (improperly set up IRTEs > on AMD IOMMU) is only affecting the guest itself (using - until the > multi-vector MSI series goes in - solely vector based indexing): A > second CPU getting the same vector allocated as is in use for > another IRQ of the same device on another CPU, this would simply > break the IRQs raised to that guest. > >> Once at it, also defer resetting of desc->handler until after the loop >> around smp_mb() checking for IRQ_INPROGRESS to be clear, fixing a >> (mostly theoretical) issue with the intercation with do_IRQ(): If we >> don't defer the pointer reset, do_IRQ() could, for non-guest IRQs, call >> ->ack() and ->end() with different ->handler pointers, potentially >> leading to an IRQ remaining un-acked. The issue is mostly theoretical >> because non-guest IRQs are subject to destroy_irq() only on (boot time) >> error paths. > While this change also reduces the chances of an IRQ getting raised > along with destroy_irq() getting invoked and, due to do_IRQ() losing > the race for desc->lock, calling ack_bad_irq() due to ->handler > already having got set to &no_irq_type, it doesn't entirely close the > window. While the issue is cosmetic only (the IRQ raised here has no > meaning anymore as it ie being torn down, and hence the message > issued from ack_bad_irq() is merely a false positive), I think that this > could be fixed by transitioning desc->arch.used to IRQ_RESERVED > (instead of IRQ_UNUSED) in __clear_irq_vector(), and deferring both > the resetting of desc->handler and the setting of desc->arch.used to > IRQ_UNUSED to an RCU callback. > >> As to the changed locking: Invoking clear_irq_vector() with desc->lock >> held is okay because vector_lock already nests inside desc->lock (proven >> by set_desc_affinity(), which takes vector_lock and gets called from >> various desc->handler->ack implementations, getting invoked with >> desc->lock held). > The locking around vector management appears to be a wider issue: > Changes to IRQ descriptors generally are to be protected by holding > desc->lock, yet all of the vector management is only guarded by > vector_lock. The patch here addresses this for destroy_irq()'s > invocation of clear_irq_vector(), but this apparently would need > taking care of on all other affected code paths. > > Andrew - please double check I didn't forget any of the aspects we > discussed yesterday. > > Jan > I believe that covers the issues we discovered. I will be back in the office properly on Friday and can see about addressing the issues not addressed so far in this patch. As for the patch itself, Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> As for the further points. I am not sure the RCU stuff is sufficient. There can be an unknown number of irqs in transit somewhere in the hardware between a destroy_irq() running ->shutdown() and desc->handler being replaced with &no_irq_type. A grace time long enough to be fairly sure that any irqs-in-transit are delivered will remove the chance of false-positives in ack_none(), at the risk of some false negatives slipping by. I suppose another trick with -irq might be able to be used to distinguish between a late irq-in-transit and a real spurious irq. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.