[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
If we pass in INTx type devices to a guest on an over-subscribed machine - and in an over-worked guest - we can cause the pirq_dpci->softirq_list to become corrupted. The reason for this is that the 'pt_irq_guest_eoi' ends up setting the 'state' to zero value. However the 'state' value (STATE_SCHED, STATE_RUN) is used to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was not using locked cmpxchg or test_bit operations and ended setting 'state' set to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci'' was still on an per-cpu list. The end result was list_del being called twice and the second call corrupting the per-cpu list. For this to occur one of the CPUs must be in the idle loop executing softirqs and the interrupt handler in the guest must not respond to the pending interrupt within 8ms, and we must receive another interrupt for this device on another CPU. CPU0: CPU1: timer_softirq_action \- pt_irq_time_out state = 0; do_IRQ [out of timer code, the raise_softirq pirq_dpci is on the CPU0 dpci_list] [adds the pirq_dpci to CPU1 dpci_list as state == 0] softirq_dpci: softirq_dpci: list_del [list entries are poisoned] list_del <= BOOM The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was pt_pirq_softirq_reset) to use cmpxchg. We also expand said function to reset the '->dom' only on the teardown paths - but not on the timeouts. Reported-and-Tested-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> --- xen/drivers/passthrough/io.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index efc66dc..2039d31 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -57,7 +57,7 @@ enum { * This can be called multiple times, but the softirq is only raised once. * That is until the STATE_SCHED state has been cleared. The state can be * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before + * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before * the softirq had a chance to run). */ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) @@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) } /* - * Reset the pirq_dpci->dom parameter to NULL. + * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set, + * reset pirq_dpci->dom parameter to NULL (used for teardown). * * This function checks the different states to make sure it can do it * at the right time. If it unschedules the 'hvm_dirq_assist' from running * it also refcounts (which is what the softirq would have done) properly. */ -static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) +static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci, + unsigned int clear) { struct domain *d = pirq_dpci->dom; @@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom' * in local variable before it sets STATE_RUN - and therefore will not * dereference '->dom' which would crash. + * + * However, if this is called from 'pt_irq_time_out' we do not want to + * clear the '->dom' as we can re-use the 'pirq_dpci' after that and + * need '->dom'. */ - pirq_dpci->dom = NULL; + if ( clear ) + pirq_dpci->dom = NULL; break; } } @@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT, &pirq_dpci->flags) ) { - pirq_dpci->state = 0; + pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */); pirq_dpci->pending = 0; pirq_guest_eoi(dpci_pirq(pirq_dpci)); } @@ -285,7 +292,7 @@ int pt_irq_create_bind( * to be scheduled but we must deal with the one that may be * in the queue. */ - pt_pirq_softirq_reset(pirq_dpci); + pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */); } } if ( unlikely(rc) ) @@ -536,9 +543,9 @@ int pt_irq_destroy_bind( pirq_dpci->flags = 0; /* * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the - * call to pt_pirq_softirq_reset. + * call to pt_pirq_softirq_cancel. */ - pt_pirq_softirq_reset(pirq_dpci); + pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */); pirq_cleanup_check(pirq, d); } @@ -773,8 +780,8 @@ unlock: } /* - * Note: 'pt_pirq_softirq_reset' can clear the STATE_SCHED before we get to - * doing it. If that is the case we let 'pt_pirq_softirq_reset' do ref-counting. + * Note: 'pt_pirq_softirq_cancel' can clear the STATE_SCHED before we get to + * doing it. If that is the case we let 'pt_pirq_softirq_cancel' do ref-counting. */ static void dpci_softirq(void) { -- 1.9.3 _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |