[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
On Fri, Nov 21, 2014 at 03:55:37PM +0000, Jan Beulich wrote: > >>> On 21.11.14 at 16:14, <konrad.wilk@xxxxxxxxxx> wrote: > > On Fri, Nov 21, 2014 at 01:03:43PM +0100, Jan Beulich wrote: > >> >>> On 21.11.14 at 12:50, <konrad.wilk@xxxxxxxxxx> wrote: > >> > On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@xxxxxxxx> > >> > wrote: > >> >>>>> On 20.11.14 at 20:51, <konrad.wilk@xxxxxxxxxx> wrote: > >> >>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d, > >> >>struct hvm_pirq_dpci *pirq_dpci) > >> >>> ASSERT(d->arch.hvm_domain.irq.dpci); > >> >>> > >> >>> spin_lock(&d->event_lock); > >> >>> - if ( pirq_dpci->state ) > >> >>> + if ( test_and_clear_bool(pirq_dpci->masked) ) > >> >>> { > >> >>> struct pirq *pirq = dpci_pirq(pirq_dpci); > >> >>> const struct dev_intx_gsi_link *digl; > >> >> > >> >>So this now guards solely against the timeout enforced EOI? Why do > >> >>you no longer need to guard against cancellation (i.e. why isn't this > >> >>looking at both ->state and ->masked)? > >> >> > >> > > >> > The previous state check was superfluous as the dpci_softirq would check > >> > for > >> > the valid STATE_ before calling hvm_dirq_assist (and deal with > >> > cancellation). > >> > >> I thought so first too, but then how is the guarding against > >> cancellation working now? > > > > Since there are two forms of cancellation, lets consider each one seperatly: > > > > 1). pt_irq_create_bind and pt_irq_destroy_bind. Each of them calling > > pt_pirq_softirq_reset in the 'error' case or in the normal path of > > destruction. > > When that happens the action handler for the IRQ is removed the moment > > we call > > pirq_guest_unbind. As such we only have to deal with the potential > > outstanding > > pirq_dpci waiting to be serviced. Since we did the > > 'pt_pirq_softirq_reset' > > we have changed the state from STATE_SCHED to zero. That means > > dpci_softirq will > > NOT go in: > > if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) ) > > Unless the flag got set again in the meantime. Since the event lock I am having a hard time seeing when 'state' be set in that scenario. The 'raise_softirq_for' is called when 'pirq->dpci->flags' has some value. That gets cleared in 'pt_irq_create_bind' and 'pt_irq_destroy_bind'. Also inside of those functions we try our best to change the 'state' from STATE_SCHED to zero. But lets imagine we are in 'hvm_dirq_assist' right about to take the lock, and another CPU is calling 'pt_irq_create_bind'. The 'pt_irq_create_bind' will spin on: 234 /* 235 * A crude 'while' loop with us dropping the spinlock and giving 236 * the softirq_dpci a chance to run. 237 * We MUST check for this condition as the softirq could be scheduled 238 * and hasn't run yet. Note that this code replaced tasklet_kill which 239 * would have spun forever and would do the same thing (wait to flush out 240 * outstanding hvm_dirq_assist calls. 241 */ 242 if ( pt_pirq_softirq_active(pirq_dpci) ) 243 { 244 spin_unlock(&d->event_lock); 245 cpu_relax(); 246 goto restart; OK, so that is not it. Lets try another case - 'pt_irq_creates_bind' is in the error path: rc = msixtbl_pt_register(d, info, pt_irq_bind->u.msi.gtable); 279 if ( unlikely(rc) ) 280 { 281 pirq_guest_unbind(d, info); 282 /* 283 * Between 'pirq_guest_bind' and before 'pirq_guest_unbind' 284 * an interrupt can be scheduled. No more of them are going 285 * to be scheduled but we must deal with the one that may be 286 * in the queue. 287 */ 288 pt_pirq_softirq_reset(pirq_dpci); And right at that moment the softirq is running. It has already set STATE_RUN and just cleared STATE_SCHED and is executing 'hvm_dirq_assist'. It is stuck on the spin_lock waiting for that. The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset' to change the 'state' (as the state is in 'STATE_RUN', not 'STATE_SCHED'). 'pt_irq_create_bind' continues on with setting '->flag=0' and unlocks the lock. 'hvm_dirq_assist' grabs the lock and continues one (and state is STATE_RUN). Since 'flag = 0' and 'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing anything. Well, it ends up calling the dreaded 'set_timer' - which will hit another assert as it has never been initialized. Adding in 'mapping = 0' on the error paths for MSI takes care of that. That will take care of that - and I just rolled 'masked =0' in the pt_pirq_softirq_reset function. The legacy interrupt does not have this window, so it cannot do it. > is being acquired right before the line quoted above, _that_ is No. The event lock is acquired in 'hvm_dirq_assist' which would be checking the 'mapping', not 'state'. spin_lock(&d->event_lock); if ( test_and_clear_bool(pirq_dpci->masked) ) > what needs closely looking at (and why I was asking about the > deletion of the [at the first glance unnecessary] checking of ->state > in hvm_dirq_assist()). From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> Date: Thu, 20 Nov 2014 14:28:11 -0500 Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process. commit f6dd295381f4b6a66acddacf46bca8940586c8d8 "dpci: replace tasklet with softirq" used the 'masked' as an two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was not using locked cmpxchg or test_bit operations and ended setting 'state' set to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci'' was still on an per-cpu list causing an list corruption. The code would trigger the following path causing list corruption: \-timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0. pirq_dpci is still on dpci_list. \- dpci_sofitrq while (!list_emptry(&our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. \- dpci_softirq while (!list_emtpry) list_del, but ->next already has LIST_POISON1 and we blow up. An alternative solution was proposed (adding STATE_ZOMBIE and making pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above issue but had an fatal bug. It would miss interrupts that are to be scheduled! This patch brings back the 'masked' boolean which is used as an communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and 'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime 'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it. The 'state' is left as a seperate mechanism to provide an mechanism between 'raise_sofitrq' and 'softirq_dpci' to communicate the state of the 'struct hvm_dirq_pirq'. However since we have now two seperate machines we have to deal with an cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind' is called while an 'hvm_dirq_assist' is just about to service. The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls 'set_timer' hitting an ASSERT. By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'. In the 'pt_irq_create_bind' - in the error cases we could be seeing an softirq scheduled right away and being serviced (though stuck at the spinlock). The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset' to change the 'state' (as the state is in 'STATE_RUN', not 'STATE_SCHED'). 'pt_irq_create_bind' continues on with setting '->flag=0' and unlocks the lock. 'hvm_dirq_assist' grabs the lock and continues one. Since 'flag = 0' and 'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing anything until it hits 'set_timer' which is undefined for MSI. Adding in 'masked=0' for the MSI case fixes that. The legacy interrupt one does not need it as there is no chance of do_IRQ being called at that point. Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> v2: Deal with 'masked' in pt_irq_destroy_bind. v3: Deal with 'masked' in pt_irq_create_bind. --- xen/drivers/passthrough/io.c | 12 ++++++++++-- xen/include/xen/hvm/irq.h | 1 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index efc66dc..ae050df 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -129,6 +129,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) pirq_dpci->dom = NULL; break; } + /* + * Inhibit 'hvm_dirq_assist' from doing anything useful and at worst + * calling 'set_timer' which will blow up (as we have called kill_timer + * or never initialized it). Note that we hold the lock that + * 'hvm_dirq_assist' could be spinning on. + */ + pirq_dpci->masked = 0; } bool_t pt_irq_need_timer(uint32_t flags) @@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT, &pirq_dpci->flags) ) { - pirq_dpci->state = 0; + pirq_dpci->masked = 0; pirq_dpci->pending = 0; pirq_guest_eoi(dpci_pirq(pirq_dpci)); } @@ -610,6 +617,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq) !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) ) return 0; + pirq_dpci->masked = 1; raise_softirq_for(pirq_dpci); return 1; } @@ -669,7 +677,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) ASSERT(d->arch.hvm_domain.irq.dpci); spin_lock(&d->event_lock); - if ( pirq_dpci->state ) + if ( test_and_clear_bool(pirq_dpci->masked) ) { struct pirq *pirq = dpci_pirq(pirq_dpci); const struct dev_intx_gsi_link *digl; diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h index 9709397..3996f1f 100644 --- a/xen/include/xen/hvm/irq.h +++ b/xen/include/xen/hvm/irq.h @@ -94,6 +94,7 @@ struct hvm_irq_dpci { struct hvm_pirq_dpci { uint32_t flags; unsigned int state; + bool_t masked; uint16_t pending; struct list_head digl_list; struct domain *dom; -- 1.9.3 _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |