[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote: > 1.2 I also have checked solution where on_selected_cpus call was moved > out of the > interrupt handler. Unfortunately, it doesn't work. > > I almost immediately see next error: > (XEN) Assertion 'this_cpu(eoi_irq) == NULL' failed, line 981, file gic.c > (XEN) Xen BUG at gic.c:981 > (XEN) CPU1: Unexpected Trap: Undefined Instruction > (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]---- > (XEN) CPU: 1 > (XEN) PC: 00241ee0 __bug+0x2c/0x44 > (XEN) CPSR: 2000015a MODE:Hypervisor > (XEN) R0: 0026770c R1: 00000000 R2: 3fd2fd00 R3: 00000fff > (XEN) R4: 00263248 R5: 00264384 R6: 000003d5 R7: 4003d000 > (XEN) R8: 00000001 R9: 00000091 R10:00000000 R11:40037ebc R12:00000001 > (XEN) HYP: SP: 40037eb4 LR: 00241ee0 > (XEN) > (XEN) VTCR_EL2: 80002558 > (XEN) VTTBR_EL2: 00010000deffc000 > (XEN) > (XEN) SCTLR_EL2: 30cd187f > (XEN) HCR_EL2: 0000000000002835 > (XEN) TTBR0_EL2: 00000000d2014000 > (XEN) > (XEN) ESR_EL2: 00000000 > (XEN) HPFAR_EL2: 0000000000482110 > (XEN) HDFAR: fa211f00 > (XEN) HIFAR: 00000000 > (XEN) > (XEN) Xen stack trace from sp=40037eb4: > (XEN) 00000000 40037efc 00247e1c 002e6610 002e6610 002e6608 002e6608 > 00000001 > (XEN) 00000000 40015000 40017000 40005f60 40017014 40037f58 00000019 > 00000000 > (XEN) 40005f60 40037f24 00249068 00000009 00000019 00404000 40037f58 > 00000000 > (XEN) 00405000 00004680 002e7694 40037f4c 00248b80 00000000 c5b72000 > 00000091 > (XEN) 00000000 c700d4e0 c008477c 000000f1 00000001 40037f54 0024f6c0 > 40037f58 > (XEN) 00251a30 c700d4e0 00000001 c008477c 00000000 c5b72000 00000091 > 00000000 > (XEN) c700d4e0 c008477c 000000f1 00000001 00000001 c5b72000 ffffffff > 0000a923 > (XEN) c0077ac4 60000193 00000000 b6eadaa0 c0578f40 c00138c0 c5b73f58 > c036ab90 > (XEN) c0578f4c c00136a0 c0578f58 c0013920 00000000 00000000 00000000 > 00000000 > (XEN) 00000000 00000000 00000000 80000010 60000193 a0000093 80000193 > 00000000 > (XEN) 00000000 0c41e00c 450c2880 > (XEN) Xen call trace: > (XEN) [<00241ee0>] __bug+0x2c/0x44 (PC) > (XEN) [<00241ee0>] __bug+0x2c/0x44 (LR) > (XEN) [<00247e1c>] maintenance_interrupt+0x2e8/0x328 > (XEN) [<00249068>] do_IRQ+0x138/0x198 > (XEN) [<00248b80>] gic_interrupt+0x58/0xc0 > (XEN) [<0024f6c0>] do_trap_irq+0x10/0x14 > (XEN) [<00251a30>] return_from_trap+0/0x4 > (XEN) Are you seeing more than one interrupt being EOI'ed with a single maintenance interrupt? I didn't think it could be possible in practice. If so, we might have to turn eoi_irq into a list or an array. > 2. The "simultaneous cross-interrupts" issue doesn't occur if I use > next solution: > So, as result I don't see deadlock in on_selected_cpus() > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index e6257a7..af96a31 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const > struct dt_irq *irq, > > level = dt_irq_is_level_triggered(irq); > > - gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()), > - 0xa0); > + gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0); > > retval = __setup_irq(desc, irq->irq, action); > if (retval) { > So, as result I don't see deadlock in on_selected_cpus(). As I stated before I think this is a good change to have in 4.4. > But, rarely, I see deadlocks in other parts related to interrupts handling. > As noted by Julien, I am using the old version of the interrupt patch series. > I completely agree. > > We are based on next XEN commit: > 48249a1 libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list > > Also we have some patches, which we cherry-picked when we urgently needed > them: > 6bba1a3 xen/arm: Keep count of inflight interrupts > 33a8aa9 xen/arm: Only enable physical IRQs when the guest asks > b6a4e65 xen/arm: Rename gic_irq_{startup, shutdown} to gic_irq_{mask, unmask} > 5dbe455 xen/arm: Don't reinject the IRQ if it's already in LRs > 1438f03 xen/arm: Physical IRQ is not always equal to virtual IRQ > > I have to apply next patches and check with them: > 88eb95e xen/arm: disable a physical IRQ when the guest disables the > corresponding IRQ > a660ee3 xen/arm: implement gic_irq_enable and gic_irq_disable > 1dc9556 xen/arm: do not add a second irq to the LRs if one is already present > d16d511 xen/arm: track the state of guest IRQs > > I'll report about the results. I hope to do it today. I am looking forward to reading your report. Cheers, Stefano > A lot of thanks to all. > > On Thu, Jan 30, 2014 at 5:35 PM, Stefano Stabellini > <stefano.stabellini@xxxxxxxxxxxxx> wrote: > > Given that we don't deactivate the interrupt (writing to GICC_DIR) until > > the guest EOIs it, I can't understand how you manage to get a second > > interrupt notifications before the guest EOIs the first one. > > > > Do you set GICC_CTL_EOI in GICC_CTLR? > > > > On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote: > >> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH) > >> > >> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini > >> <stefano.stabellini@xxxxxxxxxxxxx> wrote: > >> > Is it a level or an edge irq? > >> > > >> > On Wed, 29 Jan 2014, Julien Grall wrote: > >> >> Hi, > >> >> > >> >> It's weird, physical IRQ should not be injected twice ... > >> >> Were you able to print the IRQ number? > >> >> > >> >> In any case, you are using the old version of the interrupt patch > >> >> series. > >> >> Your new error may come of race condition in this code. > >> >> > >> >> Can you try to use a newest version? > >> >> > >> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" > >> >> <oleksandr.tyshchenko@xxxxxxxxxxxxxxx> wrote: > >> >> > Right, that's why changing it to cpumask_of(0) shouldn't make > >> >> any > >> >> > difference for xen-unstable (it should make things clearer, if > >> >> nothing > >> >> > else) but it should fix things for Oleksandr. > >> >> > >> >> Unfortunately, it is not enough for stable work. > >> >> > >> >> I was tried to use cpumask_of(smp_processor_id()) instead of > >> >> cpumask_of(0) in > >> >> gic_route_irq_to_guest(). And as result, I don't see our situation > >> >> which cause to deadlock in on_selected_cpus function (expected). > >> >> But, hypervisor sometimes hangs somewhere else (I have not > >> >> identified > >> >> yet where this is happening) or I sometimes see traps, like that: > >> >> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to > >> >> them) > >> >> > >> >> (XEN) CPU1: Unexpected Trap: Undefined Instruction > >> >> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]---- > >> >> (XEN) CPU: 1 > >> >> (XEN) PC: 00242c1c __warn+0x20/0x28 > >> >> (XEN) CPSR: 200001da MODE:Hypervisor > >> >> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff > >> >> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000 > >> >> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc > >> >> R12:00000002 > >> >> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c > >> >> (XEN) > >> >> (XEN) VTCR_EL2: 80002558 > >> >> (XEN) VTTBR_EL2: 00020000dec6a000 > >> >> (XEN) > >> >> (XEN) SCTLR_EL2: 30cd187f > >> >> (XEN) HCR_EL2: 00000000000028b5 > >> >> (XEN) TTBR0_EL2: 00000000d2014000 > >> >> (XEN) > >> >> (XEN) ESR_EL2: 00000000 > >> >> (XEN) HPFAR_EL2: 0000000000482110 > >> >> (XEN) HDFAR: fa211190 > >> >> (XEN) HIFAR: 00000000 > >> >> (XEN) > >> >> (XEN) Xen stack trace from sp=4bfd7eb4: > >> >> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 > >> >> 00000097 00000001 > >> >> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 > >> >> 00000019 00000000 > >> >> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 > >> >> 4bfd7f58 00000000 > >> >> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 > >> >> 00000097 00000097 > >> >> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 > >> >> 0024f4b8 4bfd7f58 > >> >> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 > >> >> 00000097 00000000 > >> >> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c > >> >> ffffffff b6efbca3 > >> >> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 > >> >> e9879eb8 c007680c > >> >> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 > >> >> 00000000 00000000 > >> >> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 > >> >> 40000193 00000000 > >> >> (XEN) ffeffbfe fedeefff fffd5ffe > >> >> (XEN) Xen call trace: > >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC) > >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR) > >> >> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4 > >> >> (XEN) [<00248e60>] do_IRQ+0x138/0x198 > >> >> (XEN) [<00248978>] gic_interrupt+0x58/0xc0 > >> >> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14 > >> >> (XEN) [<00251830>] return_from_trap+0/0x4 > >> >> (XEN) > >> >> > >> >> Also I am posting maintenance_interrupt() from my tree: > >> >> > >> >> static void maintenance_interrupt(int irq, void *dev_id, struct > >> >> cpu_user_regs *regs) > >> >> { > >> >> int i = 0, virq, pirq; > >> >> uint32_t lr; > >> >> struct vcpu *v = current; > >> >> uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) > >> >> GICH[GICH_EISR1]) << 32); > >> >> > >> >> while ((i = find_next_bit((const long unsigned int *) &eisr, > >> >> 64, i)) < 64) { > >> >> struct pending_irq *p, *n; > >> >> int cpu, eoi; > >> >> > >> >> cpu = -1; > >> >> eoi = 0; > >> >> > >> >> spin_lock_irq(&gic.lock); > >> >> lr = GICH[GICH_LR + i]; > >> >> virq = lr & GICH_LR_VIRTUAL_MASK; > >> >> > >> >> p = irq_to_pending(v, virq); > >> >> if ( p->desc != NULL ) { > >> >> p->desc->status &= ~IRQ_INPROGRESS; > >> >> /* Assume only one pcpu needs to EOI the irq */ > >> >> cpu = p->desc->arch.eoi_cpu; > >> >> eoi = 1; > >> >> pirq = p->desc->irq; > >> >> } > >> >> if ( !atomic_dec_and_test(&p->inflight_cnt) ) > >> >> { > >> >> /* Physical IRQ can't be reinject */ > >> >> WARN_ON(p->desc != NULL); > >> >> gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority); > >> >> spin_unlock_irq(&gic.lock); > >> >> i++; > >> >> continue; > >> >> } > >> >> > >> >> GICH[GICH_LR + i] = 0; > >> >> clear_bit(i, &this_cpu(lr_mask)); > >> >> > >> >> if ( !list_empty(&v->arch.vgic.lr_pending) ) { > >> >> n = list_entry(v->arch.vgic.lr_pending.next, > >> >> typeof(*n), lr_queue); > >> >> gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority); > >> >> list_del_init(&n->lr_queue); > >> >> set_bit(i, &this_cpu(lr_mask)); > >> >> } else { > >> >> gic_inject_irq_stop(); > >> >> } > >> >> spin_unlock_irq(&gic.lock); > >> >> > >> >> spin_lock_irq(&v->arch.vgic.lock); > >> >> list_del_init(&p->inflight); > >> >> spin_unlock_irq(&v->arch.vgic.lock); > >> >> > >> >> if ( eoi ) { > >> >> /* this is not racy because we can't receive another > >> >> irq of the > >> >> * same type until we EOI it. */ > >> >> if ( cpu == smp_processor_id() ) > >> >> gic_irq_eoi((void*)(uintptr_t)pirq); > >> >> else > >> >> on_selected_cpus(cpumask_of(cpu), > >> >> gic_irq_eoi, > >> >> (void*)(uintptr_t)pirq, 0); > >> >> } > >> >> > >> >> i++; > >> >> } > >> >> } > >> >> > >> >> > >> >> Oleksandr Tyshchenko | Embedded Developer > >> >> GlobalLogic > >> >> > >> >> > >> >> > >> > >> > >> > >> -- > >> > >> Name | Title > >> GlobalLogic > >> P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype > >> www.globallogic.com > >> > >> http://www.globallogic.com/email_disclaimer.txt > >> > > > > -- > > Name | Title > GlobalLogic > P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype > www.globallogic.com > > http://www.globallogic.com/email_disclaimer.txt > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |