[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
I moved to 4.4.0-rc1 which already has necessary irq patches. And applied only one patch "cpumask_of(0) in gic_route_irq_to_guest". I see that Hypervisor hangs very often. Unfortunately, now I don't have debugger to localize part of code. So, I have to use prints and it may takes some time( On Thu, Jan 30, 2014 at 7:18 PM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote: > On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote: >> 1.2 I also have checked solution where on_selected_cpus call was moved >> out of the >> interrupt handler. Unfortunately, it doesn't work. >> >> I almost immediately see next error: >> (XEN) Assertion 'this_cpu(eoi_irq) == NULL' failed, line 981, file gic.c >> (XEN) Xen BUG at gic.c:981 >> (XEN) CPU1: Unexpected Trap: Undefined Instruction >> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]---- >> (XEN) CPU: 1 >> (XEN) PC: 00241ee0 __bug+0x2c/0x44 >> (XEN) CPSR: 2000015a MODE:Hypervisor >> (XEN) R0: 0026770c R1: 00000000 R2: 3fd2fd00 R3: 00000fff >> (XEN) R4: 00263248 R5: 00264384 R6: 000003d5 R7: 4003d000 >> (XEN) R8: 00000001 R9: 00000091 R10:00000000 R11:40037ebc R12:00000001 >> (XEN) HYP: SP: 40037eb4 LR: 00241ee0 >> (XEN) >> (XEN) VTCR_EL2: 80002558 >> (XEN) VTTBR_EL2: 00010000deffc000 >> (XEN) >> (XEN) SCTLR_EL2: 30cd187f >> (XEN) HCR_EL2: 0000000000002835 >> (XEN) TTBR0_EL2: 00000000d2014000 >> (XEN) >> (XEN) ESR_EL2: 00000000 >> (XEN) HPFAR_EL2: 0000000000482110 >> (XEN) HDFAR: fa211f00 >> (XEN) HIFAR: 00000000 >> (XEN) >> (XEN) Xen stack trace from sp=40037eb4: >> (XEN) 00000000 40037efc 00247e1c 002e6610 002e6610 002e6608 002e6608 >> 00000001 >> (XEN) 00000000 40015000 40017000 40005f60 40017014 40037f58 00000019 >> 00000000 >> (XEN) 40005f60 40037f24 00249068 00000009 00000019 00404000 40037f58 >> 00000000 >> (XEN) 00405000 00004680 002e7694 40037f4c 00248b80 00000000 c5b72000 >> 00000091 >> (XEN) 00000000 c700d4e0 c008477c 000000f1 00000001 40037f54 0024f6c0 >> 40037f58 >> (XEN) 00251a30 c700d4e0 00000001 c008477c 00000000 c5b72000 00000091 >> 00000000 >> (XEN) c700d4e0 c008477c 000000f1 00000001 00000001 c5b72000 ffffffff >> 0000a923 >> (XEN) c0077ac4 60000193 00000000 b6eadaa0 c0578f40 c00138c0 c5b73f58 >> c036ab90 >> (XEN) c0578f4c c00136a0 c0578f58 c0013920 00000000 00000000 00000000 >> 00000000 >> (XEN) 00000000 00000000 00000000 80000010 60000193 a0000093 80000193 >> 00000000 >> (XEN) 00000000 0c41e00c 450c2880 >> (XEN) Xen call trace: >> (XEN) [<00241ee0>] __bug+0x2c/0x44 (PC) >> (XEN) [<00241ee0>] __bug+0x2c/0x44 (LR) >> (XEN) [<00247e1c>] maintenance_interrupt+0x2e8/0x328 >> (XEN) [<00249068>] do_IRQ+0x138/0x198 >> (XEN) [<00248b80>] gic_interrupt+0x58/0xc0 >> (XEN) [<0024f6c0>] do_trap_irq+0x10/0x14 >> (XEN) [<00251a30>] return_from_trap+0/0x4 >> (XEN) > > Are you seeing more than one interrupt being EOI'ed with a single > maintenance interrupt? > I didn't think it could be possible in practice. > If so, we might have to turn eoi_irq into a list or an array. > > >> 2. The "simultaneous cross-interrupts" issue doesn't occur if I use >> next solution: >> So, as result I don't see deadlock in on_selected_cpus() >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> index e6257a7..af96a31 100644 >> --- a/xen/arch/arm/gic.c >> +++ b/xen/arch/arm/gic.c >> @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const >> struct dt_irq *irq, >> >> level = dt_irq_is_level_triggered(irq); >> >> - gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()), >> - 0xa0); >> + gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0); >> >> retval = __setup_irq(desc, irq->irq, action); >> if (retval) { >> So, as result I don't see deadlock in on_selected_cpus(). > > As I stated before I think this is a good change to have in 4.4. > > >> But, rarely, I see deadlocks in other parts related to interrupts handling. >> As noted by Julien, I am using the old version of the interrupt patch series. >> I completely agree. >> >> We are based on next XEN commit: >> 48249a1 libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list >> >> Also we have some patches, which we cherry-picked when we urgently needed >> them: >> 6bba1a3 xen/arm: Keep count of inflight interrupts >> 33a8aa9 xen/arm: Only enable physical IRQs when the guest asks >> b6a4e65 xen/arm: Rename gic_irq_{startup, shutdown} to gic_irq_{mask, unmask} >> 5dbe455 xen/arm: Don't reinject the IRQ if it's already in LRs >> 1438f03 xen/arm: Physical IRQ is not always equal to virtual IRQ >> >> I have to apply next patches and check with them: >> 88eb95e xen/arm: disable a physical IRQ when the guest disables the >> corresponding IRQ >> a660ee3 xen/arm: implement gic_irq_enable and gic_irq_disable >> 1dc9556 xen/arm: do not add a second irq to the LRs if one is already present >> d16d511 xen/arm: track the state of guest IRQs >> >> I'll report about the results. I hope to do it today. > > I am looking forward to reading your report. > Cheers, > > Stefano > >> A lot of thanks to all. >> >> On Thu, Jan 30, 2014 at 5:35 PM, Stefano Stabellini >> <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> > Given that we don't deactivate the interrupt (writing to GICC_DIR) until >> > the guest EOIs it, I can't understand how you manage to get a second >> > interrupt notifications before the guest EOIs the first one. >> > >> > Do you set GICC_CTL_EOI in GICC_CTLR? >> > >> > On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote: >> >> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH) >> >> >> >> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini >> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> >> > Is it a level or an edge irq? >> >> > >> >> > On Wed, 29 Jan 2014, Julien Grall wrote: >> >> >> Hi, >> >> >> >> >> >> It's weird, physical IRQ should not be injected twice ... >> >> >> Were you able to print the IRQ number? >> >> >> >> >> >> In any case, you are using the old version of the interrupt patch >> >> >> series. >> >> >> Your new error may come of race condition in this code. >> >> >> >> >> >> Can you try to use a newest version? >> >> >> >> >> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" >> >> >> <oleksandr.tyshchenko@xxxxxxxxxxxxxxx> wrote: >> >> >> > Right, that's why changing it to cpumask_of(0) shouldn't make >> >> >> any >> >> >> > difference for xen-unstable (it should make things clearer, if >> >> >> nothing >> >> >> > else) but it should fix things for Oleksandr. >> >> >> >> >> >> Unfortunately, it is not enough for stable work. >> >> >> >> >> >> I was tried to use cpumask_of(smp_processor_id()) instead of >> >> >> cpumask_of(0) in >> >> >> gic_route_irq_to_guest(). And as result, I don't see our >> >> >> situation >> >> >> which cause to deadlock in on_selected_cpus function (expected). >> >> >> But, hypervisor sometimes hangs somewhere else (I have not >> >> >> identified >> >> >> yet where this is happening) or I sometimes see traps, like that: >> >> >> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to >> >> >> them) >> >> >> >> >> >> (XEN) CPU1: Unexpected Trap: Undefined Instruction >> >> >> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]---- >> >> >> (XEN) CPU: 1 >> >> >> (XEN) PC: 00242c1c __warn+0x20/0x28 >> >> >> (XEN) CPSR: 200001da MODE:Hypervisor >> >> >> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff >> >> >> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000 >> >> >> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc >> >> >> R12:00000002 >> >> >> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c >> >> >> (XEN) >> >> >> (XEN) VTCR_EL2: 80002558 >> >> >> (XEN) VTTBR_EL2: 00020000dec6a000 >> >> >> (XEN) >> >> >> (XEN) SCTLR_EL2: 30cd187f >> >> >> (XEN) HCR_EL2: 00000000000028b5 >> >> >> (XEN) TTBR0_EL2: 00000000d2014000 >> >> >> (XEN) >> >> >> (XEN) ESR_EL2: 00000000 >> >> >> (XEN) HPFAR_EL2: 0000000000482110 >> >> >> (XEN) HDFAR: fa211190 >> >> >> (XEN) HIFAR: 00000000 >> >> >> (XEN) >> >> >> (XEN) Xen stack trace from sp=4bfd7eb4: >> >> >> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 >> >> >> 00000097 00000001 >> >> >> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 >> >> >> 00000019 00000000 >> >> >> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 >> >> >> 4bfd7f58 00000000 >> >> >> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 >> >> >> 00000097 00000097 >> >> >> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 >> >> >> 0024f4b8 4bfd7f58 >> >> >> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 >> >> >> 00000097 00000000 >> >> >> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c >> >> >> ffffffff b6efbca3 >> >> >> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 >> >> >> e9879eb8 c007680c >> >> >> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 >> >> >> 00000000 00000000 >> >> >> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 >> >> >> 40000193 00000000 >> >> >> (XEN) ffeffbfe fedeefff fffd5ffe >> >> >> (XEN) Xen call trace: >> >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC) >> >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR) >> >> >> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4 >> >> >> (XEN) [<00248e60>] do_IRQ+0x138/0x198 >> >> >> (XEN) [<00248978>] gic_interrupt+0x58/0xc0 >> >> >> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14 >> >> >> (XEN) [<00251830>] return_from_trap+0/0x4 >> >> >> (XEN) >> >> >> >> >> >> Also I am posting maintenance_interrupt() from my tree: >> >> >> >> >> >> static void maintenance_interrupt(int irq, void *dev_id, struct >> >> >> cpu_user_regs *regs) >> >> >> { >> >> >> int i = 0, virq, pirq; >> >> >> uint32_t lr; >> >> >> struct vcpu *v = current; >> >> >> uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) >> >> >> GICH[GICH_EISR1]) << 32); >> >> >> >> >> >> while ((i = find_next_bit((const long unsigned int *) &eisr, >> >> >> 64, i)) < 64) { >> >> >> struct pending_irq *p, *n; >> >> >> int cpu, eoi; >> >> >> >> >> >> cpu = -1; >> >> >> eoi = 0; >> >> >> >> >> >> spin_lock_irq(&gic.lock); >> >> >> lr = GICH[GICH_LR + i]; >> >> >> virq = lr & GICH_LR_VIRTUAL_MASK; >> >> >> >> >> >> p = irq_to_pending(v, virq); >> >> >> if ( p->desc != NULL ) { >> >> >> p->desc->status &= ~IRQ_INPROGRESS; >> >> >> /* Assume only one pcpu needs to EOI the irq */ >> >> >> cpu = p->desc->arch.eoi_cpu; >> >> >> eoi = 1; >> >> >> pirq = p->desc->irq; >> >> >> } >> >> >> if ( !atomic_dec_and_test(&p->inflight_cnt) ) >> >> >> { >> >> >> /* Physical IRQ can't be reinject */ >> >> >> WARN_ON(p->desc != NULL); >> >> >> gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority); >> >> >> spin_unlock_irq(&gic.lock); >> >> >> i++; >> >> >> continue; >> >> >> } >> >> >> >> >> >> GICH[GICH_LR + i] = 0; >> >> >> clear_bit(i, &this_cpu(lr_mask)); >> >> >> >> >> >> if ( !list_empty(&v->arch.vgic.lr_pending) ) { >> >> >> n = list_entry(v->arch.vgic.lr_pending.next, >> >> >> typeof(*n), lr_queue); >> >> >> gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority); >> >> >> list_del_init(&n->lr_queue); >> >> >> set_bit(i, &this_cpu(lr_mask)); >> >> >> } else { >> >> >> gic_inject_irq_stop(); >> >> >> } >> >> >> spin_unlock_irq(&gic.lock); >> >> >> >> >> >> spin_lock_irq(&v->arch.vgic.lock); >> >> >> list_del_init(&p->inflight); >> >> >> spin_unlock_irq(&v->arch.vgic.lock); >> >> >> >> >> >> if ( eoi ) { >> >> >> /* this is not racy because we can't receive another >> >> >> irq of the >> >> >> * same type until we EOI it. */ >> >> >> if ( cpu == smp_processor_id() ) >> >> >> gic_irq_eoi((void*)(uintptr_t)pirq); >> >> >> else >> >> >> on_selected_cpus(cpumask_of(cpu), >> >> >> gic_irq_eoi, >> >> >> (void*)(uintptr_t)pirq, 0); >> >> >> } >> >> >> >> >> >> i++; >> >> >> } >> >> >> } >> >> >> >> >> >> >> >> >> Oleksandr Tyshchenko | Embedded Developer >> >> >> GlobalLogic >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> Name | Title >> >> GlobalLogic >> >> P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype >> >> www.globallogic.com >> >> >> >> http://www.globallogic.com/email_disclaimer.txt >> >> >> >> >> >> -- >> >> Name | Title >> GlobalLogic >> P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype >> www.globallogic.com >> >> http://www.globallogic.com/email_disclaimer.txt >> -- Name | Title GlobalLogic P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype www.globallogic.com http://www.globallogic.com/email_disclaimer.txt _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |