Re: NetBSD dom0 PVH: hardware interrupts stalls



On 20.11.2020 09:28, Roger Pau Monné wrote:
> On Fri, Nov 20, 2020 at 09:09:51AM +0100, Jan Beulich wrote:
>> On 19.11.2020 18:57, Manuel Bouyer wrote:
>>> I added an ASSERT() after the printf to get a stack trace, and got:
>>> db{0}> call ioapic_dump_raw
>>> Register dump of ioapic0
>>> [  13.0193374] 00 08000000 00170011 08000000(XEN) vioapic.c:141:d0v0 apic_mem_readl:undefined ioregsel 3
>>> (XEN) vioapic.c:512:vioapic_irq_positive_edge: vioapic_deliver 2
>>> (XEN) Assertion '!print' failed at vioapic.c:512
>>> (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:   C   ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82d0402c4164>] vioapic_irq_positive_edge+0x14e/0x150
>>> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v0)
>>> (XEN) rax: ffff82d0405c806c   rbx: ffff830836650580   rcx: 0000000000000000
>>> (XEN) rdx: ffff8300688bffff   rsi: 000000000000000a   rdi: ffff82d0404b36b8
>>> (XEN) rbp: ffff8300688bfde0   rsp: ffff8300688bfdc0   r8:  0000000000000004
>>> (XEN) r9:  0000000000000032   r10: 0000000000000000   r11: 00000000fffffffd
>>> (XEN) r12: ffff8308366dc000   r13: 0000000000000022   r14: ffff8308366dc31c
>>> (XEN) r15: ffff8308366d1d80   cr0: 0000000080050033   cr4: 00000000003526e0
>>> (XEN) cr3: 00000008366c9000   cr2: 0000000000000000
>>> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen code around <ffff82d0402c4164> (vioapic_irq_positive_edge+0x14e/0x150):
>>> (XEN)  3d 10 be 1d 00 00 74 c2 <0f> 0b 55 48 89 e5 41 57 41 56 41 55 41 54 53 48
>>> (XEN) Xen stack trace from rsp=ffff8300688bfdc0:
>>> (XEN)    0000000200000086 ffff8308366dc000 0000000000000022 0000000000000000
>>> (XEN)    ffff8300688bfe08 ffff82d0402bcc33 ffff8308366dc000 0000000000000022
>>> (XEN)    0000000000000001 ffff8300688bfe40 ffff82d0402bd18f ffff830835a7eb98
>>> (XEN)    ffff8308366dc000 ffff830835a7eb40 ffff8300688bfe68 0100100100100100
>>> (XEN)    ffff8300688bfea0 ffff82d04026f6e1 ffff830835a7eb30 ffff8308366dc0f4
>>> (XEN)    ffff830835a7eb40 ffff8300688bfe68 ffff8300688bfe68 ffff82d0405cec80
>>> (XEN)    ffffffffffffffff ffff82d0405cec80 0000000000000000 ffff82d0405d6c80
>>> (XEN)    ffff8300688bfed8 ffff82d04022b6fa ffff83083663f000 ffff83083663f000
>>> (XEN)    0000000000000000 0000000000000000 0000000a7c62165b ffff8300688bfee8
>>> (XEN)    ffff82d04022b798 ffff8300688bfe08 ffff82d0402a4bcb 0000000000000000
>>> (XEN)    0000000000000206 ffff8316da86e61c ffff8316da86e600 ffff938031fd47c0
>>> (XEN)    0000000000000003 0000000000000400 ff889e8da08f928a 0000000000000000
>>> (XEN)    0000000000000002 0000000000000100 000000000000b86e ffff93803237f010
>>> (XEN)    0000000000000000 ffff8316da86e61c 0000beef0000beef ffffffff80555918
>>> (XEN)    000000bf0000beef 0000000000000046 ffff938031fd4790 000000000000beef
>>> (XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
>>> (XEN)    0000e01000000000 ffff83083663f000 0000000000000000 00000000003526e0
>>> (XEN)    0000000000000000 0000000000000000 0000060100000001 0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d0402c4164>] R vioapic_irq_positive_edge+0x14e/0x150
>>> (XEN)    [<ffff82d0402bcc33>] F arch/x86/hvm/irq.c#assert_gsi+0x5e/0x7b
>>> (XEN)    [<ffff82d0402bd18f>] F hvm_gsi_assert+0x62/0x77
>>> (XEN)    [<ffff82d04026f6e1>] F drivers/passthrough/io.c#dpci_softirq+0x261/0x29e
>>> (XEN)    [<ffff82d04022b6fa>] F common/softirq.c#__do_softirq+0x8a/0xbf
>>> (XEN)    [<ffff82d04022b798>] F do_softirq+0x13/0x15
>>> (XEN)    [<ffff82d0402a4bcb>] F vmx_asm_do_vmentry+0x2b/0x30
>>> (XEN) 
>>> (XEN) 
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion '!print' failed at vioapic.c:512
>>> (XEN) ****************************************
>>
>> Right, this was the expected path after what you've sent prior to this.
>> Which turned my attention back to the 'i' debug key output you had sent
>> the other day. There we have
>>
>> (XEN)    IRQ:  34 vec:51 IO-APIC-level   status=010 aff:{0}/{0-7} in-flight=1 d0: 34(-MM)
>>
>> i.e. at that point we're waiting for Dom0 to signal it's done handling
>> the IRQ. There is, however, a timer associated with this. Yet that's
>> actually to prevent the system getting stuck, i.e. the "in-flight"
>> state ought to clear 1ms later (when that timer expires), and hence
>> ought to be pretty unlikely to catch when non-zero _and_ something's
>> actually stuck.
> 
> I somehow assumed the interrupt was in-flight because the printing to
> the Xen console caused one to be injected, and thus dom0 didn't have
> time to Ack it yet.

By "injected" you mean from Xen into Dom0, or by the hardware for Xen
to handle? (I ask because I think I saw you use the term also for the
latter case, in some context.) If the former, then something would
need to have caused Xen to inject it, while in the latter case there
would need to have been a reason that it didn't get delivered earlier.

From the stack trace above the only possibility I could derive for
now would be that we didn't run softirqs for a long time, but I don't
think that's very realistic here. Otoh, Manuel, does the NMI watchdog
work on that system? It certainly wouldn't hurt if you turned it on,
just in case.

Jan



 

