Re: [Xen-devel] Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
On 18/07/2016 20:26, Sander Eikelenboom wrote:
> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>
>> On 18/07/16 11:21, linux@xxxxxxxxxxxxxx wrote:
>>> Hi Jan,
>>>
>>> It seems that since your patch series starting with commit:
>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>
>>> The shutdown of a guest which has a PCI device passed through which
>>> uses MSI-X interrupts causes
>>> a host crash, see the splat below. Somehow it also doesn't reboot in 5
>>> seconds as it is supposed to (i don't have no-reboot on the command
>>> line).
>>>
>>> --
>>> Sander
>>>
>>>
>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64
>>> debug=y Not tainted ]----
>>> (XEN) [2016-07-16 16:03:17.069] CPU: 0
>>> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>]
>>> msixtbl_pt_unregister+0x7b/0xd9
>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT:
>>> hypervisor (d0v0)
>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx:
>>> ffff83055c685500 rcx: 0000000000000001
>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi:
>>> 0000000000001ab0 rdi: ffff8305313b85a0
>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp:
>>> ffff83009fd07c68 r8: ffff8305356dfff0
>>> (XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10:
>>> ffff830503420c50 r11: 0000000000000282
>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13:
>>> ffff83009fd07e48 r14: ffff8305313b8000
>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0:
>>> 0000000080050033 cr4: 00000000000006e0
>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2:
>>> 0000000000000000
>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs:
>>> 0000 ss: e010 cs: e008
>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de>
>>> (msixtbl_pt_unregister+0x7b/0xd9):
>>> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a 0f
>>> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from
>>> rsp=ffff83009fd07c68:
>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480
>>> ffff83009fd07ce8 ffff82d08014c394
>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480
>>> 0000000000000293 ffff8305313b80cc
>>> (XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000
>>> ffff83009fd07cd8 ffff83009fd07e38
>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000
>>> 00007fc25a33e004 ffff8305313b8000
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8
>>> 0000000000000000 ffff83053b1191f0
>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28
>>> ffff82d0801300ae 000000000000000e
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497
>>> ffff83009fd07d78 000000020001d17b
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000
>>> ffff83009fd07d68 ffff82d080130280
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa
>>> 0000000000000202 0000000000000000
>>> (XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320
>>> 0000000000305000 00007fc25a33e004
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c
>>> 0000000000000206 0000000000000002
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db
>>> 0000000000000cfe 0000000000000002
>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000
>>> ffff83009fd07e48 ffff82d08019c119
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177
>>> ffff83009fd07e38 0000000000000cfe
>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206
>>> 0000000c00000030 000056082bb90013
>>> (XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013
>>> 0000305600000000 000056082b87465d
>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f
>>> 0000000000000000 000056082b8746cf
>>> (XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730
>>> 00007ffe26820740 000056082b8797be
>>> (XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772
>>> 00007ffe26820740 0000000000003056
>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000
>>> 00007ffe26820580 ffff88005716d320
>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>]
>>> msixtbl_pt_unregister+0x7b/0xd9
>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>]
>>> pt_irq_destroy_bind+0x2be/0x3f0
>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>]
>>> arch_do_domctl+0xc77/0x2414
>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>]
>>> do_domctl+0x19db/0x1d26
>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>]
>>> lstar_enter+0xdd/0x137
>>> (XEN) [2016-07-16 16:03:17.070]
>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>> (XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000
>>> ffffffffffffffff
>>> (XEN) [2016-07-16 16:03:18.147]
>>> (XEN) [2016-07-16 16:03:18.155] ****************************************
>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
>>> (XEN) [2016-07-16 16:03:18.233] ****************************************
>>> (XEN) [2016-07-16 16:03:18.252]
>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>
>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>> is a dereference of %rdx which is NULL at this point, but I need to
>> figure out which pointer it is supposed to be.
> Hi Andrew,

<snip>

Thanks. What has happened is that the msixtbl linked list is still
uninitialised at this point.

The only way I can see for this to happen is that msixtbl_init() hasn't
been called, or hasn't passed its first if condition.
The INIT_LIST_HEAD() visible in the context of the 2nd hunk of identified
changeset is the line of code which changes the list from 0 to
initialised, and I don't see anywhere which re-zeros it later.

This alone suggests that the VM in question isn't actually using MSI-X
interrupts, even if the device passed through is capable.

Following the style of the identified changeset,

andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..c533719 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     ASSERT(pcidevs_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
-    if ( !has_vlapic(d) )
+    if ( !d->arch.hvm_domain.msixtbl_list.next )
         return;
 
     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);

should resolve your issue, although I am very tempted to replace the
opencoded list logic with a msixtbl_initialised() predicate instead.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel