Re: [Xen-devel] Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
On 18/07/2016 23:03, linux@xxxxxxxxxxxxxx wrote:
> On 2016-07-18 22:57, Andrew Cooper wrote:
>> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>>>
>>>> On 18/07/16 11:21, linux@xxxxxxxxxxxxxx wrote:
>>>>> Hi Jan,
>>>>>
>>>>> It seems that since your patch series starting with commit:
>>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>>>
>>>>> The shutdown of a guest which has a PCI device passed through which
>>>>> uses MSI-X interrupts causes a host crash, see the splat below.
>>>>> Somehow it also doesn't reboot in 5 seconds as it is supposed to
>>>>> (I don't have no-reboot on the command line).
>>>>>
>>>>> --
>>>>> Sander
>>>>>
>>>>>
>>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64 debug=y Not tainted ]----
>>>>> (XEN) [2016-07-16 16:03:17.069] CPU: 0
>>>>> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT: hypervisor (d0v0)
>>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx: ffff83055c685500 rcx: 0000000000000001
>>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi: 0000000000001ab0 rdi: ffff8305313b85a0
>>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp: ffff83009fd07c68 r8: ffff8305356dfff0
>>>>> (XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10: ffff830503420c50 r11: 0000000000000282
>>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13: ffff83009fd07e48 r14: ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0: 0000000080050033 cr4: 00000000000006e0
>>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2: 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de> (msixtbl_pt_unregister+0x7b/0xd9):
>>>>> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a 0f 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from rsp=ffff83009fd07c68:
>>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480 ffff83009fd07ce8 ffff82d08014c394
>>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480 0000000000000293 ffff8305313b80cc
>>>>> (XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000 ffff83009fd07cd8 ffff83009fd07e38
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000 00007fc25a33e004 ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8 0000000000000000 ffff83053b1191f0
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28 ffff82d0801300ae 000000000000000e
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497 ffff83009fd07d78 000000020001d17b
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000 ffff83009fd07d68 ffff82d080130280
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa 0000000000000202 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320 0000000000305000 00007fc25a33e004
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c 0000000000000206 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db 0000000000000cfe 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000 ffff83009fd07e48 ffff82d08019c119
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177 ffff83009fd07e38 0000000000000cfe
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206 0000000c00000030 000056082bb90013
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013 0000305600000000 000056082b87465d
>>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f 0000000000000000 000056082b8746cf
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730 00007ffe26820740 000056082b8797be
>>>>> (XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772 00007ffe26820740 0000000000003056
>>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000 00007ffe26820580 ffff88005716d320
>>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>] pt_irq_destroy_bind+0x2be/0x3f0
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>] arch_do_domctl+0xc77/0x2414
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>] do_domctl+0x19db/0x1d26
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>] lstar_enter+0xdd/0x137
>>>>> (XEN) [2016-07-16 16:03:17.070]
>>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>>>> (XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000 ffffffffffffffff
>>>>> (XEN) [2016-07-16 16:03:18.147]
>>>>> (XEN) [2016-07-16 16:03:18.155] ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:18.233] ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.252]
>>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>>>
>>>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>>>> is a dereference of %rdx which is NULL at this point, but I need to
>>>> figure out which pointer it is supposed to be.
>>> Hi Andrew,
>>
>> <snip>
>>
>> Thanks. What has happened is that the msixtbl linked list is still
>> uninitialised at this point. The only way I can see for this to happen
>> is that msixtbl_init() hasn't been called, or hasn't passed its first if
>> condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk
>> of the identified changeset is the line of code which changes the list
>> from 0 to initialised, and I don't see anywhere which re-zeros it later.
>>
>> This alone suggests that the VM in question isn't actually using MSI-X
>> interrupts, even if the device passed through is capable.
>
> Hmm, didn't actually check this before, but you seem to be right
> (below is the lspci output from within the guest).

Both of those devices are using MSI interrupts - they don't even support
MSI-X.

>
>
>> Following the style of the identified changeset,
>>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index e418b98..c533719 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>>      ASSERT(pcidevs_locked());
>>      ASSERT(spin_is_locked(&d->event_lock));
>>
>> -    if ( !has_vlapic(d) )
>> +    if ( !d->arch.hvm_domain.msixtbl_list.next )
>>          return;
>>
>>      irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>>
>> should resolve your issue, although I am very tempted to replace the
>> opencoded list logic with a msixtbl_initialised() predicate instead.
>>
>> ~Andrew
>
> It does resolve the issue, thanks!

Right - I will clean up the patch tomorrow using a more logical predicate.

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
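The failure mode Andrew describes can be reproduced outside Xen with a small user-space sketch. It rests on two assumptions. First, it uses a cut-down doubly linked list modelled on Xen's struct list_head. Second, it assumes the domain structure starts out zero-filled, so a list head that never went through INIT_LIST_HEAD() has next == NULL.

The names `demo_hvm_domain`, `demo_msixtbl_init()` and `demo_msixtbl_count()` are hypothetical stand-ins. `msixtbl_initialised()` is only a guess at the kind of predicate Andrew mentions, not the code that was eventually committed. The sketch shows how an unguarded list walk would load through a NULL `next` pointer, and how checking `next` first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list in the style of Xen's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    assert(head->next != NULL);          /* head must have been initialised */
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical, cut-down stand-in for struct hvm_domain. */
struct demo_hvm_domain {
    struct list_head msixtbl_list;
};

/*
 * Hypothetical predicate: a zero-filled list head that never went through
 * INIT_LIST_HEAD() still has next == NULL.
 */
static bool msixtbl_initialised(const struct demo_hvm_domain *hd)
{
    return hd->msixtbl_list.next != NULL;
}

/* Hypothetical analogue of msixtbl_init(): only some domains get this far. */
static void demo_msixtbl_init(struct demo_hvm_domain *hd, bool uses_msix)
{
    if ( !uses_msix )
        return;                          /* list head stays all-zero */
    INIT_LIST_HEAD(&hd->msixtbl_list);
}

/*
 * Walk the list like a list_for_each() loop. Without the guard, a zero-filled
 * head starts the loop at pos == NULL and the first pos->next load faults.
 */
static int demo_msixtbl_count(const struct demo_hvm_domain *hd)
{
    const struct list_head *pos;
    int n = 0;

    if ( !msixtbl_initialised(hd) )
        return 0;

    for ( pos = hd->msixtbl_list.next; pos != &hd->msixtbl_list;
          pos = pos->next )
        n++;
    return n;
}
```

A domain that only uses MSI never reaches INIT_LIST_HEAD() here, so its count is 0 rather than a crash. Testing `next` relies on the zero-fill assumption. An explicit "initialised" flag would be an alternative that does not depend on it.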