
Re: [Xen-devel] Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.



On 18/07/2016 23:03, linux@xxxxxxxxxxxxxx wrote:
> On 2016-07-18 22:57, Andrew Cooper wrote:
>> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>>>
>>>> On 18/07/16 11:21, linux@xxxxxxxxxxxxxx wrote:
>>>>> Hi Jan,
>>>>>
>>>>> It seems that since your patch series starting with commit
>>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>>> (2016-06-22, "x86/vMSI-X: defer intercept handler registration"),
>>>>> shutting down a guest that has a passed-through PCI device using
>>>>> MSI-X interrupts crashes the host; see the splat below. The host
>>>>> also fails to reboot after 5 seconds as it is supposed to (I don't
>>>>> have no-reboot on the command line).
>>>>>
>>>>> -- 
>>>>> Sander
>>>>>
>>>>>
>>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable  x86_64  debug=y  Not tainted ]----
>>>>> (XEN) [2016-07-16 16:03:17.069] CPU:    0
>>>>> (XEN) [2016-07-16 16:03:17.069] RIP:    e008:[<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082   CONTEXT: hypervisor (d0v0)
>>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40   rbx: ffff83055c685500   rcx: 0000000000000001
>>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000   rsi: 0000000000001ab0   rdi: ffff8305313b85a0
>>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78   rsp: ffff83009fd07c68   r8:  ffff8305356dfff0
>>>>> (XEN) [2016-07-16 16:03:17.069] r9:  ffff8305356df480   r10: ffff830503420c50   r11: 0000000000000282
>>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000   r13: ffff83009fd07e48   r14: ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8   cr0: 0000000080050033   cr4: 00000000000006e0
>>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000   cr2: 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de> (msixtbl_pt_unregister+0x7b/0xd9):
>>>>> (XEN) [2016-07-16 16:03:17.069]  39 42 18 74 19 48 89 ca <48> 8b 0a 0f 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from rsp=ffff83009fd07c68:
>>>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000000 ffff8305356df480 ffff83009fd07ce8 ffff82d08014c394
>>>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000001 ffff8305356df480 0000000000000293 ffff8305313b80cc
>>>>> (XEN) [2016-07-16 16:03:17.069]    000000568012ffe5 ffff8305313b8000 ffff83009fd07cd8 ffff83009fd07e38
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000000 ffff83054e5fc000 00007fc25a33e004 ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07da8 ffff82d0801629c8 0000000000000000 ffff83053b1191f0
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000246 ffff83009fd07d28 ffff82d0801300ae 000000000000000e
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d080171497 ffff83009fd07d78 000000020001d17b
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d68 0000000000000000 ffff83009fd07d68 ffff82d080130280
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d08014d0aa 0000000000000202 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff8305313b8000 ffff88005716d320 0000000000305000 00007fc25a33e004
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07ef8 ffff82d080104b2c 0000000000000206 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07df8 ffff82d08018c9db 0000000000000cfe 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000002 ffff83054e5fc000 ffff83009fd07e48 ffff82d08019c119
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07e38 0000000080121177 ffff83009fd07e38 0000000000000cfe
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07f18 0000000000000206 0000000c00000030 000056082bb90013
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000200000056 00007fc200000013 0000305600000000 000056082b87465d
>>>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 00007fc25606b31f 0000000000000000 000056082b8746cf
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000001000 fee5600026820730 00007ffe26820740 000056082b8797be
>>>>> (XEN) [2016-07-16 16:03:17.070]    00000000fee56000 0000430026820772 00007ffe26820740 0000000000003056
>>>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 ffff83009ff8a000 00007ffe26820580 ffff88005716d320
>>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d08014c394>] pt_irq_destroy_bind+0x2be/0x3f0
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801629c8>] arch_do_domctl+0xc77/0x2414
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d080104b2c>] do_domctl+0x19db/0x1d26
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0802426bd>] lstar_enter+0xdd/0x137
>>>>> (XEN) [2016-07-16 16:03:17.070]
>>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>>>> (XEN) [2016-07-16 16:03:17.070]  L4[0x000] = 0000000000000000 ffffffffffffffff
>>>>> (XEN) [2016-07-16 16:03:18.147]
>>>>> (XEN) [2016-07-16 16:03:18.155] ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:18.233] ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.252]
>>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>>>
>>>> Can you paste the disassembly of msixtbl_pt_unregister() please?  That
>>>> is a dereference of %rdx which is NULL at this point, but I need to
>>>> figure out which pointer it is supposed to be.
>>> Hi Andrew,
>>
>> <snip>
>>
>> Thanks.  What has happened is that the msixtbl linked list is still
>> uninitialised at this point.  The only way I can see for this to happen
>> is that msixtbl_init() hasn't been called, or hasn't passed its first if
>> condition.  The INIT_LIST_HEAD() visible in the context of the 2nd hunk
>> of the identified changeset is the line of code which changes the list
>> from 0 to initialised, and I don't see anywhere which re-zeros it later.
>>
>> This alone suggests that the VM in question isn't actually using MSI-X
>> interrupts, even if the device passed through is capable.
>
> Hmm didn't actually check this before, but you seem to be right
> (below is the lspci output from within the guest).

Both of those devices are using MSI interrupts - they don't even support
MSI-X.

>
>
>> Following the style of the identified changeset,
>>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index e418b98..c533719 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>>      ASSERT(pcidevs_locked());
>>      ASSERT(spin_is_locked(&d->event_lock));
>>
>> -    if ( !has_vlapic(d) )
>> +    if ( !d->arch.hvm_domain.msixtbl_list.next )
>>          return;
>>
>>      irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>>
>> should resolve your issue, although I am very tempted to replace the
>> open-coded list check with a msixtbl_initialised() predicate instead.
>>
>> ~Andrew
>
> It does resolve the issue, thanks !

Right - I will clean up the patch tomorrow using a more logical predicate.

~Andrew
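
For readers following the thread, here is a minimal, self-contained sketch
of the failure mode and of the kind of predicate mentioned above.  It is an
illustration only, not the patch that was eventually committed: struct
demo_domain, demo_msixtbl_init() and demo_count_entries() are hypothetical
stand-ins, struct list_head is a simplified local copy rather than Xen's
header, and the body of msixtbl_initialised() is an assumption based on the
open-coded check in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified local stand-in for a doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for the per-domain state holding the msixtbl list. */
struct demo_domain {
    struct list_head msixtbl_list;
};

/*
 * Assumed shape of the predicate: a zeroed (never initialised) list head
 * has next == NULL, whereas an initialised empty list points at itself.
 */
static bool msixtbl_initialised(const struct demo_domain *d)
{
    return d->msixtbl_list.next != NULL;
}

/* Hypothetical init helper: turns the zeroed head into a valid empty list. */
static void demo_msixtbl_init(struct demo_domain *d)
{
    assert(!msixtbl_initialised(d));
    d->msixtbl_list.next = &d->msixtbl_list;
    d->msixtbl_list.prev = &d->msixtbl_list;
}

/*
 * Walk the list and count its entries.  Without the early return, the loop
 * would start at NULL for a zeroed head and dereference it on the first
 * step, which is the kind of NULL dereference reported in the splat.
 * Returns -1 if the list was never initialised.
 */
static int demo_count_entries(const struct demo_domain *d)
{
    const struct list_head *pos;
    int n = 0;

    if ( !msixtbl_initialised(d) )
        return -1;

    for ( pos = d->msixtbl_list.next; pos != &d->msixtbl_list; pos = pos->next )
        n++;

    return n;
}
```

With a zero-filled struct demo_domain, demo_count_entries() returns -1
instead of faulting; after demo_msixtbl_init() it returns 0 for the empty
list.  A named predicate keeps the "was this ever set up?" question in one
place instead of repeating the raw pointer test at each call site.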

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

