Re: [Xen-devel] Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
On 2016-07-18 22:57, Andrew Cooper wrote:
> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>>> On 18/07/16 11:21, linux@xxxxxxxxxxxxxx wrote:
>>>> Hi Jan,
>>>>
>>>> It seems that since your patch series starting with commit:
>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>>
>>>> The shutdown of a guest which has a PCI device passed through which uses MSI-X interrupts causes
>>>> a host crash, see the splat below.
>>>>
>>>> Somehow it also doesn't reboot in 5 seconds as it is supposed to (i don't have no-reboot on the command line).
>>>>
>>>> --
>>>> Sander
>>>>
>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64 debug=y Not tainted ]----
>>>> (XEN) [2016-07-16 16:03:17.069] CPU: 0
>>>> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT: hypervisor (d0v0)
>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx: ffff83055c685500 rcx: 0000000000000001
>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi: 0000000000001ab0 rdi: ffff8305313b85a0
>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp: ffff83009fd07c68 r8: ffff8305356dfff0
>>>> (XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10: ffff830503420c50 r11: 0000000000000282
>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13: ffff83009fd07e48 r14: ffff8305313b8000
>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0: 0000000080050033 cr4: 00000000000006e0
>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2: 0000000000000000
>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de> (msixtbl_pt_unregister+0x7b/0xd9):
>>>> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a 0f 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from rsp=ffff83009fd07c68:
>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480 ffff83009fd07ce8 ffff82d08014c394
>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480 0000000000000293 ffff8305313b80cc
>>>> (XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000 ffff83009fd07cd8 ffff83009fd07e38
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000 00007fc25a33e004 ffff8305313b8000
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8 0000000000000000 ffff83053b1191f0
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28 ffff82d0801300ae 000000000000000e
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497 ffff83009fd07d78 000000020001d17b
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000 ffff83009fd07d68 ffff82d080130280
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa 0000000000000202 0000000000000000
>>>> (XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320 0000000000305000 00007fc25a33e004
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c 0000000000000206 0000000000000002
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db 0000000000000cfe 0000000000000002
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000 ffff83009fd07e48 ffff82d08019c119
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177 ffff83009fd07e38 0000000000000cfe
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206 0000000c00000030 000056082bb90013
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013 0000305600000000 000056082b87465d
>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f 0000000000000000 000056082b8746cf
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730 00007ffe26820740 000056082b8797be
>>>> (XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772 00007ffe26820740 0000000000003056
>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000 00007ffe26820580 ffff88005716d320
>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>] pt_irq_destroy_bind+0x2be/0x3f0
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>] arch_do_domctl+0xc77/0x2414
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>] do_domctl+0x19db/0x1d26
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>] lstar_enter+0xdd/0x137
>>>> (XEN) [2016-07-16 16:03:17.070]
>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>>> (XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000 ffffffffffffffff
>>>> (XEN) [2016-07-16 16:03:18.147]
>>>> (XEN) [2016-07-16 16:03:18.155] ****************************************
>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
>>>> (XEN) [2016-07-16 16:03:18.233] ****************************************
>>>> (XEN) [2016-07-16 16:03:18.252]
>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>
>>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>>> is a dereference of %rdx which is NULL at this point, but I need to
>>> figure out which pointer it is supposed to be.
>>
>> Hi Andrew,
>>
>> <snip>
>
> Thanks.
>
> What has happened is that the msixtbl linked list is still uninitialised
> at this point. The only way I can see for this to happen is that
> msixtbl_init() hasn't been called, or hasn't passed its first if
> condition.
>
> The INIT_LIST_HEAD() visible in the context of the 2nd hunk of identified
> changeset is the line of code which changes the list from 0 to
> initialised, and I don't see anywhere which re-zeros it later.
>
> This alone suggests that the VM in question isn't actually using MSI-X
> interrupts, even if the device passed through is capable.

Hmm didn't actually check this before, but you seem to be right (below is the lspci output from within the guest).
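
The failure mode described above can be illustrated with a minimal, self-contained C sketch. This is not Xen code: struct list_head, init_list_head(), list_head_initialised(), list_add_tail_entry() and count_entries() are hypothetical stand-ins written for this note, modelled on the usual circular doubly linked list idiom. The assumption is that the containing structure starts out zero-filled, so a list head that never went through its initialiser still has next == NULL, and an unguarded walk dereferences NULL on its first step, which would be consistent with rdx and the faulting linear address both being 0 in the splat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty, initialised list points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Hypothetical predicate: a zero-filled head has never been initialised. */
static int list_head_initialised(const struct list_head *h)
{
    return h->next != NULL;
}

/* Append an entry at the tail; only valid on an initialised list. */
static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
    assert(list_head_initialised(head));
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Count the entries on the list. Without the guard, a zero-filled head
 * would start the loop at pos == NULL and fault when reading pos->next.
 * Returns -1 when the head was never initialised.
 */
static int count_entries(const struct list_head *h)
{
    const struct list_head *pos;
    int n = 0;

    if ( !list_head_initialised(h) )
        return -1;

    for ( pos = h->next; pos != h; pos = pos->next )
        n++;

    return n;
}
```

Called on a zero-filled head, count_entries() returns -1 instead of faulting; after init_list_head() it returns 0 for the empty list, and the count grows as entries are appended.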
> Following the style of the identified changeset,
>
> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index e418b98..c533719 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>      ASSERT(pcidevs_locked());
>      ASSERT(spin_is_locked(&d->event_lock));
>
> -    if ( !has_vlapic(d) )
> +    if ( !d->arch.hvm_domain.msixtbl_list.next )
>          return;
>
>      irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>
> should resolve your issue, although I am very tempted to replace the
> opencoded list logic with a msixtbl_initialised() predicate instead.
>
> ~Andrew

It does resolve the issue, thanks !

--
Sander

00:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 6570/7570/8550] (prog-if 00 [VGA controller])
    Subsystem: PC Partner Limited / Sapphire Technology Turks PRO [Radeon HD 6570/7570/8550]
    Physical Slot: 5
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 68
    Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at f3060000 (64-bit, non-prefetchable) [size=128K]
    Region 4: I/O ports at c100 [size=256]
    Expansion ROM at f3080000 [disabled] [size=128K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee57000 Data: 4300
    Kernel driver in use: radeon

00:06.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
    Subsystem: PC Partner Limited / Sapphire Technology Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
    Physical Slot: 6
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin B routed to IRQ 79
    Region 0: Memory at f30b0000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee56000 Data: 4300
    Kernel driver in use: snd_hda_intel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel