Re: [Xen-devel] PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi
Yes, we're just now moving to 4.4 stable, and will be there for a
while, so backporting would be very helpful.

--Ed

On Tue, May 24, 2016 at 7:53 AM, Kani, Toshimitsu <toshi.kani@xxxxxxx> wrote:
> On Mon, 2016-05-23 at 15:52 -0700, Ed Swierk wrote:
>> Good question. I ran my tests again, and found I'd misinterpreted the
>> Fusion behavior.
>>
>> On Fusion 8.1.1, MSR_IA32_CR_PAT returns a reasonable value:
>>
>> (XEN) Freed 308kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=50100070406
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=5 cache=5
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=1 cache=1
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=7 cache=2
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=4 cache=4
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=6 cache=0
>> pat_init_cache_modes ok
>> pat_init_cache_modes pat_msg=WB WT UC- UC WC WP UC UC
>> about to get started...
>> [ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC-
>> UC WC WP UC UC
>>
>> On ESXi 5.5.0, MSR_IA32_CR_PAT returns 0, and we are indeed hitting
>> the BUG_ON in update_cache_mode_entry():
>>
>> (XEN) Freed 312kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=0
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=0 cache=3
>> (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on
>> VCPU 0 [ec=0000]
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802276c3
>> create_bounce_frame+0x12b/0x13a
>>
>> In both cases, the PAT CPUID feature bit is set, and cpu_has_pat is
>> always 0 at this early point (so my RFC patch is wrong). The simplest
>> fix is to call pat_init_cache_modes(pat) only if pat != 0.
>>
>> This is starting to look like the same logic that's in pat_bsp_init(),
>> which doesn't seem to be called when booting on Xen. Should it be? Was
>> Xen deliberately excluded from this PAT emulation change?
>> https://groups.google.com/d/msg/linux.kernel/JoJKbCOxV0U/PM0I9d1v60kJ
>
> Calling pat_init() requires the CPU rendezvous handler in MTRR, which is
> disabled in Xen. This PAT initialization has been problematic, and the
> following patches addressed it in 4.6. This will fix your problem as
> well.
> https://lkml.org/lkml/2016/3/23/500
>
> In particular, patch 6/7 removed the Xen code in question.
> https://lkml.org/lkml/2016/3/23/503
>
> Do you need to fix this issue in 4.4? If so, we should be able to request
> backporting the patches to 4.4 stable.
>
> -Toshi
>
>
>>
>> --Ed
>>
>>
>> On Mon, May 23, 2016 at 1:13 PM, Boris Ostrovsky
>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>> >
>> > On 05/23/2016 10:15 AM, Konrad Rzeszutek Wilk wrote:
>> > >
>> > > On Fri, May 20, 2016 at 04:58:09PM -0700, Ed Swierk wrote:
>> > > >
>> > > > (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on
>> > > > VCPU 0 [ec=0000]
>> > > > (XEN) domain_crash_sync called from entry.S: fault at
>> > > > ffff82d0802286c3 create_bounce_frame+0x12b/0x13a
>> > > > (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> > > > (XEN) ----[ Xen-4.5.4-pre x86_64 debug=n Not tainted ]----
>> > > > (XEN) CPU: 0
>> > > > (XEN) RIP: e033:[<ffffffff81053cbd>]
>> > > > (XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (d0v0)
>> > > > (XEN) rax: 0000000000000022 rbx: 00000000ffffffff rcx:
>> > > > 0000000000000000
>> > > > (XEN) rdx: 0000000000000022 rsi: 0000000000000003 rdi:
>> > > > 0000000000000000
>> > > > (XEN) rbp: ffffffff81b67ea8 rsp:
>> > > > ffffffff81b67e68 r8: 0000000000000001
>> > > > (XEN) r9: 0000000000000001 r10: ffffffff81b67f20 r11:
>> > > > 6c61765f74617020
>> > > > (XEN) r12: 0000000000000000 r13: 0000000000000003 r14:
>> > > > 0000000000000000
>> > > > (XEN) r15: ffffffff81b67ebb cr0: 000000008005003b cr4:
>> > > > 00000000001526b0
>> > > > (XEN) cr3: 00000001b16eb000 cr2: 0000000000000000
>> > > > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs:
>> > > > e033
>> > > > (XEN) Guest stack trace from rsp=ffffffff81b67e68:
>> > > > (XEN) 0000000000000000 6c61765f74617020 ffffffff81053cbd
>> > > > 000000010000e030
>> > > > (XEN) 0000000000010006 ffffffff81b67ea8 000000000000e02b
>> > > > ffffffff81b67f20
>> > > > (XEN) ffffffff81b67f10 ffffffff8105b339 55ffffff81b67f10
>> > > > 5520204355202043
>> > > > (XEN) 5520204355202043 5520204355202043 0020204355202043
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 ffffffff81b67f38 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 ffffffff81b67ff0 ffffffff82010d0a
>> > > > 0000000000000000
>> > > > (XEN) 000306f200000000 fed8320300010800 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 ffffffff81b68008 0000000000000000
>> > > > 0000000000000000
>> > > > (XEN) 0000000000000000 0000000000000000 00000000fffedb08
>> > > > (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>> > > > (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>> > > >
>> > > > The crash occurs in pat_init_cache_modes(), called by
>> > > > xen_start_kernel(). The pat value from MSR_IA32_CR_PAT is 0.
>> > > > Strangely, the same kernel and Xen boot just fine on VMware Fusion
>> > > > 8.1.1, even though the MSR is 0 there as well.
>> > Are you hitting BUG_ON in update_cache_mode_entry()? I don't think I
>> > can
>> > see how you can avoid it when MSR read returns 0.
>> >
>> >
>> > >
>> > > >
>> > > >
>> > > > Anyway, guessing that it's pointless to call pat_init_cache_modes()
>> > > > when the CPU doesn't support PAT, I added a check for cpu_has_pat.
>> > > > This resolves the problem on ESXi and doesn't seem to break real
>> > > > hardware, though I'm not sure how to verify PAT functionality. So
>> > > > this is just an RFC.
>> > Can you start an HVM guest in Xen after your patch below?
>> >
>> > >
>> > > Cc-ing maintainers.
>> > > >
>> > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> > > > index 9a29803..209f680 100644
>> > > > --- a/arch/x86/xen/enlighten.c
>> > > > +++ b/arch/x86/xen/enlighten.c
>> > > > @@ -1633,8 +1633,12 @@ asmlinkage __visible void __init
>> > > > xen_start_kernel(void)
>> > > >          * Modify the cache mode translation tables to match Xen's PAT
>> > > >          * configuration.
>> > > >          */
>> > > > -       rdmsrl(MSR_IA32_CR_PAT, pat);
>> > > > -       pat_init_cache_modes(pat);
>> > > > +       if (cpu_has_pat) {
>> > > > +               rdmsrl(MSR_IA32_CR_PAT, pat);
>> > > > +               pat_init_cache_modes(pat);
>> > > > +       } else {
>> > > > +               xen_raw_console_write("CPU does not support PAT\n");
>> > > > +       }
>> > > >
>> > > >         /* keep using Xen gdt for now; no urgent need to change it */
>> > > >
>> > This looks OK to me but I think we should first understand why you
>> > don't
>> > crash on Fusion.
>> >
>> > Also, PAT initialization code has been rewritten in Linux (for 4.5?) so
>> > I suspect this problem is only observed on earlier kernels.
>> >
>> > -boris
>> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
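
For readers following the pat= values in the logs above, here is a
minimal user-space sketch (not kernel code, and not the patch that was
eventually merged) of how a PAT MSR value decodes into eight memory-type
entries, plus the "only if pat != 0" guard suggested in the thread. The
helper names pat_type_name, pat_entry, pat_looks_usable and pat_print are
hypothetical and exist only for illustration. The type encodings are the
architectural ones from the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural memory-type encodings for one PAT entry (Intel SDM). */
static const char *pat_type_name(unsigned val)
{
    switch (val) {
    case 0: return "UC";
    case 1: return "WC";
    case 4: return "WT";
    case 5: return "WP";
    case 6: return "WB";
    case 7: return "UC-";
    default: return "reserved"; /* encodings 2 and 3 */
    }
}

/* Hypothetical helper: low 3 bits of byte i hold the type of entry i. */
static unsigned pat_entry(uint64_t pat, int i)
{
    assert(i >= 0 && i < 8);
    return (unsigned)((pat >> (i * 8)) & 7);
}

/* Hypothetical guard mirroring the "call only if pat != 0" suggestion. */
static int pat_looks_usable(uint64_t pat)
{
    return pat != 0;
}

/* Print entries 0..7 in the order used by the x86/PAT boot message. */
static void pat_print(uint64_t pat)
{
    if (!pat_looks_usable(pat)) {
        printf("PAT MSR reads 0; skipping cache mode setup\n");
        return;
    }
    for (int i = 0; i < 8; i++)
        printf("%s%s", pat_type_name(pat_entry(pat, i)),
               i < 7 ? " " : "\n");
}
```

Calling pat_print(0x50100070406ULL) with the Fusion value yields the same
entry order as the "WB WT UC- UC WC WP UC UC" line in the log, while
pat_print(0) takes the skip path instead of treating entry 0 as UC.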