
Re: [Xen-devel] PAT-related crash booting Linux 4.4 + Xen 4.5 on VMware ESXi



On 05/24/2016 10:53 AM, Kani, Toshimitsu wrote:
> On Mon, 2016-05-23 at 15:52 -0700, Ed Swierk wrote:
>> Good question. I ran my tests again, and found I'd misinterpreted the
>> Fusion behavior.
>>
>> On Fusion 8.1.1, MSR_IA32_CR_PAT returns a reasonable value:
>>
>> (XEN) Freed 308kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=50100070406
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=5 cache=5
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=1 cache=1
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=7 cache=2
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=4 cache=4
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=6 cache=0
>> pat_init_cache_modes ok
>> pat_init_cache_modes pat_msg=WB  WT  UC- UC  WC  WP  UC  UC
>> about to get started...
>> [    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC
>>
>> On ESXi 5.5.0, MSR_IA32_CR_PAT returns 0, and we are indeed hitting
>> the BUG_ON in update_cache_mode_entry():
>>
>> (XEN) Freed 312kB init memory.
>> mapping kernel into physical memory
>> cpu_has_pat=0 cpuid_edx(1)=f89cbf5 pat=65536
>> pat_init_cache_modes pat=0
>> pat_init_cache_modes i=7 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=6 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=5 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=4 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=3 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=2 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=1 pat_val=0 cache=3
>> pat_init_cache_modes ok
>> pat_init_cache_modes i=0 pat_val=0 cache=3
>> (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802276c3 create_bounce_frame+0x12b/0x13a
>>
>> In both cases, the PAT CPUID feature bit is set, and cpu_has_pat is
>> always 0 at this early point (so my RFC patch is wrong). The simplest
>> fix is to call pat_init_cache_modes(pat) only if pat != 0.
>>
>> This is starting to look like the same logic that's in pat_bsp_init(),
>> which doesn't seem to be called when booting on Xen. Should it be? Was
>> Xen deliberately excluded from this PAT emulation change?
>> https://groups.google.com/d/msg/linux.kernel/JoJKbCOxV0U/PM0I9d1v60kJ
> Calling pat_init() requires the CPU rendezvous handler in MTRR, which is
> disabled in Xen.  This PAT initialization has been problematic, and the
> following patches addressed it in 4.6.  This will fix your problem as
> well. 
> https://lkml.org/lkml/2016/3/23/500
>
> In particular, patch 6/7 removed the Xen code in question.
> https://lkml.org/lkml/2016/3/23/503
>
> Do you need to fix this issue in 4.4?  If so, we should be able to request
> backporting the patches to 4.4 stable.


Would disabling PAT when the MSR is clearly broken (and not trying to
emulate it) not work?
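
To make the idea concrete, here is a stand-alone user-space sketch (not
kernel code) that decodes a PAT MSR value the way the debug output above
does and decides whether the value looks usable. The helper names
pat_entry(), pat_describe() and pat_msr_usable() are hypothetical, and the
"entry 0 must be WB" criterion is an assumption suggested by the log, where
the crash follows i=0 with a non-WB value. The simpler pat != 0 test
proposed above would also catch the ESXi case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Memory-type encodings of one PAT entry (Intel SDM); 2 and 3 are reserved. */
static const char *pat_type_name(unsigned v)
{
    switch (v) {
    case 0: return "UC";
    case 1: return "WC";
    case 4: return "WT";
    case 5: return "WP";
    case 6: return "WB";
    case 7: return "UC-";
    default: return "??";
    }
}

/* Entry i occupies byte i of the MSR; only the low three bits are used. */
static unsigned pat_entry(uint64_t pat, int i)
{
    assert(i >= 0 && i < 8);
    return (unsigned)((pat >> (i * 8)) & 7);
}

/*
 * Hypothetical sanity check (an assumption, not the kernel's logic):
 * treat the MSR as usable only if entry 0 is WB. A value of 0 decodes
 * to UC in every entry and is therefore rejected.
 */
static int pat_msr_usable(uint64_t pat)
{
    return pat_entry(pat, 0) == 6;
}

/* Write the eight entries, separated by single spaces, into buf. */
static void pat_describe(uint64_t pat, char *buf, size_t len)
{
    size_t off = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (int i = 0; i < 8; i++) {
        int n = snprintf(buf + off, len - off, "%s%s",
                         i ? " " : "", pat_type_name(pat_entry(pat, i)));
        if (n < 0 || (size_t)n >= len - off)
            break; /* error or truncated; stop here */
        off += (size_t)n;
    }
}
```

Fed the Fusion value 0x50100070406, pat_describe() yields the same
ordering as the pat_msg line in the log, and pat_msr_usable() accepts it.
The ESXi value 0 is rejected, in which case the caller would skip
pat_init_cache_modes() and leave PAT disabled.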

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

