[Xen-devel] [PATCH 00/19] MCE code cleanup and add LMCE support

This patch series adds LMCE support to Xen, although more than half of
the patches are code cleanups and bug fixes.

Intel Local MCE (LMCE) is a feature on Intel Skylake Server CPU that            
can deliver MCE to a single processor thread instead of broadcasting            
to all threads, which can reduce software's load when processing MCE            
on machines with a large number of processor threads.                           
The technical details of LMCE can be found in Intel SDM Vol 3, Chapter          
"Machine-Check Architecture" (search for 'LMCE'). Basically,                    
 * The capability of LMCE is indicated by bit 27 (MCG_LMCE_P) of
   MSR_IA32_MCG_CAP.
 * LMCE is enabled by setting bit 20 (MSR_IA32_FEATURE_CONTROL_LMCE)
   of MSR_IA32_FEATURE_CONTROL and bit 0 (MCG_EXT_CTL_LMCE_EN) of
   MSR_IA32_MCG_EXT_CTL.
 * Software can determine if a MCE is local to the current processor
   thread by checking bit 3 (MCG_STATUS_LMCE) of MSR_IA32_MCG_STATUS.
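
The bit checks above can be sketched in C as follows. This is only an
illustration, not code from the patches: the macro names follow this
cover letter, their values are derived from the bit numbers, and the
helper function names are made up. In the SDM layout of
IA32_MCG_STATUS, RIPV is bit 0, EIPV bit 1, MCIP bit 2 and LMCE_S
bit 3, which agrees with the "MCGSTATUS d" value in the mcelog output
at the end of this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values derived from the bit numbers in the Intel SDM. */
#define MCG_LMCE_P                    (1ULL << 27) /* MSR_IA32_MCG_CAP */
#define MSR_IA32_FEATURE_CONTROL_LMCE (1ULL << 20) /* MSR_IA32_FEATURE_CONTROL */
#define MCG_EXT_CTL_LMCE_EN           (1ULL << 0)  /* MSR_IA32_MCG_EXT_CTL */
#define MCG_STATUS_RIPV               (1ULL << 0)  /* MSR_IA32_MCG_STATUS */
#define MCG_STATUS_EIPV               (1ULL << 1)
#define MCG_STATUS_MCIP               (1ULL << 2)
#define MCG_STATUS_LMCE               (1ULL << 3)

/* Hypothetical helper: does the CPU advertise LMCE? */
static bool lmce_supported(uint64_t mcg_cap)
{
    return (mcg_cap & MCG_LMCE_P) != 0;
}

/* Hypothetical helper: LMCE is on only if both enable bits are set. */
static bool lmce_enabled(uint64_t feature_control, uint64_t mcg_ext_ctl)
{
    return (feature_control & MSR_IA32_FEATURE_CONTROL_LMCE) &&
           (mcg_ext_ctl & MCG_EXT_CTL_LMCE_EN);
}

/* Hypothetical helper: was this MCE delivered only to the current thread? */
static bool mce_is_local(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_LMCE) != 0;
}

/* Render the set MCG_STATUS flags as a space-separated string. */
static void format_mcg_status(uint64_t status, char *buf, size_t len)
{
    static const struct { uint64_t mask; const char *name; } flags[] = {
        { MCG_STATUS_RIPV, "RIPV" }, { MCG_STATUS_EIPV, "EIPV" },
        { MCG_STATUS_MCIP, "MCIP" }, { MCG_STATUS_LMCE, "LMCE" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        int n;

        if (!(status & flags[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                      /* output truncated, stop here */
        used += (size_t)n;
    }
}
```

With the values from the mcelog output below, lmce_supported(0x9000c02)
is true and format_mcg_status(0xd, ...) yields "RIPV MCIP LMCE".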

Patch Overview
In this patch series,
 * Xen enables LMCE by default if it's supported by host CPU unless Xen
   boot parameter "mce_fb=1" is present.
 * Xen handles LMCE only on the affected CPU and does not need all CPUs
   to enter MCE handler.
 * A new xl config "lmce=BOOLEAN" is added to control whether LMCE is
   supported for the HVM domain. It's disabled by default. If the host
   CPU does not support LMCE, this config will be ignored.
 * For HVM domain with LMCE support, if the vcpu affected by a host
   LMCE is known, Xen will inject a vLMCE to that vcpu. If the affected
   vcpu is unknown or LMCE support is disabled for a HVM domain, a MCE
   will be broadcast to all vcpus of that domain as before.  
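
The injection rule in the last item can be summarized by the sketch
below. It only restates the policy described above; the struct, the
function and the INJECT_BROADCAST sentinel are hypothetical names
chosen for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "inject to every vcpu of the domain". */
#define INJECT_BROADCAST (-1)

/* Hypothetical, minimal view of the per-domain state that matters here. */
struct hvm_domain_sketch {
    bool lmce_enabled;  /* "lmce=1" in the xl config and host supports LMCE */
    int  nr_vcpus;
};

/*
 * Decide where a virtual MCE goes.
 * affected_vcpu is the vcpu hit by the host LMCE, or a negative value
 * if it could not be determined.
 * Returns a vcpu id for a vLMCE, or INJECT_BROADCAST.
 */
static int pick_vmce_target(const struct hvm_domain_sketch *d,
                            bool host_mce_is_local, int affected_vcpu)
{
    assert(d != NULL && d->nr_vcpus > 0);

    if (!host_mce_is_local)         /* ordinary broadcast MCE on the host */
        return INJECT_BROADCAST;
    if (!d->lmce_enabled)           /* guest was not offered LMCE */
        return INJECT_BROADCAST;
    if (affected_vcpu < 0 || affected_vcpu >= d->nr_vcpus)
        return INJECT_BROADCAST;    /* affected vcpu unknown */
    return affected_vcpu;           /* deliver a vLMCE to this vcpu only */
}
```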

This patch series is organized as below:
 * Patches 1 - 8 clean up existing MCE code and make one improvement to
   debugging messages. No functional change is introduced.
 * Patches 9 - 11 fix two bugs in vMCE injection and MCE handling.
 * Patches 12 & 13 add host-side LMCE support, including detecting and
   enabling the LMCE feature and handling LMCE.
 * Patches 14 - 17 add guest-side LMCE support (HVM domains only so far),
   including emulating the LMCE feature and injecting LMCE to HVM domains.
 * Patches 18 & 19 add xen-mceinj support for injecting LMCE for testing
   purposes.

How to Test
0. This patch series can be tested either on an Intel CPU with LMCE
   support (Skylake-EX), or in a nested virtualization environment on
   KVM/QEMU (i.e. Xen as the L1 hypervisor).

   QEMU 2.7.0 and later, with KVM in Linux kernel 4.8 and later, can
   emulate LMCE and does not require the host hardware to support
   LMCE. You can start a nested virtualization environment with LMCE
   support using the following command:
        qemu-system-x86_64 -enable-kvm \
                           -smp 32 -cpu kvm64,lmce=on,+vmx \
                           -hda PATH-TO-DISK-IMG -m 2048
1. Build, install and boot Xen with this patch series. You can include
   "mce_verbosity=verbose" in Xen boot parameters to get more detailed
   debugging messages about MCE.

2. At boot time, if the Xen boot parameter 'mce_fb=1' is not
   present, Xen hypervisor should be able to detect and enable LMCE,
   and print the following message:

        (XEN) mce_intel.c:737: MCA Capability: BCAST 1 SER 1 CMCI 1 firstbank 0 extended MCE MSR LMCE 1

   If 'mce_fb=1' is specified, the last segment of the above message
   will be "LMCE 0", which indicates that Xen did not enable LMCE.

3. Start a HVM domain with the attached config file xl.cfg. In the
   config file,
    * "lmce = 1" enables LMCE for the domain.
    * "cpus = [ ... ]" helps in the following steps to figure out
      which CPU to inject to, though it may not be necessary.

   Run Linux kernel 4.2 or later (which has LMCE support) in the
   domain.

   Run the latest mcelog (https://www.mcelog.org/) in the domain as
   well to log MCEs injected in later steps. Depending on the guest
   Linux distro, the log can be in /var/log/mcelog, syslog or the
   systemd journal.

   Compile and run the attached claim_page.c in the domain. claim_page.c
   allocates a page of memory, prints its base (guest) physical address
   and enters an infinite loop. For example, it may print a message like

        Physical address of mmaped page = 0x36d4d000
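
   For readers without the attachment, the core of such a program can
   be sketched as below. This is not the attached claim_page.c, only
   a guess at its approach: it assumes a Linux guest, reads the page
   frame number from /proc/self/pagemap (which needs root on recent
   kernels), and all function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Layout of a /proc/<pid>/pagemap entry (Linux kernel documentation). */
#define PAGEMAP_PRESENT  (1ULL << 63)        /* page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/* Turn one pagemap entry into the physical base address of the page. */
static int pagemap_entry_to_phys(uint64_t entry, uint64_t page_size,
                                 uint64_t *phys)
{
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;

    /* The kernel reports PFN 0 to callers without CAP_SYS_ADMIN. */
    if (!(entry & PAGEMAP_PRESENT) || pfn == 0)
        return -1;
    *phys = pfn * page_size;
    return 0;
}

/* Look up the pagemap entry for a page-aligned virtual address. */
static int virt_to_phys(const void *vaddr, uint64_t *phys)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t entry;
    off_t offset = (off_t)((uintptr_t)vaddr / page_size * sizeof(entry));
    int fd = open("/proc/self/pagemap", O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pread(fd, &entry, sizeof(entry), offset);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;
    return pagemap_entry_to_phys(entry, page_size, phys);
}

/* Allocate one page, fault it in, and print its (guest) physical address. */
static void *claim_page(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    uint64_t phys;
    void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (page == MAP_FAILED)
        return NULL;
    memset(page, 0x5a, page_size);  /* touch the page so it gets a frame */
    mlock(page, page_size);         /* best effort: keep it resident */
    if (virt_to_phys(page, &phys) == 0)
        printf("Physical address of mmaped page = 0x%" PRIx64 "\n", phys);
    else
        fprintf(stderr, "cannot read the PFN (not running as root?)\n");
    return page;
}
```

   A main() would call claim_page() once and then spin in a busy loop
   such as "for (;;) ;", so that the vcpu running it shows up as
   running in the next step.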
4. Use "xl vcpu-list" to figure out the cpu number on which
   claim_page is running. For example, xl vcpu-list may output

        Name     ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
        lmce-l2   1     0    4   r--     546.5  4 / all
        lmce-l2   1     1    5   -b-       8.4  5 / all
        lmce-l2   1     2    6   -b-       6.4  6 / all 
        lmce-l2   1     3    7   -b-       6.4  7 / all

    As claim_page is the only workload that is actively running in
    the domain, CPU 4 (VCPU 0) is very likely the one it's running on.
    (You may even want to pin claim_page to a vcpu in guest Linux ... )

5. Use xen-mceinj to inject LMCE:
        ./xen-mceinj -c 4 -d 1 -p 0x36d4d000 -t 0 -l
                                                  ^^ inject LMCE

   If the injection succeeds, mcelog in the domain should generate the
   log like

        Hardware event. This is not a software error.
        MCE 0                                        
        CPU 0 BANK 1 TSC 2218fdf1380
        ^^^^^ vcpu0 receives MCE
        RIP !INEXACT! 10:ffffffff810591e7            
        MISC 86 ADDR 36d4d000
                     ^^^^^^^^ error address
        TIME 1487302866 Fri Feb 17 11:41:06 2017     
        MCG status:RIPV MCIP LMCE
                             ^^^^ LMCE is injected
        MCi status:                                  
        Uncorrected error                            
        Error enabled                                
        MCi_MISC register valid                      
        MCi_ADDR register valid                      
        MCA: Generic CACHE Level-2 Eviction Error    
        STATUS bd2000000000017a MCGSTATUS d          
        MCGCAP 9000c02 APICID 1 SOCKETID 0           
        CPUID Vendor Intel Family 6 Model 79  
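
   To cross-check the raw STATUS value in the log above against the
   decoded lines, the relevant MCi_STATUS fields can be extracted as
   in the sketch below. The bit positions and the cache hierarchy
   error code layout (000F 0001 RRRR TTLL) are from the Intel SDM; the
   macro, struct and function names are illustrative and not taken
   from Xen or mcelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS flag bits (Intel SDM Vol 3, Machine-Check Architecture). */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* error overflow */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* MCi_MISC register valid */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCi_ADDR register valid */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupted */

/* Fields of a cache hierarchy compound error code: 000F 0001 RRRR TTLL. */
struct cache_error_sketch {
    bool     is_cache_error;
    unsigned level;    /* LL:   0 = L0, 1 = L1, 2 = L2, 3 = generic */
    unsigned type;     /* TT:   0 = instruction, 1 = data, 2 = generic */
    unsigned request;  /* RRRR: e.g. 7 = eviction */
};

static struct cache_error_sketch decode_cache_error(uint64_t status)
{
    uint16_t code = (uint16_t)(status & 0xffff);  /* MCA error code */
    struct cache_error_sketch e = { false, 0, 0, 0 };

    /* Ignore the F bit (bit 12) and match the 0000 0001 prefix. */
    if ((code & 0xef00) != 0x0100)
        return e;
    e.is_cache_error = true;
    e.level   = code & 0x3;
    e.type    = (code >> 2) & 0x3;
    e.request = (code >> 4) & 0xf;
    return e;
}
```

   For STATUS bd2000000000017a, the VAL, UC, EN, MISCV and ADDRV bits
   are set, and the error code 0x017a decodes to level 2, generic
   type, eviction request, matching "Generic CACHE Level-2 Eviction
   Error".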

Haozhong Zhang (19):
  01/19 x86/mce: fix indentation style in xen-mca.h and mce.h
  02/19 x86/mce: remove declarations of non-existing functions in mce.h
  03/19 x86/mce: remove unnecessary braces around intel_get_extended_msrs()
  04/19 xen/mce: remove unused x86_mcinfo_add()
  05/19 x86/mce: merge loops to get Intel extended MC MSR
  06/19 x86/mce: merge intel_default_mce_dhandler/uhandler()
  07/19 x86/vmce: include domain/vcpu id in debug messages
  08/19 x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve()
  09/19 x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case
  10/19 x86/mce: always write 0 to MSR_IA32_MCG_STATUS on Intel CPU
  11/19 tools/xen-mceinj: fix the type of cpu number
  12/19 x86/mce: handle LMCE locally
  13/19 x86/mce_intel: detect and enable LMCE on Intel host
  14/19 x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL
  15/19 x86/vmce: emulate MSR_IA32_MCG_EXT_CTL
  16/19 x86/vmce: enable injecting LMCE to guest on Intel host
  17/19 x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
  18/19 xen/mce: add support of vLMCE injection to XEN_MC_inject_v2
  19/19 tools/xen-mceinj: support injecting LMCE

 docs/man/xl.cfg.pod.5.in                |  18 ++++
 tools/libxc/include/xenctrl.h           |   1 +
 tools/libxc/xc_misc.c                   |  25 ++++++
 tools/libxl/libxl_create.c              |   1 +
 tools/libxl/libxl_dom.c                 |   2 +
 tools/libxl/libxl_types.idl             |   1 +
 tools/libxl/xl_cmdimpl.c                |   3 +
 tools/tests/mce-test/tools/xen-mceinj.c |  70 +++++++++++++--
 xen/arch/x86/cpu/mcheck/barrier.c       |   4 +-
 xen/arch/x86/cpu/mcheck/mcaction.c      |  20 +++--
 xen/arch/x86/cpu/mcheck/mce.c           |  87 +++++++++++-------
 xen/arch/x86/cpu/mcheck/mce.h           |  51 ++++++-----
 xen/arch/x86/cpu/mcheck/mce_amd.c       |   4 +-
 xen/arch/x86/cpu/mcheck/mce_intel.c     |  86 ++++++++++--------
 xen/arch/x86/cpu/mcheck/vmce.c          | 153 ++++++++++++++++++++++++--------
 xen/arch/x86/cpu/mcheck/vmce.h          |   2 +-
 xen/arch/x86/cpu/mcheck/x86_mca.h       |   9 +-
 xen/arch/x86/hvm/hvm.c                  |   7 ++
 xen/arch/x86/hvm/vmx/vmx.c              |  10 +++
 xen/arch/x86/hvm/vmx/vvmx.c             |   4 -
 xen/include/asm-x86/mce.h               |   3 +
 xen/include/asm-x86/msr-index.h         |   2 +
 xen/include/public/arch-x86/hvm/save.h  |   2 +
 xen/include/public/arch-x86/xen-mca.h   |  25 +++---
 xen/include/public/hvm/params.h         |   5 +-
 25 files changed, 420 insertions(+), 175 deletions(-)


