[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-changelog] [xen staging-4.7] x86/spec-ctrl: Introduce options to control VERW flushing



commit b205b6969f86d1f25e152ed0880cd3ee189b9358
Author:     Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
AuthorDate: Wed Dec 12 19:22:15 2018 +0000
Commit:     Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CommitDate: Tue May 14 15:51:53 2019 +0100

    x86/spec-ctrl: Introduce options to control VERW flushing
    
    The Microarchitectural Data Sampling vulnerability is split into categories
    with subtly different properties:
    
     MLPDS - Microarchitectural Load Port Data Sampling
     MSBDS - Microarchitectural Store Buffer Data Sampling
     MFBDS - Microarchitectural Fill Buffer Data Sampling
     MDSUM - Microarchitectural Data Sampling Uncacheable Memory
    
    MDSUM is a special case of the other three, and isn't distinguished further.
    
    These issues pertain to three microarchitectural buffers.  The Load Ports, 
the
    Store Buffers and the Fill Buffers.  Each of these structures are flushed by
    the new enhanced VERW functionality, but the conditions under which flushing
    is necessary vary.
    
    For this concise overview of the issues and default logic, the abbreviations
    SP (Store Port), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading) 
are
    used for brevity:
    
     * Vulnerable hardware is divided into two categories - parts which suffer
       from SP only, and parts with any other combination of vulnerabilities.
    
     * SP only has an HT interaction when the thread goes idle, due to the 
static
       partitioning of resources.  LP and FB have HT interactions at all points,
       due to the competitive sharing of resources.  All issues potentially leak
       data across the return-to-guest transition.
    
     * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, 
so
       we don't need to do both on the HVM return-to-guest path.  However, the
       Knights range of processors are immune to L1TF (therefore have no
       MSR_FLUSH_CMD), but are vulnerable to MDS, so do require VERW on the HVM
       path.
    
    Note that we deliberately support mds=1 even without MD_CLEAR in case the
    microcode has been updated but the feature bit not exposed.
    
    This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, 
CVE-2019-11091.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
    Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
    (cherry picked from commit 3c04c258ab40405a74e194d9889a4cbc7abe94b4)
---
 docs/misc/xen-command-line.markdown |  14 ++-
 xen/arch/x86/cpu/common.c           |   9 ++
 xen/arch/x86/spec_ctrl.c            | 165 +++++++++++++++++++++++++++++++++++-
 3 files changed, 182 insertions(+), 6 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 20863b2112..9c6cabcaf1 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1514,7 +1514,7 @@ is being interpreted as a custom timeout in milliseconds. 
Zero or boolean
 false disable the quirk workaround, which is also the default.
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
 >              l1d-flush}=<bool> ]`
 
@@ -1538,9 +1538,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=` and `rsb=` offer fine grained control
-over the alternative blocks used by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+grained control over the alternative blocks used by Xen.  These impact Xen's
+ability to protect itself, and Xen's ability to virtualise support for guests
+to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -1549,6 +1550,11 @@ protect itself, and Xen's ability to virtualise support 
for guests to use.
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
 * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
   Return Address Stack on entry to Xen.
+* `md-clear=` offers control over whether to use VERW to flush
+  microarchitectural buffers on idle and exit from Xen.  *Note: For
+  compatibility with development versions of this fix, `mds=` is also accepted
+  on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
+  preference to here.*
 
 If Xen was compiled with INDIRECT\_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index fc9e379096..8a3e9f1f0d 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -439,6 +439,15 @@ void identify_cpu(struct cpuinfo_x86 *c)
                if (test_bit(X86_FEATURE_SC_MSR_IDLE,
                             boot_cpu_data.x86_capability))
                        __set_bit(X86_FEATURE_SC_MSR_IDLE, c->x86_capability);
+               if (test_bit(X86_FEATURE_SC_VERW_PV,
+                            boot_cpu_data.x86_capability))
+                       __set_bit(X86_FEATURE_SC_VERW_PV, c->x86_capability);
+               if (test_bit(X86_FEATURE_SC_VERW_HVM,
+                            boot_cpu_data.x86_capability))
+                       __set_bit(X86_FEATURE_SC_VERW_HVM, c->x86_capability);
+               if (test_bit(X86_FEATURE_SC_VERW_IDLE,
+                            boot_cpu_data.x86_capability))
+                       __set_bit(X86_FEATURE_SC_VERW_IDLE, c->x86_capability);
 
                /* AND the already accumulated flags with these */
                for ( i = 0 ; i < NCAPINTS ; i++ )
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index b2d1757e8c..7d37382ab3 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -32,6 +32,8 @@ static bool_t __initdata opt_msr_sc_pv = 1;
 static bool_t __initdata opt_msr_sc_hvm = 1;
 static bool_t __initdata opt_rsb_pv = 1;
 static bool_t __initdata opt_rsb_hvm = 1;
+static int8_t __initdata opt_md_clear_pv = -1;
+static int8_t __initdata opt_md_clear_hvm = -1;
 
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
@@ -56,6 +58,9 @@ paddr_t __read_mostly l1tf_addr_mask, __read_mostly 
l1tf_safe_maddr;
 static bool_t __initdata cpu_has_bug_l1tf;
 static unsigned int __initdata l1d_maxphysaddr;
 
+static bool_t __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
+static bool_t __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS 
combination. */
+
 static int __init parse_bti(const char *s)
 {
     const char *ss;
@@ -128,6 +133,8 @@ static int __init parse_spec_ctrl(char *s)
         disable_common:
             opt_rsb_pv = 0;
             opt_rsb_hvm = 0;
+            opt_md_clear_pv = 0;
+            opt_md_clear_hvm = 0;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -150,11 +157,13 @@ static int __init parse_spec_ctrl(char *s)
         {
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
+            opt_md_clear_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
+            opt_md_clear_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
         {
@@ -166,6 +175,12 @@ static int __init parse_spec_ctrl(char *s)
             opt_rsb_pv = val;
             opt_rsb_hvm = val;
         }
+        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 ||
+                  (val = parse_boolean("mds", s, ss)) >= 0 )
+        {
+            opt_md_clear_pv = val;
+            opt_md_clear_hvm = val;
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -351,7 +366,7 @@ static void __init print_details(enum ind_thunk thunk, 
uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -361,7 +376,8 @@ static void __init print_details(enum ind_thunk thunk, 
uint64_t caps)
            !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            opt_ibpb                                  ? " IBPB"  : "",
-           opt_l1d_flush                             ? " L1D_FLUSH" : "");
+           opt_l1d_flush                             ? " L1D_FLUSH" : "",
+           opt_md_clear_pv || opt_md_clear_hvm       ? " VERW"  : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf )
@@ -744,6 +760,107 @@ static __init void l1tf_calculations(uint64_t caps)
                                             : (3ul << (paddr_bits - 2))));
 }
 
+/* Calculate whether this CPU is vulnerable to MDS. */
+static __init void mds_calculations(uint64_t caps)
+{
+    /* MDS is only known to affect Intel Family 6 processors at this time. */
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    /* Any processor advertising MDS_NO should be not vulnerable to MDS. */
+    if ( caps & ARCH_CAPS_MDS_NO )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+        /*
+         * Core processors since at least Nehalem are vulnerable.
+         */
+    case 0x1f: /* Auburndale / Havendale */
+    case 0x1e: /* Nehalem */
+    case 0x1a: /* Nehalem EP */
+    case 0x2e: /* Nehalem EX */
+    case 0x25: /* Westmere */
+    case 0x2c: /* Westmere EP */
+    case 0x2f: /* Westmere EX */
+    case 0x2a: /* SandyBridge */
+    case 0x2d: /* SandyBridge EP/EX */
+    case 0x3a: /* IvyBridge */
+    case 0x3e: /* IvyBridge EP/EX */
+    case 0x3c: /* Haswell */
+    case 0x3f: /* Haswell EX/EP */
+    case 0x45: /* Haswell D */
+    case 0x46: /* Haswell H */
+    case 0x3d: /* Broadwell */
+    case 0x47: /* Broadwell H */
+    case 0x4f: /* Broadwell EP/EX */
+    case 0x56: /* Broadwell D */
+    case 0x4e: /* Skylake M */
+    case 0x5e: /* Skylake D */
+        cpu_has_bug_mds = 1;
+        break;
+
+        /*
+         * Some Core processors have per-stepping vulnerability.
+         */
+    case 0x55: /* Skylake-X / Cascade Lake */
+        if ( boot_cpu_data.x86_mask <= 5 )
+            cpu_has_bug_mds = 1;
+        break;
+
+    case 0x8e: /* Kaby / Coffee / Whiskey Lake M */
+        if ( boot_cpu_data.x86_mask <= 0xb )
+            cpu_has_bug_mds = 1;
+        break;
+
+    case 0x9e: /* Kaby / Coffee / Whiskey Lake D */
+        if ( boot_cpu_data.x86_mask <= 0xc )
+            cpu_has_bug_mds = 1;
+        break;
+
+        /*
+         * Very old and very new Atom processors are not vulnerable.
+         */
+    case 0x1c: /* Pineview */
+    case 0x26: /* Lincroft */
+    case 0x27: /* Penwell */
+    case 0x35: /* Cloverview */
+    case 0x36: /* Cedarview */
+    case 0x7a: /* Goldmont */
+        break;
+
+        /*
+         * Middling Atom processors are vulnerable to just the Store Buffer
+         * aspect.
+         */
+    case 0x37: /* Baytrail / Valleyview (Silvermont) */
+    case 0x4a: /* Merrifield */
+    case 0x4c: /* Cherrytrail / Brasswell */
+    case 0x4d: /* Avaton / Rangely (Silvermont) */
+    case 0x5a: /* Moorefield */
+    case 0x5d:
+    case 0x65:
+    case 0x6e:
+    case 0x75:
+        /*
+         * Knights processors (which are based on the Silvermont/Airmont
+         * microarchitecture) are similarly only affected by the Store Buffer
+         * aspect.
+         */
+    case 0x57: /* Knights Landing */
+    case 0x85: /* Knights Mill */
+        cpu_has_bug_msbds_only = 1;
+        break;
+
+    default:
+        printk("Unrecognised CPU model %#x - assuming vulnerable to MDS\n",
+               boot_cpu_data.x86_model);
+        cpu_has_bug_mds = 1;
+        break;
+    }
+}
+
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
@@ -940,6 +1057,50 @@ void __init init_speculation_mitigations(void)
         printk("******************************************************\n");
     }
 
+    mds_calculations(caps);
+
+    /*
+     * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+     * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+     * but it is somewhat better than nothing.
+     */
+    if ( opt_md_clear_pv == -1 )
+        opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                           boot_cpu_has(X86_FEATURE_MD_CLEAR));
+    if ( opt_md_clear_hvm == -1 )
+        opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+                            boot_cpu_has(X86_FEATURE_MD_CLEAR));
+
+    /*
+     * Enable MDS defences as applicable.  The PV blocks need using all the
+     * time, and the Idle blocks need using if either PV or HVM defences are
+     * used.
+     *
+     * HVM is more complicated.  The MD_CLEAR microcode extends L1D_FLUSH with
+     * equivelent semantics to avoid needing to perform both flushes on the
+     * HVM path.  The HVM blocks don't need activating if our hypervisor told
+     * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
+     */
+    if ( opt_md_clear_pv )
+        __set_bit(X86_FEATURE_SC_VERW_PV, boot_cpu_data.x86_capability);
+    if ( opt_md_clear_pv || opt_md_clear_hvm )
+        __set_bit(X86_FEATURE_SC_VERW_IDLE, boot_cpu_data.x86_capability);
+    if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
+        __set_bit(X86_FEATURE_SC_VERW_HVM, boot_cpu_data.x86_capability);
+
+    /*
+     * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+     * active and no explicit SMT choice.
+     */
+    if ( opt_smt == -1 && cpu_has_bug_mds && hw_smt_enabled )
+    {
+        printk("******************************************************\n");
+        printk("Booted on MLPDS/MFBDS-vulnerable hardware with 
SMT/Hyperthreading\n");
+        printk("enabled.  Mitigations will not be fully effective.  Please\n");
+        printk("choose an explicit smt=<bool> setting.  See XSA-297.\n");
+        printk("******************************************************\n");
+    }
+
     print_details(thunk, caps);
 
     /*
--
generated by git-patchbot for /home/xen/git/xen.git#staging-4.7

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/xen-changelog

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.