[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH 5/5] x86: Rework MSR_TSC_AUX handling from scratch.



There are many problems with MSR_TSC_AUX handling.

To being with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
depends on RDTSCP.

For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
PV guests, which in turn allows RDPID to work.

To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
into the generic MSR policy infrastructure, and becomes common.  One
improvement is that we will now reject invalid values, rather than silently
truncating an accepting them.  This also causes the emulator to reject RDTSCP
for guests without the features.

One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
previous behaviour of this mode was to silently drop writes, but as it is a
break in the x86 ABI to start with, switch the semantics to be more sane, and
have TSC_MODE_PVRDTSCP make the MSR properly read-only.

With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
migrating.  Cope with it the common migration logic.  Care must be taken
however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as the
receiving side will reject the guest_wrmsr().

What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
emulation code functions correctly.

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Jan Beulich <JBeulich@xxxxxxxx>
CC: Jun Nakajima <jun.nakajima@xxxxxxxxx>
CC: Kevin Tian <kevin.tian@xxxxxxxxx>
CC: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 xen/arch/x86/domain.c                       |  3 +-
 xen/arch/x86/domctl.c                       | 15 ++++++++-
 xen/arch/x86/hvm/hvm.c                      | 47 ++++++++++++++++++++---------
 xen/arch/x86/hvm/svm/svm.c                  |  4 +--
 xen/arch/x86/hvm/vmx/vmx.c                  |  4 +--
 xen/arch/x86/msr.c                          | 16 ++++++++++
 xen/arch/x86/pv/emul-inv-op.c               |  4 +--
 xen/arch/x86/pv/emul-priv-op.c              |  5 ---
 xen/arch/x86/time.c                         | 11 +++++++
 xen/arch/x86/x86_emulate/x86_emulate.c      |  1 +
 xen/include/asm-x86/hvm/hvm.h               |  6 ----
 xen/include/asm-x86/hvm/vcpu.h              |  1 -
 xen/include/asm-x86/msr.h                   |  9 ++++++
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 xen/include/public/arch-x86/hvm/save.h      |  2 ++
 xen/tools/gen-cpuid.py                      |  3 ++
 16 files changed, 96 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9c3527f..144d6f0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1534,8 +1534,7 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
         activate_debugregs(v);
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
-                      ? v->domain->arch.incarnation : 0);
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3a..979afdf 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1249,6 +1249,7 @@ long arch_do_domctl(
         static const uint32_t msrs_to_send[] = {
             MSR_SPEC_CTRL,
             MSR_INTEL_MISC_FEATURES_ENABLES,
+            MSR_TSC_AUX,
         };
         uint32_t nr_msrs = ARRAY_SIZE(msrs_to_send);
 
@@ -1284,7 +1285,18 @@ long arch_do_domctl(
                 for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
                 {
                     uint64_t val;
-                    int rc = guest_rdmsr(v, msrs_to_send[j], &val);
+                    int rc;
+
+                    /*
+                     * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this
+                     * case, the MSR is read-only, and should be rejected if
+                     * seen on the restore side.
+                     */
+                    if ( msrs_to_send[j] == MSR_TSC_AUX &&
+                         d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+                        continue;
+
+                    rc = guest_rdmsr(v, msrs_to_send[j], &val);
 
                     /*
                      * It is the programmers responsibility to ensure that
@@ -1373,6 +1385,7 @@ long arch_do_domctl(
                 {
                 case MSR_SPEC_CTRL:
                 case MSR_INTEL_MISC_FEATURES_ENABLES:
+                case MSR_TSC_AUX:
                     if ( guest_wrmsr(v, msr.index, msr.value) != X86EMUL_OKAY )
                         break;
                     continue;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0539551..ab24f87 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -792,7 +792,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
 
         ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
 
-        ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
+        /* ctxt.msr_tsc_aux omitted - now sent via generic MSR record. */
 
         hvm_get_segment_register(v, x86_seg_idtr, &seg);
         ctxt.idtr_limit = seg.limit;
@@ -1046,7 +1046,24 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
     if ( hvm_funcs.tsc_scaling.setup )
         hvm_funcs.tsc_scaling.setup(v);
 
-    v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
+    /*
+     * Backwards compatibility.  MSR_TSC_AUX contains different information
+     * depending on whether TSC_MODE_PVRDTSCP is enabled.
+     *
+     * Before Xen 4.11, ctxt.msr_tsc_aux was sent unconditionally and restored
+     * here, but the value was otherwise ignored in TSC_MODE_PVRDTSCP.
+     *
+     * In 4.11, the logic was changed to send MSR_TSC_AUX via the generic MSR
+     * mechanism if tsc_mode != TSC_MODE_PVRDTSCP.  If tsc_mode ==
+     * TSC_MODE_PVRDTSCP, the tsc logic is responsibile for setting the
+     * correct value.
+     *
+     * For compatibility with migration streams from before 4.11, we restore
+     * from ctxt.msr_tsc_aux if the TSC code hasn't/isn't in charge, and we've
+     * not seen a value arrive in the generic MSR record.
+     */
+    if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP && !v->arch.msr->tsc_aux )
+        v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
 
     hvm_set_guest_tsc_fixed(v, ctxt.tsc, d->arch.hvm_domain.sync_tsc);
 
@@ -1329,6 +1346,7 @@ static int hvm_load_cpu_xsave_states(struct domain *d, 
hvm_domain_context_t *h)
 static const uint32_t msrs_to_send[] = {
     MSR_SPEC_CTRL,
     MSR_INTEL_MISC_FEATURES_ENABLES,
+    MSR_TSC_AUX,
 };
 static unsigned int __read_mostly msr_count_max = ARRAY_SIZE(msrs_to_send);
 
@@ -1351,7 +1369,18 @@ static int hvm_save_cpu_msrs(struct domain *d, 
hvm_domain_context_t *h)
         for ( i = 0; i < ARRAY_SIZE(msrs_to_send); ++i )
         {
             uint64_t val;
-            int rc = guest_rdmsr(v, msrs_to_send[i], &val);
+            int rc;
+
+            /*
+             * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this case, the
+             * MSR is read-only, and should be rejected if seen on the restore
+             * side.
+             */
+            if ( msrs_to_send[i] == MSR_TSC_AUX &&
+                 d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+                continue;
+
+            rc = guest_rdmsr(v, msrs_to_send[i], &val);
 
             /*
              * It is the programmers responsibility to ensure that
@@ -1465,6 +1494,7 @@ static int hvm_load_cpu_msrs(struct domain *d, 
hvm_domain_context_t *h)
 
         case MSR_SPEC_CTRL:
         case MSR_INTEL_MISC_FEATURES_ENABLES:
+        case MSR_TSC_AUX:
             rc = guest_wrmsr(v, ctxt->msr[i].index, ctxt->msr[i].val);
 
             if ( rc != X86EMUL_OKAY )
@@ -3424,10 +3454,6 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t 
*msr_content)
         *msr_content = v->arch.hvm_vcpu.msr_tsc_adjust;
         break;
 
-    case MSR_TSC_AUX:
-        *msr_content = hvm_msr_tsc_aux(v);
-        break;
-
     case MSR_IA32_APICBASE:
         *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
         break;
@@ -3575,13 +3601,6 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t 
msr_content,
         hvm_set_guest_tsc_adjust(v, msr_content);
         break;
 
-    case MSR_TSC_AUX:
-        v->arch.hvm_vcpu.msr_tsc_aux = (uint32_t)msr_content;
-        if ( cpu_has_rdtscp
-             && (v->domain->arch.tsc_mode != TSC_MODE_PVRDTSCP) )
-            wrmsr_tsc_aux(msr_content);
-        break;
-
     case MSR_IA32_APICBASE:
         if ( !vlapic_msr_set(vcpu_vlapic(v), msr_content) )
             goto gp_fault;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index f53f430..9406624 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1093,7 +1093,7 @@ static void svm_ctxt_switch_to(struct vcpu *v)
     svm_tsc_ratio_load(v);
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 static void noreturn svm_do_resume(struct vcpu *v)
@@ -2842,7 +2842,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         break;
 
     case VMEXIT_RDTSCP:
-        regs->rcx = hvm_msr_tsc_aux(v);
+        regs->rcx = v->arch.msr->tsc_aux;
         /* fall through */
     case VMEXIT_RDTSC:
         svm_vmexit_do_rdtsc(regs);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 31acb0e..45fd9c5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -618,7 +618,7 @@ static void vmx_restore_guest_msrs(struct vcpu *v)
     }
 
     if ( cpu_has_rdtscp )
-        wrmsr_tsc_aux(hvm_msr_tsc_aux(v));
+        wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
@@ -3858,7 +3858,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_invlpg_intercept(exit_qualification);
         break;
     case EXIT_REASON_RDTSCP:
-        regs->rcx = hvm_msr_tsc_aux(v);
+        regs->rcx = v->arch.msr->tsc_aux;
         /* fall through */
     case EXIT_REASON_RDTSC:
         update_guest_eip(); /* Safe: RDTSC, RDTSCP */
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 7ba9a10..aab0722 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -156,6 +156,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
uint64_t *val)
                _MSR_MISC_FEATURES_CPUID_FAULTING;
         break;
 
+    case MSR_TSC_AUX:
+        if ( !cp->extd.rdtscp )
+            goto gp_fault;
+
+        *val = vp->tsc_aux;
+        break;
+
     default:
         return X86EMUL_UNHANDLEABLE;
     }
@@ -231,6 +238,15 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
         break;
     }
 
+    case MSR_TSC_AUX:
+        if ( !cp->extd.rdtscp ||                      /* MSR available? */
+             d->arch.tsc_mode == TSC_MODE_PVRDTSCP || /* MSR read-only? */
+             val != (uint32_t)val )                   /* Rsvd bits set? */
+            goto gp_fault;
+
+        wrmsr_tsc_aux(vp->tsc_aux = val);
+        break;
+
     default:
         return X86EMUL_UNHANDLEABLE;
     }
diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
index b1916b4..c071372 100644
--- a/xen/arch/x86/pv/emul-inv-op.c
+++ b/xen/arch/x86/pv/emul-inv-op.c
@@ -46,7 +46,6 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     char opcode[3];
     unsigned long eip, rc;
     struct vcpu *v = current;
-    struct domain *currd = v->domain;
 
     eip = regs->rip;
     if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
@@ -59,8 +58,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     eip += sizeof(opcode);
 
     msr_split(regs, pv_soft_rdtsc(v, regs));
-    regs->rcx = ((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
-                 ? currd->arch.incarnation : 0);
+    regs->rcx = v->arch.msr->tsc_aux;
 
     pv_emul_instruction_done(regs, eip);
     return EXCRET_fault_fixed;
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 4e3641d..e9e2313 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -880,11 +880,6 @@ static int read_msr(unsigned int reg, uint64_t *val,
         *val = currd->arch.vtsc ? pv_soft_rdtsc(curr, ctxt->regs) : rdtsc();
         return X86EMUL_OKAY;
 
-    case MSR_TSC_AUX:
-        *val = (uint32_t)((currd->arch.tsc_mode == TSC_MODE_PVRDTSCP)
-                          ? currd->arch.incarnation : 0);
-        return X86EMUL_OKAY;
-
     case MSR_EFER:
         *val = read_efer();
         if ( is_pv_32bit_domain(currd) )
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index c4ca515..7119d65 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
         }
         break;
     }
+
     d->arch.incarnation = incarnation + 1;
+
+    if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+    {
+        struct vcpu *v;
+
+        /* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
+        for_each_vcpu ( d, v )
+            v->arch.msr->tsc_aux = d->arch.incarnation;
+    }
+
     if ( is_hvm_domain(d) )
     {
         if ( hvm_tsc_scaling_supported && !d->arch.vtsc )
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c 
b/xen/arch/x86/x86_emulate/x86_emulate.c
index 85383ea..efb6003 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5141,6 +5141,7 @@ x86_emulate(
 
         case 0xf9: /* rdtscp */
             fail_if(ops->read_msr == NULL);
+            /* Getting a result implies vcpu_has_rdtscp() */
             if ( (rc = ops->read_msr(MSR_TSC_AUX,
                                      &msr_val, ctxt)) != X86EMUL_OKAY )
                 goto done;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dd3dd5f..7708fdf 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -511,12 +511,6 @@ static inline void hvm_invalidate_regs_fields(struct 
cpu_user_regs *regs)
 int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
                               struct npfec npfec);
 
-#define hvm_msr_tsc_aux(v) ({                                               \
-    struct domain *__d = (v)->domain;                                       \
-    (__d->arch.tsc_mode == TSC_MODE_PVRDTSCP)                               \
-        ? (u32)__d->arch.incarnation : (u32)(v)->arch.hvm_vcpu.msr_tsc_aux; \
-})
-
 int hvm_x2apic_msr_read(struct vcpu *v, unsigned int msr, uint64_t 
*msr_content);
 int hvm_x2apic_msr_write(struct vcpu *v, unsigned int msr, uint64_t 
msr_content);
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index d93166f..60425f3 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -170,7 +170,6 @@ struct hvm_vcpu {
 
     struct hvm_vcpu_asid n1asid;
 
-    u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
     u64                 msr_xss;
 
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 9475a71..485113f 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -250,6 +250,15 @@ struct msr_vcpu_policy
         bool available; /* This MSR is non-architectural */
         bool cpuid_faulting;
     } misc_features_enables;
+
+    /*
+     * 0xc0000103 - MSR_TSC_AUX
+     *
+     * Usually guest chosen.  However, when using TSC_MODE_PVRDTSCP the value
+     * is Xen-controlled and is the migration incarnation, at which point the
+     * MSR becomes read-only to the guest.
+     */
+    uint32_t tsc_aux;
 };
 
 void init_guest_msr_policy(void);
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index fa81af1..78fa466 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -156,7 +156,7 @@ XEN_CPUFEATURE(NX,            2*32+20) /*A  Execute Disable 
*/
 XEN_CPUFEATURE(MMXEXT,        2*32+22) /*A  AMD MMX extensions */
 XEN_CPUFEATURE(FFXSR,         2*32+25) /*A  FFXSR instruction optimizations */
 XEN_CPUFEATURE(PAGE1GB,       2*32+26) /*H  1Gb large page support */
-XEN_CPUFEATURE(RDTSCP,        2*32+27) /*S  RDTSCP */
+XEN_CPUFEATURE(RDTSCP,        2*32+27) /*A  RDTSCP */
 XEN_CPUFEATURE(LM,            2*32+29) /*A  Long Mode (x86-64) */
 XEN_CPUFEATURE(3DNOWEXT,      2*32+30) /*A  AMD 3DNow! extensions */
 XEN_CPUFEATURE(3DNOW,         2*32+31) /*A  3DNow! */
diff --git a/xen/include/public/arch-x86/hvm/save.h 
b/xen/include/public/arch-x86/hvm/save.h
index 4691d4d..1c1ba4e 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -141,6 +141,8 @@ struct hvm_hw_cpu {
     uint64_t msr_cstar;
     uint64_t msr_syscall_mask;
     uint64_t msr_efer;
+
+    /* Superceeded by general MSR policy in Xen 4.11.  Written as 0. */
     uint64_t msr_tsc_aux;
 
     /* guest's idea of what rdtsc() would return */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 613b909..601a7a0 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -235,6 +235,9 @@ def crunch_numbers(state):
         # absence of any enabled xstate.
         AVX: [FMA, FMA4, F16C, AVX2, XOP],
 
+        # MSR_TSC_AUX is enumerated by RDTSCP, but RDPID also reads TSC_AUX.
+        RDTSCP: [RDPID],
+
         # CX16 is only encodable in Long Mode.  LAHF_LM indicates that the
         # SAHF/LAHF instructions are reintroduced in Long Mode.  1GB
         # superpages, PCID and PKU are only available in 4 level paging.
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.