[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2] x86/viridian: Add Partition Reference Time enlightenment
> -----Original Message----- > From: Egger, Christoph [mailto:chegger@xxxxxxxxx] > Sent: 13 October 2014 09:54 > To: Paul Durrant; xen-devel@xxxxxxxxxxxxx > Cc: Ian Jackson; Keir (Xen.org); Ian Campbell; Jan Beulich; Stefano Stabellini > Subject: Re: [Xen-devel] [PATCH v2] x86/viridian: Add Partition Reference > Time enlightenment > > On 10.10.14 13:55, Egger, Christoph wrote: > > On 29.09.14 12:28, Paul Durrant wrote: > >> The presence of the partition reference time enlightenment persuades > newer > >> versions of Windows to prefer the TSC as their primary time source. > Hence, > >> if rdtsc is not being emulated and is invariant then many vmexits (for > >> alternative time sources such as the HPET or reference counter MSR) can > >> be avoided. > >> > >> The implementation is not yet complete as no attempt is made to prevent > >> emulation of rdtsc if the enlightenment is active and guest and host > >> TSC frequencies differ. To do that requires invasive changes in the core > >> x86 time code and hence a lot more testing. > >> > >> This patch avoids the issue by disabling the enlightenment if rdtsc is > >> being emulated, causing Windows to choose another time source. This is > >> safe, but may cause a big variation in performance of guests migrated > >> between hosts of differing TSC frequency. Thus the enlightenment is not > >> enabled in the default set, but may be enabled to improve guest > performance > >> where such migrations are not a concern. > >> > >> See section 15.4 of the Microsoft Hypervisor Top Level Functional > >> Specification v4.0a for details. > >> > >> Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx> > >> Cc: Keir Fraser <keir@xxxxxxx> > >> Cc: Jan Beulich <jbeulich@xxxxxxxx> > >> Cc: Ian Campbell <ian.campbell@xxxxxxxxxx> > >> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> > >> Cc: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> > >> Cc: Christoph Egger <chegger@xxxxxxxxx> > > > > Reviewed-by: Christoph Egger <chegger@xxxxxxxxx> > > I found one issue in update_reference_tsc(). See below. > > Christoph > > > > > Christoph > > > >> --- > >> v2: - Addressed comments from Jan: > >> - Leave enlightenment out of default set and add a comment > explaining > >> why. > >> - De-dup code surrounding adjustment of sequence number on > resume. > >> - Remove accidental code deletion. > >> > >> docs/man/xl.cfg.pod.5 | 6 ++ > >> tools/libxl/libxl_dom.c | 3 + > >> tools/libxl/libxl_types.idl | 1 + > >> xen/arch/x86/hvm/viridian.c | 113 > +++++++++++++++++++++++++++++++- > >> xen/include/asm-x86/hvm/viridian.h | 35 ++++++++++ > >> xen/include/public/arch-x86/hvm/save.h | 11 ++++ > >> xen/include/public/hvm/params.h | 7 +- > >> 7 files changed, 174 insertions(+), 2 deletions(-) > >> > >> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 > >> index 8bba21c..925adbb 100644 > >> --- a/docs/man/xl.cfg.pod.5 > >> +++ b/docs/man/xl.cfg.pod.5 > >> @@ -1163,6 +1163,12 @@ This group incorporates Partition Time > Reference Counter MSR. This > >> enlightenment can improve performance of Windows 8 and Windows > >> Server 2012 onwards. > >> > >> +=item B<reference_tsc> > >> + > >> +This set incorporates the Partition Reference TSC MSR. This > >> +enlightenment can improve performance of Windows 7 and Windows > >> +Server 2008 R2 onwards. > >> + > >> =item B<defaults> > >> > >> This is a special value that enables the default set of groups, which > >> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > >> index bd21841..d04fc0d 100644 > >> --- a/tools/libxl/libxl_dom.c > >> +++ b/tools/libxl/libxl_dom.c > >> @@ -262,6 +262,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, > uint32_t domid, > >> if (libxl_bitmap_test(&enlightenments, > LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT)) > >> mask |= HVMPV_time_ref_count; > >> > >> + if (libxl_bitmap_test(&enlightenments, > LIBXL_VIRIDIAN_ENLIGHTENMENT_REFERENCE_TSC)) > >> + mask |= HVMPV_reference_tsc; > >> + > >> if (mask != 0 && > >> xc_hvm_param_set(CTX->xch, > >> domid, > >> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > >> index 90d152f..da75a40 100644 > >> --- a/tools/libxl/libxl_types.idl > >> +++ b/tools/libxl/libxl_types.idl > >> @@ -183,6 +183,7 @@ libxl_viridian_enlightenment = > Enumeration("viridian_enlightenment", [ > >> (0, "base"), > >> (1, "freq"), > >> (2, "time_ref_count"), > >> + (3, "reference_tsc"), > >> ]) > >> > >> # > >> diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c > >> index 6726168..a7d3283 100644 > >> --- a/xen/arch/x86/hvm/viridian.c > >> +++ b/xen/arch/x86/hvm/viridian.c > >> @@ -21,6 +21,7 @@ > >> #define VIRIDIAN_MSR_HYPERCALL 0x40000001 > >> #define VIRIDIAN_MSR_VP_INDEX 0x40000002 > >> #define VIRIDIAN_MSR_TIME_REF_COUNT 0x40000020 > >> +#define VIRIDIAN_MSR_REFERENCE_TSC 0x40000021 > >> #define VIRIDIAN_MSR_TSC_FREQUENCY 0x40000022 > >> #define VIRIDIAN_MSR_APIC_FREQUENCY 0x40000023 > >> #define VIRIDIAN_MSR_EOI 0x40000070 > >> @@ -40,6 +41,7 @@ > >> #define CPUID3A_MSR_APIC_ACCESS (1 << 4) > >> #define CPUID3A_MSR_HYPERCALL (1 << 5) > >> #define CPUID3A_MSR_VP_INDEX (1 << 6) > >> +#define CPUID3A_MSR_REFERENCE_TSC (1 << 9) > >> #define CPUID3A_MSR_FREQ (1 << 11) > >> > >> /* Viridian CPUID 4000004, Implementation Recommendations. */ > >> @@ -95,6 +97,8 @@ int cpuid_viridian_leaves(unsigned int leaf, unsigned > int *eax, > >> *eax |= CPUID3A_MSR_FREQ; > >> if ( viridian_feature_mask(d) & HVMPV_time_ref_count ) > >> *eax |= CPUID3A_MSR_TIME_REF_COUNT; > >> + if ( viridian_feature_mask(d) & HVMPV_reference_tsc ) > >> + *eax |= CPUID3A_MSR_REFERENCE_TSC; > >> break; > >> case 4: > >> /* Recommended hypercall usage. */ > >> @@ -155,6 +159,17 @@ static void dump_apic_assist(const struct vcpu > *v) > >> v, aa->fields.enabled, (unsigned long)aa->fields.pfn); > >> } > >> > >> +static void dump_reference_tsc(const struct domain *d) > >> +{ > >> + const union viridian_reference_tsc *rt; > >> + > >> + rt = &d->arch.hvm_domain.viridian.reference_tsc; > >> + > >> + printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: enabled: > %x pfn: %lx\n", > >> + d->domain_id, > >> + rt->fields.enabled, (unsigned long)rt->fields.pfn); > >> +} > >> + > >> static void enable_hypercall_page(struct domain *d) > >> { > >> unsigned long gmfn = d- > >arch.hvm_domain.viridian.hypercall_gpa.fields.pfn; > >> @@ -224,6 +239,78 @@ static void initialize_apic_assist(struct vcpu *v) > >> put_page_and_type(page); > >> } > >> > >> +static void update_reference_tsc(struct domain *d, bool_t initialize) > >> +{ > >> + unsigned long gmfn = d- > >arch.hvm_domain.viridian.reference_tsc.fields.pfn; > >> + struct page_info *page = get_page_from_gfn(d, gmfn, NULL, > P2M_ALLOC); > >> + HV_REFERENCE_TSC_PAGE *p; > >> + > >> + if ( !page || !get_page_type(page, PGT_writable_page) ) > >> + { > >> + if ( page ) > >> + put_page(page); > >> + gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn, > >> + page ? page_to_mfn(page) : INVALID_MFN); > >> + return; > >> + } > >> + > >> + p = __map_domain_page(page); > >> + > >> + if ( initialize ) > >> + clear_page(p); > >> + > >> + /* > >> + * This enlightenment must be disabled is the host TSC is not > >> invariant. > >> + * However it is also disabled if vtsc is true (which means rdtsc is > >> being > >> + * emulated). This generally happens when guest TSC freq and host > TSC freq > >> + * don't match. The TscScale value could be adjusted to cope with > >> this, > >> + * allowing vtsc to be turned off, but support for this is not yet > >> present > >> + * in the hypervisor. Thus is it is possible that migrating a Windows > >> VM > >> + * between hosts of differing TSC frequencies may result in large > >> + * differences in guest performance. > >> + */ > >> + if ( !host_tsc_is_safe() || d->arch.vtsc ) > >> + { > >> + /* > >> + * The specification states that valid values of TscSequence range > >> + * from 0 to 0xFFFFFFFE. The value 0xFFFFFFFF is used to indicate > >> + * this mechanism is no longer a reliable source of time and that > >> + * the VM should fall back to a different source. > >> + * > >> + * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually > >> violate > >> + * the spec. and rely on a value of 0 to indicate that this > >> + * enlightenment should no longer be used. These two kernel > >> + * versions are currently the only ones to make use of this > >> + * enlightenment, so just use 0 here. > >> + */ > >> + p->TscSequence = 0; > >> + > >> + printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC: > invalidated\n", > >> + d->domain_id); > >> + return; > >> + } > >> + > >> + /* > >> + * The guest will calculate reference time according to the following > >> + * formula: > >> + * > >> + * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset > >> + * > >> + * Windows uses a 100ns tick, so we need a scale which is cpu > >> + * ticks per 100ns shifted left by 64. > >> + */ > >> + p->TscScale = ((10000ul << 32) / d->arch.tsc_khz) << 32; > >> + > >> + do { > >> + p->TscSequence++; > >> + } while ( p->TscSequence == 0xFFFFFFFF || > >> + p->TscSequence == 0 ); /* Avoid both 'invalid' values */ > > This is reading guest memory so a malicious guest can spin writing 0 and > cause a DoS. > Good point. That could only happen on the initial setting up of the page though, as the domain is paused in the other case. There's already a flag that indicates the initial case (since that clears the page). In this case the loop could simply be avoided and a value of 1 written in. Paul > Better do here: > > p->TscSequence++; > if ( p->TscSequence == 0xFFFFFFFF || p->TscSequence == 0 ) > p->TscSequence = 1; > > Christoph > > >> + > >> + unmap_domain_page(p); > >> + > >> + put_page_and_type(page); > >> +} > >> + > >> int wrmsr_viridian_regs(uint32_t idx, uint64_t val) > >> { > >> struct vcpu *v = current; > >> @@ -282,6 +369,17 @@ int wrmsr_viridian_regs(uint32_t idx, uint64_t > val) > >> initialize_apic_assist(v); > >> break; > >> > >> + case VIRIDIAN_MSR_REFERENCE_TSC: > >> + if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) ) > >> + return 0; > >> + > >> + perfc_incr(mshv_wrmsr_tsc_msr); > >> + d->arch.hvm_domain.viridian.reference_tsc.raw = val; > >> + dump_reference_tsc(d); > >> + if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled ) > >> + update_reference_tsc(d, 1); > >> + break; > >> + > >> default: > >> return 0; > >> } > >> @@ -346,6 +444,14 @@ int rdmsr_viridian_regs(uint32_t idx, uint64_t > *val) > >> *val = v->arch.hvm_vcpu.viridian.apic_assist.raw; > >> break; > >> > >> + case VIRIDIAN_MSR_REFERENCE_TSC: > >> + if ( !(viridian_feature_mask(d) & HVMPV_reference_tsc) ) > >> + return 0; > >> + > >> + perfc_incr(mshv_rdmsr_tsc_msr); > >> + *val = d->arch.hvm_domain.viridian.reference_tsc.raw; > >> + break; > >> + > >> case VIRIDIAN_MSR_TIME_REF_COUNT: > >> { > >> uint64_t tsc; > >> @@ -452,6 +558,7 @@ static int viridian_save_domain_ctxt(struct domain > *d, hvm_domain_context_t *h) > >> > >> ctxt.hypercall_gpa = d->arch.hvm_domain.viridian.hypercall_gpa.raw; > >> ctxt.guest_os_id = d->arch.hvm_domain.viridian.guest_os_id.raw; > >> + ctxt.reference_tsc = d->arch.hvm_domain.viridian.reference_tsc.raw; > >> > >> return (hvm_save_entry(VIRIDIAN_DOMAIN, 0, h, &ctxt) != 0); > >> } > >> @@ -460,11 +567,15 @@ static int viridian_load_domain_ctxt(struct > domain *d, hvm_domain_context_t *h) > >> { > >> struct hvm_viridian_domain_context ctxt; > >> > >> - if ( hvm_load_entry(VIRIDIAN_DOMAIN, h, &ctxt) != 0 ) > >> + if ( hvm_load_entry_zeroextend(VIRIDIAN_DOMAIN, h, &ctxt) != 0 ) > >> return -EINVAL; > >> > >> d->arch.hvm_domain.viridian.hypercall_gpa.raw = ctxt.hypercall_gpa; > >> d->arch.hvm_domain.viridian.guest_os_id.raw = ctxt.guest_os_id; > >> + d->arch.hvm_domain.viridian.reference_tsc.raw = ctxt.reference_tsc; > >> + > >> + if ( d->arch.hvm_domain.viridian.reference_tsc.fields.enabled ) > >> + update_reference_tsc(d, 0); > >> > >> return 0; > >> } > >> diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm- > x86/hvm/viridian.h > >> index 496da33..35a22f5 100644 > >> --- a/xen/include/asm-x86/hvm/viridian.h > >> +++ b/xen/include/asm-x86/hvm/viridian.h > >> @@ -48,10 +48,35 @@ union viridian_hypercall_gpa > >> } fields; > >> }; > >> > >> +union viridian_reference_tsc > >> +{ > >> + uint64_t raw; > >> + struct > >> + { > >> + uint64_t enabled:1; > >> + uint64_t reserved_preserved:11; > >> + uint64_t pfn:48; > >> + } fields; > >> +}; > >> + > >> +/* > >> + * Type defintion as in Microsoft Hypervisor Top-Level Functional > >> + * Specification v4.0a, section 15.4.2. > >> + */ > >> +typedef struct _HV_REFERENCE_TSC_PAGE > >> +{ > >> + uint32_t TscSequence; > >> + uint32_t Reserved1; > >> + uint64_t TscScale; > >> + int64_t TscOffset; > >> + uint64_t Reserved2[509]; > >> +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE; > >> + > >> struct viridian_domain > >> { > >> union viridian_guest_os_id guest_os_id; > >> union viridian_hypercall_gpa hypercall_gpa; > >> + union viridian_reference_tsc reference_tsc; > >> }; > >> > >> int > >> @@ -76,3 +101,13 @@ int > >> viridian_hypercall(struct cpu_user_regs *regs); > >> > >> #endif /* __ASM_X86_HVM_VIRIDIAN_H__ */ > >> + > >> +/* > >> + * Local variables: > >> + * mode: C > >> + * c-file-style: "BSD" > >> + * c-basic-offset: 4 > >> + * tab-width: 4 > >> + * indent-tabs-mode: nil > >> + * End: > >> + */ > >> diff --git a/xen/include/public/arch-x86/hvm/save.h > b/xen/include/public/arch-x86/hvm/save.h > >> index 16d85a3..006924b 100644 > >> --- a/xen/include/public/arch-x86/hvm/save.h > >> +++ b/xen/include/public/arch-x86/hvm/save.h > >> @@ -568,6 +568,7 @@ struct hvm_hw_cpu_xsave { > >> struct hvm_viridian_domain_context { > >> uint64_t hypercall_gpa; > >> uint64_t guest_os_id; > >> + uint64_t reference_tsc; > >> }; > >> > >> DECLARE_HVM_SAVE_TYPE(VIRIDIAN_DOMAIN, 15, struct > hvm_viridian_domain_context); > >> @@ -616,3 +617,13 @@ struct hvm_msr { > >> #define HVM_SAVE_CODE_MAX 20 > >> > >> #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */ > >> + > >> +/* > >> + * Local variables: > >> + * mode: C > >> + * c-file-style: "BSD" > >> + * c-basic-offset: 4 > >> + * tab-width: 4 > >> + * indent-tabs-mode: nil > >> + * End: > >> + */ > >> diff --git a/xen/include/public/hvm/params.h > b/xen/include/public/hvm/params.h > >> index 3c51072..a2d43bc 100644 > >> --- a/xen/include/public/hvm/params.h > >> +++ b/xen/include/public/hvm/params.h > >> @@ -92,10 +92,15 @@ > >> #define _HVMPV_time_ref_count 2 > >> #define HVMPV_time_ref_count (1 << _HVMPV_time_ref_count) > >> > >> +/* Enable Reference TSC Page (HV_X64_MSR_REFERENCE_TSC) */ > >> +#define _HVMPV_reference_tsc 3 > >> +#define HVMPV_reference_tsc (1 << _HVMPV_reference_tsc) > >> + > >> #define HVMPV_feature_mask \ > >> (HVMPV_base_freq | \ > >> HVMPV_no_freq | \ > >> - HVMPV_time_ref_count) > >> + HVMPV_time_ref_count | \ > >> + HVMPV_reference_tsc) > >> > >> #endif > >> > >> > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@xxxxxxxxxxxxx > > http://lists.xen.org/xen-devel > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |