Re: [Xen-devel] [PATCH v3] x86: use VMLOAD for PV context switch
>>> On 25.09.18 at 18:14, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 18/09/18 13:28, Jan Beulich wrote:
>> @@ -1281,11 +1282,35 @@ static void load_segments(struct vcpu *n
>>      struct cpu_user_regs *uregs = &n->arch.user_regs;
>>      int all_segs_okay = 1;
>>      unsigned int dirty_segment_mask, cpu = smp_processor_id();
>> +    bool fs_gs_done = false;
>>
>>      /* Load and clear the dirty segment mask. */
>>      dirty_segment_mask = per_cpu(dirty_segment_mask, cpu);
>>      per_cpu(dirty_segment_mask, cpu) = 0;
>>
>> +#ifdef CONFIG_HVM
>> +    if ( !is_pv_32bit_vcpu(n) && !cpu_has_fsgsbase && cpu_has_svm &&
>> +         !((uregs->fs | uregs->gs) & ~3) &&
>> +         /*
>> +          * The remaining part is just for optimization: If only shadow GS
>> +          * needs loading, there's nothing to be gained here.
>
> VMLOAD also loads LDT, and LLDT is fully serialising, so an even heavier
> perf hit than wrmsr.

I don't understand how your remark relates to the comment, or ...

>> +          */
>> +         (n->arch.pv.fs_base | n->arch.pv.gs_base_user) )

... the commented code. There's nothing LDT-ish here. Or are you meaning
to suggest something LDT-ish should be added? I'd rather not, as the
common case (afaict) is for there to be no LDT. I also don't understand
the "even heavier" part - WRMSR (for the MSRs in question) is as
serializing as is LLDT, isn't it?

>> +    {
>> +        fs_gs_done = n->arch.flags & TF_kernel_mode
>> +            ? svm_load_segs(n->arch.pv.ldt_ents, LDT_VIRT_START(n),
>> +                            uregs->fs, n->arch.pv.fs_base,
>> +                            uregs->gs, n->arch.pv.gs_base_kernel,
>> +                            n->arch.pv.gs_base_user)
>> +            : svm_load_segs(n->arch.pv.ldt_ents, LDT_VIRT_START(n),
>> +                            uregs->fs, n->arch.pv.fs_base,
>> +                            uregs->gs, n->arch.pv.gs_base_user,
>> +                            n->arch.pv.gs_base_kernel);
>
> This looks like a confusing way of writing:
>
> {
>     unsigned long gsb = n->arch.flags & TF_kernel_mode
>         ? n->arch.pv.gs_base_kernel : n->arch.pv.gs_base_user;
>     unsigned long gss = n->arch.flags & TF_kernel_mode
>         ? n->arch.pv.gs_base_user : n->arch.pv.gs_base_kernel;
>
>     fs_gs_done = svm_load_segs(n->arch.pv.ldt_ents, LDT_VIRT_START(n),
>                                uregs->fs, n->arch.pv.fs_base,
>                                uregs->gs, gsb, gss);
> }
>
> AFAICT?

"Confusing" is a very subjective term. I specifically wanted to avoid
the double identical conditional. But I don't mind re-writing it as you
suggest.

>> @@ -1653,6 +1678,12 @@ static void __context_switch(void)
>>
>>      write_ptbase(n);
>>
>> +#if defined(CONFIG_PV) && defined(CONFIG_HVM)
>
> From a comments in code point of view, this is the most useful location
> to have something along the lines of:
>
> /* Prefetch the VMCB if we expect to use it later in the context switch */

Added.

>> +    if ( is_pv_domain(nd) && !is_pv_32bit_domain(nd) &&
>> +         !is_idle_domain(nd) && !cpu_has_fsgsbase && cpu_has_svm )
>> +        svm_load_segs(0, 0, 0, 0, 0, 0, 0);
>> +#endif
>> +
>>      if ( need_full_gdt(nd) &&
>>           ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
>>      {
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1630,6 +1646,66 @@ static void svm_init_erratum_383(const s
>>      }
>>  }
>>
>> +#ifdef CONFIG_PV
>> +bool svm_load_segs(unsigned int ldt_ents, unsigned long ldt_base,
>> +                   unsigned int fs_sel, unsigned long fs_base,
>> +                   unsigned int gs_sel, unsigned long gs_base,
>> +                   unsigned long gs_shadow)
>> +{
>> +    unsigned int cpu = smp_processor_id();
>> +    struct vmcb_struct *vmcb = per_cpu(host_vmcb_va, cpu);
>> +
>> +    if ( unlikely(!vmcb) )
>> +        return false;
>
> When can this error path ever be taken?

__map_domain_page_global() may fail during initialization, which is
non-fatal (and may even have worked for some CPUs, but not for others).

>> +
>> +    if ( !ldt_base )
>> +    {
>> +        /*
>> +         * The actual structure field used here was arbitrarily chosen.
>> +         * Empirically it doesn't seem to matter much which element is used,
>> +         * and a clear explanation of the otherwise poor performance has not
>> +         * been found/provided so far.
>> +         */
>> +        asm volatile ( "prefetch %0" :: "m" (vmcb->ldtr) );
>
> prefetchw(), which already exists and is used.

I've specifically decided against using it, as we don't mean to write
any part of the VMCB.

Jan