[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 2/2] vmx/hap: optimize CR4 trapping
On Fri, Feb 23, 2018 at 04:56:38AM +0000, Tian, Kevin wrote: > > From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx] > > Sent: Tuesday, February 20, 2018 4:57 PM > > > > There a bunch of bits in CR4 that should be allowed to be set directly > > by the guest without requiring Xen intervention, currently this is > > already done by passing through guest writes into the CR4 used when > > running in non-root mode, but taking an expensive vmexit in order to > > do so. > > > > xenalyze reports the following when running a PV guest in shim mode: > > > > CR_ACCESS 3885950 6.41s 17.04% 3957 cyc { 2361| 3378| 7920} > > cr4 3885940 6.41s 17.04% 3957 cyc { 2361| 3378| 7920} > > cr3 1 0.00s 0.00% 3480 cyc { 3480| 3480| 3480} > > *[ 0] 1 0.00s 0.00% 3480 cyc { 3480| 3480| 3480} > > cr0 7 0.00s 0.00% 7112 cyc { 3248| 5960|17480} > > clts 2 0.00s 0.00% 4588 cyc { 3456| 5720| 5720} > > > > After this change this turns into: > > > > CR_ACCESS 12 0.00s 0.00% 9972 cyc { 3680|11024|24032} > > cr4 2 0.00s 0.00% 17528 cyc {11024|24032|24032} > > cr3 1 0.00s 0.00% 3680 cyc { 3680| 3680| 3680} > > *[ 0] 1 0.00s 0.00% 3680 cyc { 3680| 3680| 3680} > > cr0 7 0.00s 0.00% 9209 cyc { 4184| 7848|17488} > > clts 2 0.00s 0.00% 8232 cyc { 5352|11112|11112} > > > > Note that this optimized trapping is currently only applied to guests > > running with HAP on Intel hardware. If using shadow paging more CR4 > > bits need to be unconditionally trapped, which makes this approach > > unlikely to yield any important performance improvements. > > > > Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > Acked-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> > > --- > > Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx> > > Cc: Kevin Tian <kevin.tian@xxxxxxxxx> > > Cc: Jan Beulich <jbeulich@xxxxxxxx> > > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > Cc: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> > > Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx> > > --- > > Changes since v2: > > - Use cr4_host_mask. > > > > Changes since v1: > > - Use the mask_cr variable in order to cache the cr4 mask. > > - Take into account write_ctrlreg_mask when introspection is enabled. > > --- > > xen/arch/x86/hvm/vmx/vmx.c | 39 > > +++++++++++++++++++++++++++++++++++++++ > > xen/arch/x86/hvm/vmx/vvmx.c | 2 ++ > > xen/arch/x86/monitor.c | 5 +++-- > > 3 files changed, 44 insertions(+), 2 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > > index 5cd689e823..27cbbe8823 100644 > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -1684,6 +1684,36 @@ static void vmx_update_guest_cr(struct vcpu *v, > > unsigned int cr) > > } > > > > __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]); > > + > > + if ( !paging_mode_hap(v->domain) ) > > + /* > > + * Shadow path has not been optimized because it requires > > + * unconditionally trapping more CR4 bits, at which point the > > + * performance benefit of doing this is quite dubious. > > + */ > > + v->arch.hvm_vcpu.cr4_host_mask = ~0UL; > > + else > > + { > > + /* > > + * Update CR4 host mask to only trap when the guest tries to > > set > > + * bits that are controlled by the hypervisor. > > + */ > > + v->arch.hvm_vcpu.cr4_host_mask = HVM_CR4_HOST_MASK | > > X86_CR4_PKE | > > + ~hvm_cr4_guest_valid_bits(v, > > 0); > > + v->arch.hvm_vcpu.cr4_host_mask |= v- > > >arch.hvm_vmx.vmx_realmode ? > > + X86_CR4_VME : 0; > > + v->arch.hvm_vcpu.cr4_host_mask |= !hvm_paging_enabled(v) ? > > + (X86_CR4_PSE | X86_CR4_SMEP | > > + X86_CR4_SMAP) > > + : 0; > > + if ( v->domain->arch.monitor.write_ctrlreg_enabled & > > + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4) ) > > + v->arch.hvm_vcpu.cr4_host_mask |= > > + ~v->domain- > > >arch.monitor.write_ctrlreg_mask[VM_EVENT_X86_CR4]; > > + > > + } > > + __vmwrite(CR4_GUEST_HOST_MASK, v- > > >arch.hvm_vcpu.cr4_host_mask); > > + > > what about doing a comparison to avoid vmwrite if nothing changed? I can do that, but it doesn't seem like other writes in the same function do this (see GUEST_CR4 unconditional write above for example). Is it really better from a performance PoV to avoid the write? (or take the penalty of doing a read + write?) Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |