Xen project Mailing List

Re: [Xen-devel] [PATCH RFC] x86/Intel: virtualize support for cpuid faulting

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Tue, 4 Oct 2016 21:49:13 -0700

Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Robert O'Callahan <robert@xxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Wed, 05 Oct 2016 04:49:51 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thanks for the reviews! On Tue, Oct 4, 2016 at 3:31 AM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: > On 03/10/16 23:38, Kyle Huey wrote: >> rr (http://rr-project.org/), a Linux userspace record-and-replay reverse- >> execution debugger, would like to trap and emulate the CPUID instruction. >> This would allow us to a) mask away certain hardware features that rr does >> not support (e.g. RDRAND) and b) enable trace portability across machines >> by providing constant results. Patches for support in the Linux kernel are in >> flight, and we'd like to be able to use this feature on virtualized Linux >> instances as well. >> >> On HVM guests, the cpuid triggers a vm exit, so we can check the emulated >> faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no >> hardware support for faulting on cpuid is necessary to emulate support with >> an >> HVM guest. >> >> On PV guests, hardware support is required so that userspace cpuid will trap >> to xen. Xen already enables cpuid faulting on supported CPUs for pv guests >> (that >> aren't the control domain, see the comment in intel_ctxt_switch_levelling). > > I see you found the first part of my cpuid overhaul work. > > For your reference (as it will eventually impact the code you add here), > I am currently working on the 2nd part which will clean up the myriad of > different cpuid functions we currently have. After that, I will be > arranging to include MSR levelling, at which point your cpuid_fault > parameter can be read straight out of the vcpu's MSR bank, rather than > living in a magic variable. Cool. Sounds like a nicer world :-) > MSR levelling however is a long way off and shouldn't block this > feature, so the magic boolean is ok for now. > >> Every userspace cpuid will trap via a GP(0) to emulate_privileged_op >> (via do_general_protection). Once there we can simply decline to emulate the >> cpuid and instead pass the GP(0) along to the guest kernel, thus emulating >> the cpuid faulting behavior. PV guest kernels enter pv_cpuid via a different >> path, so we do not need to check the CPL here. > > Which path is this? emulate_forced_invalid_op()? > > While typically only used by a guest kernel, it is accessable to > userspace as well, so should be treated accordingly. No, emulate_privileged_op is called only by do_general_protection (unless I'm missing something here). I see now that I'm looking at it again that 'gp_in_kernel' here refers to xen's kernel, and not the guest kernel, so this also needs to grow a CPL check. ('guest_mode' and 'toggle_guest_mode' having different concepts of 'mode' is slightly confusing). >> >> Signed-off-by: Kyle Huey <khuey@xxxxxxxxxxxx> >> --- >> xen/arch/x86/domain.c | 2 ++ >> xen/arch/x86/hvm/vmx/vmx.c | 24 +++++++++++++++++++----- >> xen/arch/x86/traps.c | 31 ++++++++++++++++++++++++++++++- >> xen/include/asm-x86/domain.h | 3 +++ >> 4 files changed, 54 insertions(+), 6 deletions(-) >> >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c >> index 3c4b094..f2bac10 100644 >> --- a/xen/arch/x86/domain.c >> +++ b/xen/arch/x86/domain.c >> @@ -1095,6 +1095,8 @@ int arch_set_info_guest( >> for ( i = 0; i < 8; i++ ) >> (void)set_debugreg(v, i, c(debugreg[i])); >> >> + v->arch.cpuid_fault = 0; >> + >> if ( v->is_initialised ) >> goto out; >> >> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c >> index 50cbfed..f1e04c1 100644 >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -2430,11 +2430,16 @@ static void vmx_cpuid_intercept( >> HVMTRACE_5D (CPUID, input, *eax, *ebx, *ecx, *edx); >> } >> >> -static int vmx_do_cpuid(struct cpu_user_regs *regs) >> +static int vmx_do_cpuid(struct vcpu *v, struct cpu_user_regs *regs) >> { >> unsigned int eax, ebx, ecx, edx; >> unsigned int leaf, subleaf; >> >> + if ( v->arch.cpuid_fault && !ring_0(regs) ) { > > ring_0() is not the correct check to use here. It is unsafe for > anything other than a PV guest, and I will see about removing it from > the header files to avoid future misuse. Good to know! Is ring_0 the correct thing to use above, in emulate_privileged_op (which I believe is PV only)? > CPL must always be read from %ss, not %cs, as %cs is inaccurate in real > or vm86 mode. > > Amazingly, it turns out that we don't actually have an hvm_get_cpl() > (we should fix that). Until we gain one, you want: > > struct segment_register sreg; > hvm_get_segment_register(curr, x86_seg_ss, &sreg); > > if ( v->arch.cpuid_fault && sreg.attr.fields.dpl > 0 ) > > ~Andrew Ok. Adding an hvm_get_cpl seems pretty straightforward. I'll do that. - Kyle _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.