Xen project Mailing List

Re: [Xen-devel] [V10 PATCH 23/23] PVH xen: introduce vmexit handler for PVH

To: Tim Deegan <tim@xxxxxxx>

From: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>

Date: Thu, 25 Jul 2013 19:30:57 -0700

Cc: Xen-devel@xxxxxxxxxxxxxxxxxxx, keir.xen@xxxxxxxxx

Delivery-date: Fri, 26 Jul 2013 02:31:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, 25 Jul 2013 17:28:40 +0100 Tim Deegan <tim@xxxxxxx> wrote: > At 18:59 -0700 on 23 Jul (1374605971), Mukesh Rathor wrote: > > +/* Just like HVM, PVH should be using "cpuid" from the kernel > > mode. */ +static int vmxit_invalid_op(struct cpu_user_regs *regs) > > +{ > > + if ( guest_kernel_mode(current, regs) > > || !emulate_forced_invalid_op(regs) ) > > + hvm_inject_hw_exception(TRAP_invalid_op, > > HVM_DELIVER_NO_ERROR_CODE); > > Was this discussed before? It seems harsh to stop kernel-mode code > from using the pv cpuid operation if it wants to. In particular, > what about loadable kernel modules? Yes, few times on the xen mailing list. The only PVH guest, linux as of now, the pv ops got rewired to use native cpuid, which is how hvm does it. So, couldn't come up with any real reason to support it. The kernel modules in pv ops will go thru native_cpuid too, which will do hvm cpuid too. > If you do go with this restriction, please document it in > include/public/arch-x86/xen.h beside the XEN_CPUID definition. Ok, I'll add it there. > > +/* Returns: rc == 0: success. */ > > +static int access_cr0(struct cpu_user_regs *regs, uint acc_typ, > > uint64_t *regp) +{ > > + struct vcpu *vp = current; > > + > > + if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR ) > > + { > > + unsigned long new_cr0 = *regp; > > + unsigned long old_cr0 = __vmread(GUEST_CR0); > > + > > + dbgp1("PVH:writing to CR0. RIP:%lx val:0x%lx\n", > > regs->rip, *regp); > > + if ( (u32)new_cr0 != new_cr0 ) > > + { > > + printk(XENLOG_G_WARNING > > + "Guest setting upper 32 bits in CR0: %lx", > > new_cr0); > > + return -EPERM; > > AFAICS returning non-zero here crashes the guest. Shouldn't this > inject #GP instead? Right, GPF it is. ..... > > + > > + if ( new & HVM_CR4_GUEST_RESERVED_BITS(vp) ) > > + { > > + printk(XENLOG_G_WARNING > > + "PVH guest attempts to set reserved bit in CR4: > > %lx", new); > > + hvm_inject_hw_exception(TRAP_gp_fault, 0); > > + return 0; > > + } > > + > > + if ( !(new & X86_CR4_PAE) && hvm_long_mode_enabled(vp) ) > > + { > > + printk(XENLOG_G_WARNING "Guest cleared CR4.PAE while " > > + "EFER.LMA is set"); > > + hvm_inject_hw_exception(TRAP_gp_fault, 0); > > + return 0; > > + } > > + > > + vp->arch.hvm_vcpu.guest_cr[4] = new; > > + > > + if ( (old_val ^ new) & (X86_CR4_PSE | X86_CR4_PGE | > > X86_CR4_PAE) ) > > + vpid_sync_all(); > > + > > + __vmwrite(CR4_READ_SHADOW, new); > > + > > + new &= ~X86_CR4_PAE; /* PVH always runs with hap > > enabled. */ > > The equivalent mask in vmx_update_guest_cr() is masking out a default > setting of CR4.PAE _before_ the guest's requested bits get ORred in. > This is masking out the PAE bit that we just insisted on. I'm > surprised that VMENTER doesn't choke on this -- I guess it uses > VM_ENTRY_IA32E_MODE rather than looking at these bits at all. Ah, I see. what a mess! Hmm... I can't find much in the VMX sections of the SDM on this. Not sure what I should do here. How about just not clearing the X86_CR4_PAE, so the GUEST_CR4 will have it if it's set in new, ie, the guest wants it set? ie, : ... __vmwrite(CR4_READ_SHADOW, new); new |= X86_CR4_VMXE | X86_CR4_MCE; __vmwrite(GUEST_CR4, new); ... > > + new |= X86_CR4_VMXE | X86_CR4_MCE; > > + __vmwrite(GUEST_CR4, new); > > + } > > + else > > + *regp = __vmread(CR4_READ_SHADOW); > > + > > + return 0; > > +} > > + > > +/* Returns: rc == 0: success, else -errno. */ > > +static int vmxit_cr_access(struct cpu_user_regs *regs) > > +{ > > + unsigned long exit_qualification = > > __vmread(EXIT_QUALIFICATION); > > + uint acc_typ = VMX_CONTROL_REG_ACCESS_TYPE(exit_qualification); > > + int cr, rc = -EINVAL; > > + > > + switch ( acc_typ ) > > + { > > + case VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR: > > + case VMX_CONTROL_REG_ACCESS_TYPE_MOV_FROM_CR: > > + { > > + uint gpr = VMX_CONTROL_REG_ACCESS_GPR(exit_qualification); > > + uint64_t *regp = decode_register(gpr, regs, 0); > > + cr = VMX_CONTROL_REG_ACCESS_NUM(exit_qualification); > > + > > + if ( regp == NULL ) > > + break; > > + > > + switch ( cr ) > > + { > > + case 0: > > + rc = access_cr0(regs, acc_typ, regp); > > + break; > > + > > + case 3: > > + printk(XENLOG_G_ERR "PVH: unexpected cr3 vmexit. > > rip:%lx\n", > > + regs->rip); > > + domain_crash(current->domain); > > + break; > > + > > + case 4: > > + rc = access_cr4(regs, acc_typ, regp); > > + break; > > + } > > + if ( rc == 0 ) > > + vmx_update_guest_eip(); > > + break; > > + } > > + > > + case VMX_CONTROL_REG_ACCESS_TYPE_CLTS: > > + { > > + struct vcpu *vp = current; > > + unsigned long cr0 = vp->arch.hvm_vcpu.guest_cr[0] & > > ~X86_CR0_TS; > > + vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] > > = cr0; + > > + vmx_fpu_enter(vp); > > + __vmwrite(GUEST_CR0, cr0); > > + __vmwrite(CR0_READ_SHADOW, cr0); > > + vmx_update_guest_eip(); > > + rc = 0; > > + } > > + } > > No "case VMX_CONTROL_REG_ACCESS_TYPE_LMSW"? Well, PVH is such a new concept, do you really think we need it? Lets not hold the series just for this, we can always add in future :). Thanks, Mukesh _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.