[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [V10 PATCH 23/23] PVH xen: introduce vmexit handler for PVH



On Thu, 25 Jul 2013 17:28:40 +0100
Tim Deegan <tim@xxxxxxx> wrote:

> At 18:59 -0700 on 23 Jul (1374605971), Mukesh Rathor wrote:
> > +/* Just like HVM, PVH should be using "cpuid" from the kernel
> > mode. */ +static int vmxit_invalid_op(struct cpu_user_regs *regs)
> > +{
> > +    if ( guest_kernel_mode(current, regs)
> > || !emulate_forced_invalid_op(regs) )
> > +        hvm_inject_hw_exception(TRAP_invalid_op,
> > HVM_DELIVER_NO_ERROR_CODE);
> 
> Was this discussed before?  It seems harsh to stop kernel-mode code
> from using the pv cpuid operation if it wants to.  In particular,
> what about loadable kernel modules?

Yes, few times on the xen mailing list. The only PVH guest, linux
as of now, the pv ops got rewired to use native cpuid, which is
how hvm does it. So, couldn't come up with any real reason
to support it. The kernel modules in pv ops will go thru native_cpuid
too, which will do hvm cpuid too.


> If you do go with this restriction, please document it in
> include/public/arch-x86/xen.h beside the XEN_CPUID definition.

Ok, I'll add it there.

> > +/* Returns: rc == 0: success. */
> > +static int access_cr0(struct cpu_user_regs *regs, uint acc_typ,
> > uint64_t *regp) +{
> > +    struct vcpu *vp = current;
> > +
> > +    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
> > +    {
> > +        unsigned long new_cr0 = *regp;
> > +        unsigned long old_cr0 = __vmread(GUEST_CR0);
> > +
> > +        dbgp1("PVH:writing to CR0. RIP:%lx val:0x%lx\n",
> > regs->rip, *regp);
> > +        if ( (u32)new_cr0 != new_cr0 )
> > +        {
> > +            printk(XENLOG_G_WARNING
> > +                   "Guest setting upper 32 bits in CR0: %lx",
> > new_cr0);
> > +            return -EPERM;
> 
> AFAICS returning non-zero here crashes the guest.  Shouldn't this
> inject #GP instead?

Right, GPF it is.

.....

> > +
> > +        if ( new & HVM_CR4_GUEST_RESERVED_BITS(vp) )
> > +        {
> > +            printk(XENLOG_G_WARNING
> > +                   "PVH guest attempts to set reserved bit in CR4:
> > %lx", new);
> > +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> > +            return 0;
> > +        }
> > +
> > +        if ( !(new & X86_CR4_PAE) && hvm_long_mode_enabled(vp) )
> > +        {
> > +            printk(XENLOG_G_WARNING "Guest cleared CR4.PAE while "
> > +                   "EFER.LMA is set");
> > +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> > +            return 0;
> > +        }
> > +
> > +        vp->arch.hvm_vcpu.guest_cr[4] = new;
> > +
> > +        if ( (old_val ^ new) & (X86_CR4_PSE | X86_CR4_PGE |
> > X86_CR4_PAE) )
> > +            vpid_sync_all();
> > +
> > +        __vmwrite(CR4_READ_SHADOW, new);
> > +
> > +        new &= ~X86_CR4_PAE;         /* PVH always runs with hap
> > enabled. */
> 
> The equivalent mask in vmx_update_guest_cr() is masking out a default
> setting of CR4.PAE _before_ the guest's requested bits get ORred in.
> This is masking out the PAE bit that we just insisted on.  I'm
> surprised that VMENTER doesn't choke on this -- I guess it uses
> VM_ENTRY_IA32E_MODE rather than looking at these bits at all.

Ah, I see. what a mess! Hmm... I can't find much in the VMX sections
of the SDM on this. Not sure what I should do here. How about just
not clearing the X86_CR4_PAE, so the GUEST_CR4 will have it
if it's set in new, ie, the guest wants it set? ie, :
...
        __vmwrite(CR4_READ_SHADOW, new);

        new |= X86_CR4_VMXE | X86_CR4_MCE;
        __vmwrite(GUEST_CR4, new);
...


> > +        new |= X86_CR4_VMXE | X86_CR4_MCE;
> > +        __vmwrite(GUEST_CR4, new);
> > +    }
> > +    else
> > +        *regp = __vmread(CR4_READ_SHADOW);
> > +
> > +    return 0;
> > +}
> > +
> > +/* Returns: rc == 0: success, else -errno. */
> > +static int vmxit_cr_access(struct cpu_user_regs *regs)
> > +{
> > +    unsigned long exit_qualification =
> > __vmread(EXIT_QUALIFICATION);
> > +    uint acc_typ = VMX_CONTROL_REG_ACCESS_TYPE(exit_qualification);
> > +    int cr, rc = -EINVAL;
> > +
> > +    switch ( acc_typ )
> > +    {
> > +    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR:
> > +    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_FROM_CR:
> > +    {
> > +        uint gpr = VMX_CONTROL_REG_ACCESS_GPR(exit_qualification);
> > +        uint64_t *regp = decode_register(gpr, regs, 0);
> > +        cr = VMX_CONTROL_REG_ACCESS_NUM(exit_qualification);
> > +
> > +        if ( regp == NULL )
> > +            break;
> > +
> > +        switch ( cr )
> > +        {
> > +        case 0:
> > +            rc = access_cr0(regs, acc_typ, regp);
> > +            break;
> > +
> > +        case 3:
> > +            printk(XENLOG_G_ERR "PVH: unexpected cr3 vmexit.
> > rip:%lx\n",
> > +                   regs->rip);
> > +            domain_crash(current->domain);
> > +            break;
> > +
> > +        case 4:
> > +            rc = access_cr4(regs, acc_typ, regp);
> > +            break;
> > +        }
> > +        if ( rc == 0 )
> > +            vmx_update_guest_eip();
> > +        break;
> > +    }
> > +
> > +    case VMX_CONTROL_REG_ACCESS_TYPE_CLTS:
> > +    {
> > +        struct vcpu *vp = current;
> > +        unsigned long cr0 = vp->arch.hvm_vcpu.guest_cr[0] &
> > ~X86_CR0_TS;
> > +        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0]
> > = cr0; +
> > +        vmx_fpu_enter(vp);
> > +        __vmwrite(GUEST_CR0, cr0);
> > +        __vmwrite(CR0_READ_SHADOW, cr0);
> > +        vmx_update_guest_eip();
> > +        rc = 0;
> > +    }
> > +    }
> 
> No "case VMX_CONTROL_REG_ACCESS_TYPE_LMSW"?

Well, PVH is such a new concept, do you really think we need it? Lets
not hold the series just for this, we can always add in future :).

Thanks,
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.