[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [V11 PATCH 17/21] PVH xen: vmcs related changes



>>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
> This patch contains vmcs changes related for PVH, mainly creating a VMCS
> for PVH guest.
> 
> Changes in V11:
>    - Remove pvh_construct_vmcs and make it part of construct_vmcs
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
> Acked-by: Keir Fraser <keir@xxxxxxx>
> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Same comment as for #9 - too many changes to retain the tags.

> @@ -874,16 +935,37 @@ static int construct_vmcs(struct vcpu *v)
>      if ( d->arch.vtsc )
>          v->arch.hvm_vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
>  
> -    v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        /* Phase I PVH: we run with minimal secondary exec features */
> +        u32 bits = SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID;
>  
> -    /* Disable VPID for now: we decide when to enable it on VMENTER. */
> -    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
> +        /* Intel SDM: resvd bits are 0 */
> +        v->arch.hvm_vmx.secondary_exec_control = bits;
> +    }
> +    else
> +    {
> +        v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
> +
> +        /* Disable VPID for now: we decide when to enable it on VMENTER. */
> +        v->arch.hvm_vmx.secondary_exec_control &= 
> ~SECONDARY_EXEC_ENABLE_VPID;
> +    }

So this difference in VPID handling needs some explanation, as I
don't immediately see how you adjust the code referred to by the
non-PVH comment above.

>      if ( paging_mode_hap(d) )
>      {
>          v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
>                                            CPU_BASED_CR3_LOAD_EXITING |
>                                            CPU_BASED_CR3_STORE_EXITING);
> +        if ( is_pvh_vcpu(v) )
> +        {
> +            u32 bits = CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
> +                       CPU_BASED_ACTIVATE_MSR_BITMAP;
> +            v->arch.hvm_vmx.exec_control |= bits;
> +
> +            bits = CPU_BASED_USE_TSC_OFFSETING | CPU_BASED_TPR_SHADOW |
> +                   CPU_BASED_VIRTUAL_NMI_PENDING;
> +            v->arch.hvm_vmx.exec_control &= ~bits;
> +        }

Did you notice that the original adjustments were truly HAP-related?
Putting your PVH code here just because it takes HAP as a prereq is
not the way to go. Wherever the flags you want set get set for the
HVM case, you should add your adjustments (if any are necessary
at all).

> @@ -905,9 +987,22 @@ static int construct_vmcs(struct vcpu *v)
>  
>      vmx_update_cpu_exec_control(v);
>      __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
> +
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        /*
> +         * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which 
> means
> +         * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and
> +         * not use the host values. 0 would cause it to not use the VMCS 
> values.
> +         */
> +        vmentry_ctl &= ~(VM_ENTRY_LOAD_GUEST_EFER | VM_ENTRY_SMM |
> +                         VM_ENTRY_DEACT_DUAL_MONITOR);
> +        /* PVH 32bitfixme. */
> +        vmentry_ctl |= VM_ENTRY_IA32E_MODE;   /* GUEST_EFER.LME/LMA ignored 
> */
> +    }
>      __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl);

This is misplaced too - we're past the point of determining the set of
flags, and already in the process of committing them. The code
again should go alongside where the corresponding HVM code sits.

> -    if ( cpu_has_vmx_ple )
> +    if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) )
>      {
>          __vmwrite(PLE_GAP, ple_gap);
>          __vmwrite(PLE_WINDOW, ple_window);

Why would this be conditional upon !PVH?

> @@ -921,28 +1016,46 @@ static int construct_vmcs(struct vcpu *v)
>      if ( cpu_has_vmx_msr_bitmap )
>      {
>          unsigned long *msr_bitmap = alloc_xenheap_page();
> +        int msr_type = MSR_TYPE_R | MSR_TYPE_W;
>  
>          if ( msr_bitmap == NULL )
> +        {
> +            vmx_vmcs_exit(v);
>              return -ENOMEM;
> +        }

This is a genuine bug fix, which should be contributed separately
(so it becomes backportable).

> -        vmx_disable_intercept_for_msr(v, MSR_FS_BASE, MSR_TYPE_R | 
> MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_GS_BASE, MSR_TYPE_R | 
> MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, MSR_TYPE_R | 
> MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, MSR_TYPE_R | 
> MSR_TYPE_W);
> -        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, MSR_TYPE_R | 
> MSR_TYPE_W);
> +        /* Disable interecepts for MSRs that have corresponding VMCS fields. 
> */
> +        vmx_disable_intercept_for_msr(v, MSR_FS_BASE, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_GS_BASE, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, msr_type);
> +        vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, msr_type);
> +
>          if ( cpu_has_vmx_pat && paging_mode_hap(d) )
> -            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, MSR_TYPE_R | 
> MSR_TYPE_W);
> +            vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, msr_type);

These changes all look pointless; if you really think they're worthwhile
cleanup, they don't belong here.

> -    if ( cpu_has_vmx_virtual_intr_delivery )
> +    if ( cpu_has_vmx_virtual_intr_delivery && !is_pvh_vcpu(v) )
>      {
>          /* EOI-exit bitmap */
>          v->arch.hvm_vmx.eoi_exit_bitmap[0] = (uint64_t)0;
> @@ -958,7 +1071,7 @@ static int construct_vmcs(struct vcpu *v)
>          __vmwrite(GUEST_INTR_STATUS, 0);
>      }
>  
> -    if ( cpu_has_vmx_posted_intr_processing )
> +    if ( cpu_has_vmx_posted_intr_processing && !is_pvh_vcpu(v) )
>      {
>          __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
>          __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);

These are likely to be meant as temporary changes only? In which
case they ought to be marked as such.

> @@ -1032,16 +1150,36 @@ static int construct_vmcs(struct vcpu *v)
>  
>      v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
>                | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
> +              | (is_pvh_vcpu(v) ? (1U << TRAP_int3) | (1U << TRAP_debug) : 0)
>                | (1U << TRAP_no_device);

What's so special about PVH that it requires these extra intercepts?

>      v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
> -    hvm_update_guest_cr(v, 0);
> +    if ( is_pvh_vcpu(v) )

I dislike your attitude of considering PVH the most important (thus
handled first) mode in general, but here it becomes most obvious:
The pre-existing code should really come first, and the new mode
should be handled in the "else" path. Same further down for the
CR4 handling.

> +    {
> +        u64 tmpval = v->arch.hvm_vcpu.guest_cr[0] | X86_CR0_PG | X86_CR0_NE;
> +        v->arch.hvm_vcpu.hw_cr[0] = v->arch.hvm_vcpu.guest_cr[0] = tmpval;

So you set hw_cr[0] while I saw you saying hw_cr[4] won't be
touched by PVH in a reply to the v10 series. Such
inconsistencies are just calling for subsequent bugs. Either PVH
set all valid hw_cr[] fields, or it clearly states somewhere that
none of them are used (and then the assignment above ought to
go away).

> @@ -1297,6 +1440,9 @@ void vmx_do_resume(struct vcpu *v)
>          hvm_asid_flush_vcpu(v);
>      }
>  
> +    if ( is_pvh_vcpu(v) )
> +        reset_stack_and_jump(vmx_asm_do_vmentry);
> +
>      debug_state = v->domain->debugger_attached
>                    || 
> v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3]
>                    || 
> v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP];

Missing a "fixme" annotation?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.