Xen project Mailing List

Re: [Xen-devel] [V11 PATCH 17/21] PVH xen: vmcs related changes

To: "Mukesh Rathor" <mukesh.rathor@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Fri, 23 Aug 2013 09:41:55 +0100

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 23 Aug 2013 08:42:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 23.08.13 at 03:19, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote: > This patch contains vmcs changes related for PVH, mainly creating a VMCS > for PVH guest. > > Changes in V11: > - Remove pvh_construct_vmcs and make it part of construct_vmcs > > Signed-off-by: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> > Acked-by: Keir Fraser <keir@xxxxxxx> > Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > PV-HVM-Regression-Tested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Same comment as for #9 - too many changes to retain the tags. > @@ -874,16 +935,37 @@ static int construct_vmcs(struct vcpu *v) > if ( d->arch.vtsc ) > v->arch.hvm_vmx.exec_control |= CPU_BASED_RDTSC_EXITING; > > - v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control; > + if ( is_pvh_vcpu(v) ) > + { > + /* Phase I PVH: we run with minimal secondary exec features */ > + u32 bits = SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID; > > - /* Disable VPID for now: we decide when to enable it on VMENTER. */ > - v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID; > + /* Intel SDM: resvd bits are 0 */ > + v->arch.hvm_vmx.secondary_exec_control = bits; > + } > + else > + { > + v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control; > + > + /* Disable VPID for now: we decide when to enable it on VMENTER. */ > + v->arch.hvm_vmx.secondary_exec_control &= > ~SECONDARY_EXEC_ENABLE_VPID; > + } So this difference in VPID handling needs some explanation, as I don't immediately see how you adjust the code referred to by the non-PVH comment above. > if ( paging_mode_hap(d) ) > { > v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING | > CPU_BASED_CR3_LOAD_EXITING | > CPU_BASED_CR3_STORE_EXITING); > + if ( is_pvh_vcpu(v) ) > + { > + u32 bits = CPU_BASED_ACTIVATE_SECONDARY_CONTROLS | > + CPU_BASED_ACTIVATE_MSR_BITMAP; > + v->arch.hvm_vmx.exec_control |= bits; > + > + bits = CPU_BASED_USE_TSC_OFFSETING | CPU_BASED_TPR_SHADOW | > + CPU_BASED_VIRTUAL_NMI_PENDING; > + v->arch.hvm_vmx.exec_control &= ~bits; > + } Did you notice that the original adjustments were truly HAP-related? Putting your PVH code here just because it takes HAP as a prereq is not the way to go. Wherever the flags you want set get set for the HVM case, you should add your adjustments (if any are necessary at all). > @@ -905,9 +987,22 @@ static int construct_vmcs(struct vcpu *v) > > vmx_update_cpu_exec_control(v); > __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl); > + > + if ( is_pvh_vcpu(v) ) > + { > + /* > + * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which > means > + * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and > + * not use the host values. 0 would cause it to not use the VMCS > values. > + */ > + vmentry_ctl &= ~(VM_ENTRY_LOAD_GUEST_EFER | VM_ENTRY_SMM | > + VM_ENTRY_DEACT_DUAL_MONITOR); > + /* PVH 32bitfixme. */ > + vmentry_ctl |= VM_ENTRY_IA32E_MODE; /* GUEST_EFER.LME/LMA ignored > */ > + } > __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl); This is misplaced too - we're past the point of determining the set of flags, and already in the process of committing them. The code again should go alongside where the corresponding HVM code sits. > - if ( cpu_has_vmx_ple ) > + if ( cpu_has_vmx_ple && !is_pvh_vcpu(v) ) > { > __vmwrite(PLE_GAP, ple_gap); > __vmwrite(PLE_WINDOW, ple_window); Why would this be conditional upon !PVH? > @@ -921,28 +1016,46 @@ static int construct_vmcs(struct vcpu *v) > if ( cpu_has_vmx_msr_bitmap ) > { > unsigned long *msr_bitmap = alloc_xenheap_page(); > + int msr_type = MSR_TYPE_R | MSR_TYPE_W; > > if ( msr_bitmap == NULL ) > + { > + vmx_vmcs_exit(v); > return -ENOMEM; > + } This is a genuine bug fix, which should be contributed separately (so it becomes backportable). > - vmx_disable_intercept_for_msr(v, MSR_FS_BASE, MSR_TYPE_R | > MSR_TYPE_W); > - vmx_disable_intercept_for_msr(v, MSR_GS_BASE, MSR_TYPE_R | > MSR_TYPE_W); > - vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, MSR_TYPE_R | > MSR_TYPE_W); > - vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, MSR_TYPE_R | > MSR_TYPE_W); > - vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, MSR_TYPE_R | > MSR_TYPE_W); > + /* Disable interecepts for MSRs that have corresponding VMCS fields. > */ > + vmx_disable_intercept_for_msr(v, MSR_FS_BASE, msr_type); > + vmx_disable_intercept_for_msr(v, MSR_GS_BASE, msr_type); > + vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, msr_type); > + vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, msr_type); > + vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, msr_type); > + > if ( cpu_has_vmx_pat && paging_mode_hap(d) ) > - vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, MSR_TYPE_R | > MSR_TYPE_W); > + vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, msr_type); These changes all look pointless; if you really think they're worthwhile cleanup, they don't belong here. > - if ( cpu_has_vmx_virtual_intr_delivery ) > + if ( cpu_has_vmx_virtual_intr_delivery && !is_pvh_vcpu(v) ) > { > /* EOI-exit bitmap */ > v->arch.hvm_vmx.eoi_exit_bitmap[0] = (uint64_t)0; > @@ -958,7 +1071,7 @@ static int construct_vmcs(struct vcpu *v) > __vmwrite(GUEST_INTR_STATUS, 0); > } > > - if ( cpu_has_vmx_posted_intr_processing ) > + if ( cpu_has_vmx_posted_intr_processing && !is_pvh_vcpu(v) ) > { > __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc)); > __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector); These are likely to be meant as temporary changes only? In which case they ought to be marked as such. > @@ -1032,16 +1150,36 @@ static int construct_vmcs(struct vcpu *v) > > v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK > | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault)) > + | (is_pvh_vcpu(v) ? (1U << TRAP_int3) | (1U << TRAP_debug) : 0) > | (1U << TRAP_no_device); What's so special about PVH that it requires these extra intercepts? > v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET; > - hvm_update_guest_cr(v, 0); > + if ( is_pvh_vcpu(v) ) I dislike your attitude of considering PVH the most important (thus handled first) mode in general, but here it becomes most obvious: The pre-existing code should really come first, and the new mode should be handled in the "else" path. Same further down for the CR4 handling. > + { > + u64 tmpval = v->arch.hvm_vcpu.guest_cr[0] | X86_CR0_PG | X86_CR0_NE; > + v->arch.hvm_vcpu.hw_cr[0] = v->arch.hvm_vcpu.guest_cr[0] = tmpval; So you set hw_cr[0] while I saw you saying hw_cr[4] won't be touched by PVH in a reply to the v10 series. Such inconsistencies are just calling for subsequent bugs. Either PVH set all valid hw_cr[] fields, or it clearly states somewhere that none of them are used (and then the assignment above ought to go away). > @@ -1297,6 +1440,9 @@ void vmx_do_resume(struct vcpu *v) > hvm_asid_flush_vcpu(v); > } > > + if ( is_pvh_vcpu(v) ) > + reset_stack_and_jump(vmx_asm_do_vmentry); > + > debug_state = v->domain->debugger_attached > || > v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3] > || > v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP]; Missing a "fixme" annotation? Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.