
Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



On 27/10/17 18:42, Igor Druzhinin wrote:
> On 16/02/17 11:15, Jan Beulich wrote:
>> When __context_switch() is being bypassed during original context
>> switch handling, the vCPU "owning" the VMCS partially loses control of
>> it: It will appear non-running to remote CPUs, and hence their attempt
>> to pause the owning vCPU will have no effect on it (as it already
>> looks to be paused). At the same time the "owning" CPU will re-enable
>> interrupts eventually (at the latest when entering the idle loop) and
>> hence becomes subject to IPIs from other CPUs requesting access to the
>> VMCS. As a result, when __context_switch() finally gets run, the CPU
>> may no longer have the VMCS loaded, and hence any accesses to it would
>> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
>>
>> Similarly, when __context_switch() is being bypassed also on the second
>> (switch-in) path, VMCS ownership may have been lost and hence needs
>> re-establishing. Since there's no existing hook to put this in, add a
>> new one.
>>
>> Reported-by: Kevin Mayer <Kevin.Mayer@xxxxxxxx>
>> Reported-by: Anshul Makkar <anshul.makkar@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> v2: Drop the spin loop from vmx_vmcs_reload(). Use the function in
>>     vmx_do_resume() instead of open coding it there (requiring the
>>     ASSERT()s to be adjusted/dropped). Drop the new
>>     ->ctxt_switch_same() hook.
>>
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v
>>      local_irq_restore(flags);
>>  }
>>  
>> +void vmx_vmcs_reload(struct vcpu *v)
>> +{
>> +    /*
>> +     * As we may be running with interrupts disabled, we can't acquire
>> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
>> +     * the VMCS can't be taken away from us anymore if we still own it.
>> +     */
>> +    ASSERT(v->is_running || !local_irq_is_enabled());
>> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
>> +        return;
>> +
>> +    vmx_load_vmcs(v);
>> +}
>> +
>>  int vmx_cpu_up_prepare(unsigned int cpu)
>>  {
>>      /*
>> @@ -1678,10 +1692,7 @@ void vmx_do_resume(struct vcpu *v)
>>      bool_t debug_state;
>>  
>>      if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
>> -    {
>> -        if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
>> -            vmx_load_vmcs(v);
>> -    }
>> +        vmx_vmcs_reload(v);
>>      else
>>      {
>>          /*
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
>>      if ( unlikely(!this_cpu(vmxon)) )
>>          return;
>>  
>> +    if ( !v->is_running )
>> +    {
>> +        /*
>> +         * When this vCPU isn't marked as running anymore, a remote pCPU's
>> +         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
>> +         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
>> +         * away the VMCS from us. As we're running with interrupts disabled,
>> +         * we also can't call vmx_vmcs_enter().
>> +         */
>> +        vmx_vmcs_reload(v);
>> +    }
>> +
>>      vmx_fpu_leave(v);
>>      vmx_save_guest_msrs(v);
>>      vmx_restore_host_msrs();
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
>>  void vmx_vmcs_enter(struct vcpu *v);
>>  bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
>>  void vmx_vmcs_exit(struct vcpu *v);
>> +void vmx_vmcs_reload(struct vcpu *v);
>>  
>>  #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
>>  #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
>>
> 
> Hi Jan,
> 
> I'm not entirely sure whether it's related, but the end result looks similar
> to the issue that this patch solved. We are now getting reports of a similar
> race condition with the following stack trace on 4.7.1 with this patch
> backported, but I'm pretty sure this should be the case for master as well:
> 
> (XEN) [480198.570165] Xen call trace:
> (XEN) [480198.570168]    [<ffff82d0801eb53e>] vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0x136/0x1a8
> (XEN) [480198.570171]    [<ffff82d080160095>] domain.c#__context_switch+0x10c/0x3a4
> (XEN) [480198.570176]    [<ffff82d08016560c>] __sync_local_execstate+0x35/0x51
> (XEN) [480198.570179]    [<ffff82d08018bd82>] invalidate_interrupt+0x40/0x73
> (XEN) [480198.570183]    [<ffff82d08016ea1f>] do_IRQ+0x8c/0x5cb
> (XEN) [480198.570186]    [<ffff82d08022d93f>] common_interrupt+0x5f/0x70
> (XEN) [480198.570189]    [<ffff82d0801b32b0>] vpmu_destroy+0/0x100
> (XEN) [480198.570192]    [<ffff82d0801e7dc9>] vmx.c#vmx_vcpu_destroy+0x21/0x30
> (XEN) [480198.570195]    [<ffff82d0801c2bf6>] hvm_vcpu_destroy+0x70/0x77
> (XEN) [480198.570197]    [<ffff82d08016101e>] vcpu_destroy+0x5d/0x72
> (XEN) [480198.570201]    [<ffff82d080107510>] domain.c#complete_domain_destroy+0x49/0x182
> (XEN) [480198.570204]    [<ffff82d0801266d2>] rcupdate.c#rcu_process_callbacks+0x141/0x1a3
> (XEN) [480198.570207]    [<ffff82d080132f95>] softirq.c#__do_softirq+0x75/0x80
> (XEN) [480198.570209]    [<ffff82d080132fae>] process_pending_softirqs+0xe/0x10
> (XEN) [480198.570212]    [<ffff82d0801b256f>] mwait-idle.c#mwait_idle+0xf5/0x2c3
> (XEN) [480198.570214]    [<ffff82d0801e0d00>] vmx_intr_assist+0x3bf/0x4f2
> (XEN) [480198.570216]    [<ffff82d08015fd57>] domain.c#idle_loop+0x38/0x4d
> 
> So far all attempts to get a repro locally have failed - the race is quite
> rare - it only happens when (probably) the issue with the stolen VMCS appears
> AND a TLB flush IPI comes at the moment of domain destruction. For the issue
> to appear, several conditions should be met:
> 1) The TLB flush IPI should execute __sync_local_execstate and enter the VMX
> context switch path
> 2) This should happen in the vCPU destroy loop in the RCU callback
> 3) The VMCS pointer should be invalid (possibly stolen or cleared somehow)
> 
> My idea was to provoke the crash by simulating the conditions described above.
> Using Andrew's suggestion I can now satisfy conditions 1 and 2 with some help
> from XTF, but I still have no idea how to provoke 3.
> 
> Any ideas about the root cause of the fault and suggestions on how to
> reproduce it would be welcome. Does this crash really have something to do
> with PML? I doubt it, since the original environment can hardly be called
> PML-heavy.
> 
> Thanks,
> Igor
> 

So we finally have a complete understanding of what's going on:

Some vCPU has just migrated to another pCPU and we switched to idle, but
per_cpu(curr_vcpu) on the current pCPU is still pointing to it - this is
how the current logic works. While we're in idle we issue vcpu_destroy()
for some other domain, which eventually calls vmx_vcpu_disable_pml() and
trashes the VMCS pointer on the current pCPU. At this moment we get a TLB
flush IPI from that same vCPU, which is now context switching in on
another pCPU - it appears to be cleaning the TLB after itself. That vCPU
has already been marked is_running=1 by the scheduler. In the IPI handler
we enter __sync_local_execstate() and try to call vmx_ctxt_switch_from()
for the migrated vCPU, which is supposed to call vmx_vmcs_reload() but
doesn't do so because is_running==1. The next VMWRITE crashes the
hypervisor.
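
To map this onto the code, here is a rough, simplified sketch of the
switch-out path as it stands with the patch above applied (annotated for
this scenario; not the exact source):

static void vmx_ctxt_switch_from(struct vcpu *v)
{
    if ( unlikely(!this_cpu(vmxon)) )
        return;

    if ( !v->is_running )      /* pCPU2 has already set is_running = 1,  */
    {                          /* so this guard skips the reload ...     */
        vmx_vmcs_reload(v);
    }

    /*
     * ... even though vmx_vcpu_disable_pml(), run from the RCU callback
     * on this pCPU, has just invalidated this_cpu(current_vmcs).  The
     * next VMWRITE issued from here (e.g. in vmx_fpu_leave()) therefore
     * faults.
     */
    vmx_fpu_leave(v);
    vmx_save_guest_msrs(v);
    vmx_restore_host_msrs();
    /* ... */
}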

So the state transition diagram might look like:
pCPU1: vCPUx -> migrate to pCPU2 -> idle -> RCU callbacks ->
vcpu_destroy() -> vmx_vcpu_disable_pml() -> vmcs_clear()
pCPU2: context switch into vCPUx -> is_running = 1 -> TLB flush
pCPU1: IPI handler -> context switch out of vCPUx -> VMWRITE -> CRASH!

We could basically just fix the condition around the vmx_vmcs_reload()
call, but I'm not completely sure that's the right way to do it - I don't
think leaving per_cpu(curr_vcpu) pointing to a migrated vCPU is a good
idea (maybe we need to clear it). What are your thoughts?
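
For illustration only - explicitly not a tested fix - the "fix the
condition" variant could be as simple as replacing the guarded block above
with an unconditional call and letting vmx_vmcs_reload() itself decide
whether anything needs doing:

    /*
     * Sketch only: vmx_vmcs_reload() already returns early when this pCPU
     * still owns v's VMCS, and its ASSERT(v->is_running ||
     * !local_irq_is_enabled()) still holds here because this path runs
     * with interrupts disabled.
     */
    vmx_vmcs_reload(v);

Whether that is safe while the vCPU is genuinely active on another pCPU,
and whether the stale per_cpu(curr_vcpu) should rather be cleared, is
exactly the open question above.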

CC-ing Dario

Thanks,
Igor



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

