
Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



On 16/02/17 11:15, Jan Beulich wrote:
> When __context_switch() is being bypassed during original context
> switch handling, the vCPU "owning" the VMCS partially loses control of
> it: It will appear non-running to remote CPUs, and hence their attempt
> to pause the owning vCPU will have no effect on it (as it already
> looks to be paused). At the same time the "owning" CPU will re-enable
> interrupts eventually (at the latest when entering the idle loop) and
> hence becomes subject to IPIs from other CPUs requesting access to the
> VMCS. As a result, when __context_switch() finally gets run, the CPU
> may no longer have the VMCS loaded, and hence any accesses to it would
> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
> 
> Similarly, when __context_switch() is being bypassed also on the second
> (switch-in) path, VMCS ownership may have been lost and hence needs
> re-establishing. Since there's no existing hook to put this in, add a
> new one.
> 
> Reported-by: Kevin Mayer <Kevin.Mayer@xxxxxxxx>
> Reported-by: Anshul Makkar <anshul.makkar@xxxxxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> v2: Drop the spin loop from vmx_vmc_reload(). Use the function in
>     vmx_do_resume() instead of open coding it there (requiring the
>     ASSERT()s to be adjusted/dropped). Drop the new
>     ->ctxt_switch_same() hook.
> 
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v
>      local_irq_restore(flags);
>  }
>  
> +void vmx_vmcs_reload(struct vcpu *v)
> +{
> +    /*
> +     * As we may be running with interrupts disabled, we can't acquire
> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
> +     * the VMCS can't be taken away from us anymore if we still own it.
> +     */
> +    ASSERT(v->is_running || !local_irq_is_enabled());
> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
> +        return;
> +
> +    vmx_load_vmcs(v);
> +}
> +
>  int vmx_cpu_up_prepare(unsigned int cpu)
>  {
>      /*
> @@ -1678,10 +1692,7 @@ void vmx_do_resume(struct vcpu *v)
>      bool_t debug_state;
>  
>      if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
> -    {
> -        if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
> -            vmx_load_vmcs(v);
> -    }
> +        vmx_vmcs_reload(v);
>      else
>      {
>          /*
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
>      if ( unlikely(!this_cpu(vmxon)) )
>          return;
>  
> +    if ( !v->is_running )
> +    {
> +        /*
> +         * When this vCPU isn't marked as running anymore, a remote pCPU's
> +         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
> +         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
> +         * away the VMCS from us. As we're running with interrupts disabled,
> +         * we also can't call vmx_vmcs_enter().
> +         */
> +        vmx_vmcs_reload(v);
> +    }
> +
>      vmx_fpu_leave(v);
>      vmx_save_guest_msrs(v);
>      vmx_restore_host_msrs();
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
>  void vmx_vmcs_enter(struct vcpu *v);
>  bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
>  void vmx_vmcs_exit(struct vcpu *v);
> +void vmx_vmcs_reload(struct vcpu *v);
>  
>  #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
>  #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
> 

Hi Jan,

I'm not entirely sure whether it's related, but the end result looks
similar to the issue this patch solved. We are now getting reports of
a similar race condition, with the following stack trace, on 4.7.1 with
this patch backported, but I'm pretty sure the same applies to master
as well:

(XEN) [480198.570165] Xen call trace:
(XEN) [480198.570168]    [<ffff82d0801eb53e>] vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0x136/0x1a8
(XEN) [480198.570171]    [<ffff82d080160095>] domain.c#__context_switch+0x10c/0x3a4
(XEN) [480198.570176]    [<ffff82d08016560c>] __sync_local_execstate+0x35/0x51
(XEN) [480198.570179]    [<ffff82d08018bd82>] invalidate_interrupt+0x40/0x73
(XEN) [480198.570183]    [<ffff82d08016ea1f>] do_IRQ+0x8c/0x5cb
(XEN) [480198.570186]    [<ffff82d08022d93f>] common_interrupt+0x5f/0x70
(XEN) [480198.570189]    [<ffff82d0801b32b0>] vpmu_destroy+0/0x100
(XEN) [480198.570192]    [<ffff82d0801e7dc9>] vmx.c#vmx_vcpu_destroy+0x21/0x30
(XEN) [480198.570195]    [<ffff82d0801c2bf6>] hvm_vcpu_destroy+0x70/0x77
(XEN) [480198.570197]    [<ffff82d08016101e>] vcpu_destroy+0x5d/0x72
(XEN) [480198.570201]    [<ffff82d080107510>] domain.c#complete_domain_destroy+0x49/0x182
(XEN) [480198.570204]    [<ffff82d0801266d2>] rcupdate.c#rcu_process_callbacks+0x141/0x1a3
(XEN) [480198.570207]    [<ffff82d080132f95>] softirq.c#__do_softirq+0x75/0x80
(XEN) [480198.570209]    [<ffff82d080132fae>] process_pending_softirqs+0xe/0x10
(XEN) [480198.570212]    [<ffff82d0801b256f>] mwait-idle.c#mwait_idle+0xf5/0x2c3
(XEN) [480198.570214]    [<ffff82d0801e0d00>] vmx_intr_assist+0x3bf/0x4f2
(XEN) [480198.570216]    [<ffff82d08015fd57>] domain.c#idle_loop+0x38/0x4d

So far all attempts to reproduce the crash locally have failed - the race is
quite rare: it only happens when (probably) the stolen-VMCS issue appears AND a
TLB flush IPI arrives at the moment of domain destruction. For the issue to
appear, several conditions must be met:
1) a TLB flush IPI must execute __sync_local_execstate and enter the VMX
context switch path;
2) this must happen during the vCPU destroy loop in the RCU callback;
3) the VMCS pointer must be invalid (possibly stolen or cleared somehow).

My idea was to provoke the crash by simulating the conditions described
above. Using Andrew's suggestion I can now satisfy conditions 1 and 2 with
some help from XTF, but I still have no idea how to provoke condition 3.

Any ideas about the root cause of the fault, and suggestions on how to
reproduce it, would be welcome. Does this crash really have something to do
with PML? I doubt it, because the original environment can hardly be called
PML-heavy.

Thanks,
Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

