
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.



Hi Jan,


On 09/02/17 16:22, Jan Beulich wrote:
On 08.02.17 at 16:32, <JBeulich@xxxxxxxx> wrote:
On 07.02.17 at 18:26, <anshul.makkar@xxxxxxxxxx> wrote:
I am facing an issue where a bootstorm of guests leads to a host crash. I
debugged it and found that enabling PML introduces a race condition during
the guest teardown stage, between disabling PML on one vcpu and a context
switch happening for another vcpu.

The crash happens only on Broadwell processors, as PML was introduced in
this generation.

Here is my analysis:

Race condition:

vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter(); vm_write(disable_PML);
vmx_vmcs_exit(); }

In between, I have a call path from another pcpu executing a context
switch -> vmx_fpu_leave(), which crashes on the vmwrite:

    if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
    {
        v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
        __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  // crash
    }
So that's after current has changed already, so it's effectively
dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
however, assumes that any user of a VMCS either owns the lock
or has current as the owner of the VMCS. Of course such a call
also can't be added here, as a vcpu on the context-switch-from
path can't vcpu_pause() itself.

Taken together with the bypassing of __context_switch()
when the incoming vCPU is the idle one (which means that, via
context_saved(), ->is_running will be cleared before running
->ctxt_switch_from()), this means the vcpu_pause() invocation in
vmx_vmcs_try_enter() may not have to wait at all if the call
happens between the clearing of ->is_running and the
eventual invocation of vmx_ctxt_switch_from().
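
The ordering problem described above can be replayed in a small
single-threaded model. This is only a sketch under stated assumptions:
struct model_vcpu, remote_vmcs_enter(), local_vmwrite_is_safe() and
replay_switch_out() are hypothetical names made up for illustration and are
not Xen code. The model assumes that a remote VMCS entry is gated solely on
->is_running being clear, and that the outgoing vCPU's late vmwrite is unsafe
once a remote pCPU has entered its VMCS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a vCPU; not Xen's struct vcpu. */
struct model_vcpu {
    bool is_running;            /* plays the role of v->is_running */
    bool vmcs_entered_remotely; /* set once a remote pCPU has entered the VMCS */
};

/*
 * Models a remote pCPU pausing the vCPU and then entering its VMCS.
 * Assumption: the pause only waits while is_running is set, so the
 * function returns false ("would have to wait") in that case.
 */
bool remote_vmcs_enter(struct model_vcpu *v)
{
    if (v->is_running)
        return false;
    v->vmcs_entered_remotely = true;
    return true;
}

/* Models the late vmwrite on the switch-from path (as in vmx_fpu_leave()). */
bool local_vmwrite_is_safe(const struct model_vcpu *v)
{
    return !v->vmcs_entered_remotely;
}

/*
 * Replays one switch-out of the vCPU with a remote VMCS user racing it.
 * lazy_switch_to_idle == true models the case where is_running is cleared
 * before the switch-from hook runs. Returns true if the local vmwrite
 * happened while the VMCS was still exclusively ours.
 */
bool replay_switch_out(bool lazy_switch_to_idle)
{
    struct model_vcpu v = { .is_running = true, .vmcs_entered_remotely = false };
    bool safe;

    if (lazy_switch_to_idle) {
        v.is_running = false;             /* cleared first */
        remote_vmcs_enter(&v);            /* remote side does not wait */
        safe = local_vmwrite_is_safe(&v); /* deferred switch-from hook */
    } else {
        /* The remote side tries early but is held off by is_running. */
        bool entered_early = remote_vmcs_enter(&v);
        assert(!entered_early);
        safe = local_vmwrite_is_safe(&v); /* hook runs before the flag clears */
        v.is_running = false;
    }
    return safe;
}
```

In this model, replay_switch_out(false) reports a safe vmwrite, while
replay_switch_out(true) reports an unsafe one, which corresponds to the
window between the clearing of ->is_running and the eventual invocation of
vmx_ctxt_switch_from().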
Mind giving the attached patch a try (which admittedly was only
lightly tested so far - in particular I haven't seen the second of
the debug messages being logged, yet)?
The patch looks promising. I couldn't do much thorough testing yet, but the initial reboot cycles (around 20 reboots of 32 VMs) went fine.

Jan

Anshul

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

