Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
>>> On 08.02.17 at 16:32, <JBeulich@xxxxxxxx> wrote:
>>>> On 07.02.17 at 18:26, <anshul.makkar@xxxxxxxxxx> wrote:
>> Facing an issue where bootstorm of guests leads to host crash. I debugged
>> and found that enabling PML introduces a race condition during
>> guest teardown stage while disabling PML on a vcpu and context switch
>> happening for another vcpu.
>>
>> Crash happens only on Broadwell processors as PML got introduced in this
>> generation.
>>
>> Here is my analysis:
>>
>> Race condition:
>>
>> vmcs.c vmx_vcpu_disable_pml (vcpu){ vmx_vmcs_enter() ; vm_write(
>> disable_PML); vmx_vmcs_exit(); }
>>
>> In between I have a callpath from another pcpu executing context
>> switch-> vmx_fpu_leave() and crash on vmwrite..
>>
>> if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>> {
>>     v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>>     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); //crash
>> }
>
> So that's after current has changed already, so it's effectively
> dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
> The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
> however, assumes that any user of a VMCS either owns the lock
> or has current as the owner of the VMCS. Of course such a call
> also can't be added here, as a vcpu on the context-switch-from
> path can't vcpu_pause() itself.
>
> That taken together with the bypassing of __context_switch()
> when the incoming vCPU is the idle one (which means that via
> context_saved() ->is_running will be cleared before running
> ->ctxt_switch_from()), the vcpu_pause() invocation in
> vmx_vmcs_try_enter() may not have to wait at all if the call
> happens between the clearing of ->is_running and the
> eventual invocation of vmx_ctxt_switch_from().
>
> If the above makes sense (which I'm not sure at all), the
> question is whether using this_cpu(curr_vcpu) instead of
> current in the VMCS enter/exit functions would help.
This won't help, as it won't make vcpu_pause() wait (which is the core
of the problem).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
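
For illustration only, the ordering problem discussed in this thread can be
replayed in a small single-threaded toy model. This is a sketch under stated
assumptions, not Xen code: struct toy_vcpu and every toy_* helper are
hypothetical names, and the model assumes only what the thread says, namely
that a pause waits on ->is_running alone and that ->is_running is cleared
before the deferred switch-from hook runs when the incoming vCPU is idle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these types and helpers are hypothetical, not Xen code. */
struct toy_vcpu {
    bool is_running;     /* stands in for v->is_running */
    bool state_on_pcpu;  /* true until the switch-from hook has run */
    int  vmcs_user;      /* pcpu that entered the VMCS remotely, -1 if none */
};

/* In this model a pause waits only while is_running is set. */
static bool toy_pause_must_wait(const struct toy_vcpu *v)
{
    return v->is_running;
}

/* Switch to idle: is_running is cleared, the switch-from hook is deferred. */
static void toy_switch_to_idle(struct toy_vcpu *v)
{
    assert(v->is_running);
    v->is_running = false;
}

/* A remote pcpu enters the VMCS as soon as the pause need not wait. */
static bool toy_foreign_enter(struct toy_vcpu *v, int pcpu)
{
    if (toy_pause_must_wait(v))
        return false;
    v->vmcs_user = pcpu;
    return true;
}

static void toy_foreign_exit(struct toy_vcpu *v)
{
    v->vmcs_user = -1;
}

/* Deferred switch-from hook on the original pcpu.
 * Returns true if it overlaps with a remote user of the same VMCS. */
static bool toy_deferred_switch_from(struct toy_vcpu *v, int pcpu)
{
    bool overlap = v->vmcs_user != -1 && v->vmcs_user != pcpu;

    assert(v->state_on_pcpu);
    v->state_on_pcpu = false;
    return overlap;
}

/* Remote enter lands between clearing is_running and the deferred hook. */
bool toy_replay_race(void)
{
    struct toy_vcpu v = { true, true, -1 };
    bool entered;

    toy_switch_to_idle(&v);
    entered = toy_foreign_enter(&v, 1);   /* pause did not have to wait */
    assert(entered);
    (void)entered;
    return toy_deferred_switch_from(&v, 0);
}

/* Deferred hook runs first, remote enter afterwards: no overlap. */
bool toy_replay_ordered(void)
{
    struct toy_vcpu v = { true, true, -1 };
    bool overlap, entered;

    toy_switch_to_idle(&v);
    overlap = toy_deferred_switch_from(&v, 0);
    entered = toy_foreign_enter(&v, 1);
    assert(entered);
    (void)entered;
    toy_foreign_exit(&v);
    return overlap;
}
```

In this model toy_replay_race() reports an overlap and toy_replay_ordered()
does not. The only difference between the two is whether the remote enter
happens before or after the deferred hook, which is why a fix has to make the
pause wait for the hook rather than change which vCPU pointer is compared.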