[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
On Thu, 2017-11-09 at 10:36 +0000, Sergey Dyasli wrote: > On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote: > > > > > On 09.11.17 at 10:54, <raistlin@xxxxxxxx> wrote: > > > > > > On Tue, 2017-11-07 at 14:24 +0000, Igor Druzhinin wrote: > > > > Perhaps I should improve my diagram: > > > > > > > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle > > > > context > > > > -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) -> > > > > vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at > > > > this > > > > point on pCPU1) > > > > > > > > pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB > > > > flush > > > > from context switch to clean TLB on pCPU1 > > > > > > > > > > Sorry, there must be something I'm missing (or misunderstanding). > > > > > > What is this code that checks is_running and triggers the TLB > > > flush? > > > > I don't see where Igor said is_running is being checked around a > > TLB flush. The TLB flush itself is what happens first thing in > > context_switch() (and it's really using the TLB flush interface to > > mainly effect the state flush, with the TLB flush being an implied > > side effect; I've already got a series of further patches to make > > this less implicit). > > > > > But, more important, how come you are context switching to > > > something > > > that has is_running == 1 ? That should not be possible. > > > > That's not what Igor's diagram says - it's indicating the fact that > > is_running is being set to 1 in the process of context switching > > into vCPUx. > > Jan, Dario, > Hi, > Igor was referring to the following situation: > > > pCPU1 pCPU2 > ===== ===== > current == vCPU1 > context_switch(next == idle) > !! __context_switch() is skipped > vcpu_migrate(vCPU1) > RCU callbacks > vmx_vcpu_destroy() > vmx_vcpu_disable_pml() > current_vmcs = 0 > > schedule(next == vCPU1) > vCPU1->is_running = 1; > context_switch(next == vCPU1) > flush_tlb_mask(&dirty_mask); > > <--- IPI > > __sync_local_execstate() > __context_switch(prev == vCPU1) > vmx_ctxt_switch_from(vCPU1) > vCPU1->is_running == 1 > !! vmx_vmcs_reload() is skipped > > I hope that this better illustrates the root cause. > Yes, I think this is clearer, and easier to understand. But that's probably a mater of habit and personal taste, so sorry again for misunderstanding it in its other form. Anyway, as I was trying to explain replaying to Jan, although in this situation the issue manifests as a consequence of vCPU migration, I think it is indeed more general, as in, without even the need to consider a second pCPU: pCPU1 ===== current == vCPU1 context_switch(next == idle) !! __context_switch() is skipped vcpu_migrate(vCPU1) anything_that_uses_or_touches_context() So, it must be anything_that_uses_or_touches_context() --knowing it will be reading or touching the pCPU context-- that syncs the state, before using or touching it (which is, e.g., what Jan's patch does). The only other solution I see, is to get rid of lazy context switch. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Attachment:
signature.asc _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |