Re: [PATCH] x86/hap: be more selective with assisted TLB flush
On 30/04/2020 09:33, Jan Beulich wrote:
> On 30.04.2020 10:28, Roger Pau Monné wrote:
>> On Thu, Apr 30, 2020 at 09:20:58AM +0200, Jan Beulich wrote:
>>> On 29.04.2020 19:36, Roger Pau Monne wrote:
>>>> When doing an assisted flush on HAP the purpose of the
>>>> on_selected_cpus is just to trigger a vmexit on remote CPUs that are
>>>> in guest context, and hence just using is_vcpu_dirty_cpu is too lax,
>>>> also check that the vCPU is running.
>>> Am I right to understand that the change is relevant only to
>>> cover the period of time between ->is_running becoming false
>>> and ->dirty_cpu becoming VCPU_CPU_CLEAN? I.e. ...
>>>
>>>> --- a/xen/arch/x86/mm/hap/hap.c
>>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>>> @@ -719,7 +719,7 @@ static bool flush_tlb(bool (*flush_vcpu)(void *ctxt,
>>>> struct vcpu *v),
>>>> hvm_asid_flush_vcpu(v);
>>>>
>>>> cpu = read_atomic(&v->dirty_cpu);
>>>> - if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>>>> + if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) && v->is_running )
>>> ... the previous logic would have suitably covered the switch-to
>>> path, but doesn't properly cover the switch-from one, due to our
>>> lazy context switch approach?
>> Yes. Also __context_switch is not called from context_switch when
>> switching to the idle vcpu, and hence dirty_cpu is not cleared.
>>
>>> If so, I agree with the change:
>>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> It might be worth mentioning this detail in the description then,
>>> though.
>> Would you mind adding to the commit message if you agree:
>>
>> "Due to the lazy context switching done by Xen dirty_cpu won't always be
>> cleared when the guest vCPU is not running, and hence relying on
>> is_running allows more fine grained control of whether the vCPU is
>> actually running."
> Sure; I'll give it over the weekend though for others to comment, if
> so desired.
I think it is worth pointing out that this fixes a large perf regression on Nehalem/Westmere systems, where L1 Shim using the enlightened hypercall is 8x slower than the unenlightened way.

~Andrew
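
[Editor's illustration, not part of the original message.] The window discussed in the thread, where a vCPU has been descheduled but its dirty_cpu has not yet been cleared because of lazy context switching, can be modelled with a small stand-alone sketch. Everything prefixed with model_ and the two needs_ipi_* helpers are hypothetical stand-ins invented for this example; they are not Xen's definitions, and the sketch ignores atomics and locking. It only contrasts the predicate before and after the one-line change in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel standing in for "no pCPU holds this vCPU's state". */
#define MODEL_CPU_CLEAN (~0u)

/* Hypothetical, heavily reduced model of a vCPU (not Xen's struct vcpu). */
struct model_vcpu {
    unsigned int dirty_cpu; /* pCPU still holding the vCPU's state, or MODEL_CPU_CLEAN */
    bool is_running;        /* true only while the vCPU is actually scheduled */
};

static bool model_is_dirty_cpu(unsigned int cpu)
{
    return cpu != MODEL_CPU_CLEAN;
}

/* Predicate before the patch: any remote dirty pCPU gets interrupted. */
static bool needs_ipi_old(const struct model_vcpu *v, unsigned int this_cpu)
{
    unsigned int cpu = v->dirty_cpu;

    return cpu != this_cpu && model_is_dirty_cpu(cpu);
}

/* Predicate after the patch: additionally require that the vCPU is running. */
static bool needs_ipi_new(const struct model_vcpu *v, unsigned int this_cpu)
{
    unsigned int cpu = v->dirty_cpu;

    return cpu != this_cpu && model_is_dirty_cpu(cpu) && v->is_running;
}

/* Walks through the three states relevant to the discussion. */
static void model_check_lazy_switch_window(void)
{
    /* Running on remote pCPU 3: both predicates request an interrupt. */
    struct model_vcpu running = { .dirty_cpu = 3, .is_running = true };
    /* Descheduled, state still lazily held on pCPU 3: only the old one fires. */
    struct model_vcpu lazy = { .dirty_cpu = 3, .is_running = false };
    /* Fully switched out: neither predicate fires. */
    struct model_vcpu clean = { .dirty_cpu = MODEL_CPU_CLEAN, .is_running = false };

    assert(needs_ipi_old(&running, 0) && needs_ipi_new(&running, 0));
    assert(needs_ipi_old(&lazy, 0) && !needs_ipi_new(&lazy, 0));
    assert(!needs_ipi_old(&clean, 0) && !needs_ipi_new(&clean, 0));
}
```

In this model the only state where the two predicates disagree is the middle one, which corresponds to the period between ->is_running becoming false and ->dirty_cpu becoming VCPU_CPU_CLEAN that Jan asks about above.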