[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] VMX: sync CPU state upon vCPU destruction



>>> On 10.11.17 at 09:41, <sergey.dyasli@xxxxxxxxxx> wrote:
> On Thu, 2017-11-09 at 07:49 -0700, Jan Beulich wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -479,7 +479,13 @@ static void vmx_vcpu_destroy(struct vcpu
>>       * we should disable PML manually here. Note that vmx_vcpu_destroy is 
>> called
>>       * prior to vmx_domain_destroy so we need to disable PML for each vcpu
>>       * separately here.
>> +     *
>> +     * Before doing that though, flush all state for the vCPU previously 
>> having
>> +     * run on the current CPU, so that this flushing of state won't happen 
>> from
>> +     * the TLB flush IPI handler behind the back of a vmx_vmcs_enter() /
>> +     * vmx_vmcs_exit() section.
>>       */
>> +    sync_local_execstate();
>>      vmx_vcpu_disable_pml(v);
>>      vmx_destroy_vmcs(v);
>>      passive_domain_destroy(v);
> 
> This patch fixes only one particular issue and not the general problem.
> What if vmcs is cleared, possibly in some future code, at another place?

As indicated in the earlier discussion, if we go this route, other
future async accesses may need to do the same then.

> The original intent of vmx_vmcs_reload() is correct: it lazily loads
> the vmcs when it's needed. It's just the logic which checks for
> v->is_running inside vmx_ctxt_switch_from() is flawed: v might be
> "running" on another pCPU.
> 
> IMHO there are 2 possible solutions:
> 
>     1. Add additional pCPU check into vmx_ctxt_switch_from()

I agree with Dario in not seeing this as a possible solution.

>    2. Drop v->is_running check inside vmx_ctxt_switch_from() making
>        vmx_vmcs_reload() unconditional.

This is an option, indeed (and I don't think it would have a
meaningful performance impact, as vmx_vmcs_reload() does
nothing if the right VMCS is already in place). Iirc I had added the
conditional back then merely to introduce as little of a behavioral
change as was (appeared to be at that time) necessary. What I'm
not certain about, however, is the final state we'll end up in then.
Coming back to your flow scheme (altered to represent the
suggested new flow):

pCPU1                                   pCPU2
=====                                   =====
current == vCPU1
context_switch(next == idle)
!! __context_switch() is skipped
vcpu_migrate(vCPU1)
RCU callbacks
vmx_vcpu_destroy()
vmx_vcpu_disable_pml()
current_vmcs = 0

                                        schedule(next == vCPU1)
                                        vCPU1->is_running = 1;
                                        context_switch(next == vCPU1)
                                        flush_tlb_mask(&dirty_mask);
                                    
                                <--- IPI

__sync_local_execstate()
__context_switch(prev == vCPU1)
vmx_ctxt_switch_from(vCPU1)
vmx_vmcs_reload()
...

We'd now leave the being destroyed vCPU's VMCS active in pCPU1
(at least I can't see where it would be deactivated again).

Overall I think it is quite reasonable to terminate early a lazy
context switch of a vCPU under destruction. From that abstract
consideration, forcing this higher up the call stack of
vmx_vcpu_destroy() (as I had suggested as an alternative
previously, before actually moving it further down into VMX code,
perhaps even right in RCU handling) would continue to be an
option. In this context you may want to pay particular
attention to the description of 346da00456 ("Synchronise lazy
execstate before calling tasklet handlers").

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.