
Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



>>> On 07.11.17 at 16:52, <igor.druzhinin@xxxxxxxxxx> wrote:
> On 07/11/17 14:55, Jan Beulich wrote:
>>>>> On 07.11.17 at 15:24, <igor.druzhinin@xxxxxxxxxx> wrote:
>>> On 07/11/17 08:07, Jan Beulich wrote:
>>>> --- unstable.orig/xen/arch/x86/domain.c
>>>> +++ unstable/xen/arch/x86/domain.c
>>>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>>>  
>>>>  void vcpu_destroy(struct vcpu *v)
>>>>  {
>>>> +    /*
>>>> +     * Flush all state for this vCPU before fully tearing it down. This is
>>>> +     * particularly important for HVM ones on VMX, so that this flushing of
>>>> +     * state won't happen from the TLB flush IPI handler behind the back of
>>>> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>>>> +     */
>>>> +    sync_vcpu_execstate(v);
>>>> +
>>>>      xfree(v->arch.vm_event);
>>>>      v->arch.vm_event = NULL;
>>>
>>> I don't think this is going to fix the problem, since the vCPU we are
>>> currently destroying has nothing to do with the vCPU that actually
>>> caused the problem by its migration. We are still going to call
>>> vmx_vcpu_disable_pml(), which loads and cleans the VMCS on the current
>>> pCPU.
>> 
>> Oh, right, wrong vCPU. This should be better:
>> 
>> --- unstable.orig/xen/arch/x86/domain.c
>> +++ unstable/xen/arch/x86/domain.c
>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>  
>>  void vcpu_destroy(struct vcpu *v)
>>  {
>> +    /*
>> +     * Flush all state for the vCPU previously having run on the current CPU.
>> +     * This is in particular relevant for HVM ones on VMX, so that this
>> +     * flushing of state won't happen from the TLB flush IPI handler behind
>> +     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>> +     */
>> +    sync_local_execstate();
>> +
>>      xfree(v->arch.vm_event);
>>      v->arch.vm_event = NULL;
>>  
>> In that case the question then is whether (rather than generalizing
>> it, as mentioned for the earlier version) this wouldn't better go into
>> vmx_vcpu_destroy(), assuming anything called earlier from
>> hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
>> play with VMCSes).
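
(For concreteness, that alternative would look something like this -
an untested sketch only; the exact context lines in vmx.c may differ:)

--- unstable.orig/xen/arch/x86/hvm/vmx/vmx.c
+++ unstable/xen/arch/x86/hvm/vmx/vmx.c
@@ static void vmx_vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * Flush the state of whichever vCPU previously ran on this pCPU
+     * before we start playing with VMCSes, so that the flushing can't
+     * happen from the TLB flush IPI handler inside a vmx_vmcs_enter() /
+     * vmx_vmcs_exit() section.
+     */
+    sync_local_execstate();
+
     vmx_vcpu_disable_pml(v);
     vmx_destroy_vmcs(v);
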
> 
> Ah, ok. Does this also apply to the previous issue? May I revert that
> change to test it?

Feel free to try it, but I had checked that previous patch earlier
today, and right now I don't think the two issues are related.

> There is one thing that I'm worried about with this approach:
> 
> At this point we just sync the idle context because we know that we
> are going to deal with the VMCS later. But what about other potential
> cases (perhaps some softirqs) in which we access a vCPU data structure
> that is currently shared between different pCPUs? Maybe we'd better
> sync the context as soon as possible after switching to idle from a
> migrated vCPU.

Well, yes, I had pointed out in the earlier reply that this is just to
deal with the specific case here. Whether to sync earlier after a
migration I'm not really sure about - as written right now, the code is
meant to deal with migration across CPUs. If we were to sync earlier,
this would perhaps belong in scheduler code (and hence cover ARM as
well), but so far I haven't been able to figure out a good place to put
it.
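
If it were to go into scheduler code, one conceivable shape would be
something like the below (purely illustrative: the hook point and the
"has this vCPU moved" check are hypothetical, and ARM would need its
own equivalent of sync_local_execstate()):

/* Hypothetical helper, to be run in the context switch tail. */
static void sync_if_prev_migrated(struct vcpu *prev)
{
    /*
     * If the vCPU whose state this pCPU still holds lazily was
     * meanwhile migrated elsewhere, flush that state right away,
     * rather than leaving it to be flushed from IPI context at some
     * arbitrary later point.
     */
    if ( !prev->is_running && prev->processor != smp_processor_id() )
        sync_local_execstate();
}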

George, Dario, do you have any thoughts, both on the general idea
and on where to put the necessary code?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

