[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V2] x86/altp2m: Fixed crash with INVALID_ALTP2M EPTP index



On 06/26/2018 03:06 PM, Razvan Cojocaru wrote:
> On 06/26/2018 02:56 PM, Jan Beulich wrote:
>>>>> On 26.06.18 at 12:55, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> On 06/26/2018 01:26 PM, Jan Beulich wrote:
>>>>>>> On 25.06.18 at 16:10, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>>> --- a/xen/arch/x86/mm/altp2m.c
>>>>> +++ b/xen/arch/x86/mm/altp2m.c
>>>>> @@ -58,8 +58,8 @@ altp2m_vcpu_destroy(struct vcpu *v)
>>>>>  
>>>>>      altp2m_vcpu_reset(v);
>>>>>  
>>>>> -    altp2m_vcpu_update_p2m(v);
>>>>>      altp2m_vcpu_update_vmfunc_ve(v);
>>>>> +    altp2m_vcpu_update_p2m(v);
>>>>
>>>> I agree this addresses this specific incarnation of the problem. However,
>>>> if the vCPU indeed runs while being manipulated, I don't think you get
>>>> rid of the race this way. For one, there is e.g. a solitary call to
>>>> altp2m_vcpu_update_vmfunc_ve() in the handling of
>>>> HVMOP_altp2m_vcpu_enable_notify. That'll lead to
>>>> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS getting set, but
>>>> EPTP_INDEX won't be written. Whatever previous value is in place
>>>> would then be read back in VM exit handling.
>>>>
>>>> With that it also looks to me as if the two step (and hence non-atomic
>>>> from the perspective of the guest) update is a problem. Even with the
>>>> change above, the VM exit may now happen exactly between the two
>>>> function calls.
>>>>
>>>> It seems to me that pausing the vCPU is almost unavoidable (and then
>>>> the ordering of the two calls is relevant only because
>>>> vmx_vcpu_update_eptp() would better respect the intended new
>>>> setting of SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS).
>>>
>>> I see. Would not then your scenario be covered by
>>> altp2m_vcpu_update_vmfunc_ve() always calling altp2m_vcpu_update_p2m()
>>> at the very end (and removing duplicate calls to altp2m_vcpu_update_p2m())?
>>
>> No - there would then still be a window of time where the bit is set but
>> the index was not (yet) written.
>>
>>> Unless something is very wrong, all calls to
>>> altp2m_vcpu_update_vmfunc_ve() _should_ happen within a pause, so no
>>> calls to the VM exit handler should occur between them if they become a
>>> single block of logic within altp2m_vcpu_update_vmfunc_ve().
>>
>> Did I misunderstand your analysis mails then? It looked to me as if
>> you were observing exactly such races, because of the vCPU not
>> being paused.
> 
> No, you've understood my initial email perfectly, but I had
> unfortunately misread the log. I've attempted to clarify things with the
> follow-up (which clearly has not been eloquent) (the "apologies, I seem
> to have misread the crash timing" email).
> 
> I've misread my log because altp2m_vcpu_update_vmfunc_ve() was called
> the first time on handling HVMOP_altp2m_vcpu_enable_notify, and a second
> time on altp2m_vcpu_destroy() - and looking at it I've first thought
> that the call leading to the crash happened in altp2m_vcpu_destroy()
> (which would have indeed implied a nasty race with the VM exit handler).
> 
> But then I've looked closer and found that the crash was happening soon
> after the HVMOP_altp2m_vcpu_enable_notify handling, in the second run of
> the test application (the first run was writing INVALID_ALTP2M in
> EPTP_INDEX on altp2m_vcpu_destroy()).
> 
> The VCPU pausing seem to work as expected. Sorry for the misunderstanding.

So assuming that the HVMOPs are properly handled atomically with respect
to their corresponding VCPU (which I now believe to be the case), the
only possible issue that remains (that I can think of) is the case where
EPTP_INDEX has been saved in a previous run of a test application, but
is not correct for the current state of the guest.

Then HVMOP_altp2m_vcpu_enable_notify is handled,
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, and the VM exit handler is
called, which would result in working with the wrong index. This, I
believe, can be fixed by either calling altp2m_vcpu_update_p2m() all the
time immediately after all altp2m_vcpu_update_vmfunc_ve() calls, or
making it a part of altp2m_vcpu_update_vmfunc_ve() (hence my previous
proposal).

Does that sound reasonable?


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.