Xen project Mailing List

Re: [Xen-devel] [PATCH V2] x86/altp2m: Fixed crash with INVALID_ALTP2M EPTP index

From: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>

Date: Tue, 26 Jun 2018 15:49:37 +0300

Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Tue, 26 Jun 2018 12:49:55 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06/26/2018 03:06 PM, Razvan Cojocaru wrote: > On 06/26/2018 02:56 PM, Jan Beulich wrote: >>>>> On 26.06.18 at 12:55, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >>> On 06/26/2018 01:26 PM, Jan Beulich wrote: >>>>>>> On 25.06.18 at 16:10, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >>>>> --- a/xen/arch/x86/mm/altp2m.c >>>>> +++ b/xen/arch/x86/mm/altp2m.c >>>>> @@ -58,8 +58,8 @@ altp2m_vcpu_destroy(struct vcpu *v) >>>>> >>>>> altp2m_vcpu_reset(v); >>>>> >>>>> - altp2m_vcpu_update_p2m(v); >>>>> altp2m_vcpu_update_vmfunc_ve(v); >>>>> + altp2m_vcpu_update_p2m(v); >>>> >>>> I agree this addresses this specific incarnation of the problem. However, >>>> if the vCPU indeed runs while being manipulated, I don't think you get >>>> rid of the race this way. For one, there is e.g. a solitary call to >>>> altp2m_vcpu_update_vmfunc_ve() in the handling of >>>> HVMOP_altp2m_vcpu_enable_notify. That'll lead to >>>> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS getting set, but >>>> EPTP_INDEX won't be written. Whatever previous value is in place >>>> would then be read back in VM exit handling. >>>> >>>> With that it also looks to me as if the two step (and hence non-atomic >>>> from the perspective of the guest) update is a problem. Even with the >>>> change above, the VM exit may now happen exactly between the two >>>> function calls. >>>> >>>> It seems to me that pausing the vCPU is almost unavoidable (and then >>>> the ordering of the two calls is relevant only because >>>> vmx_vcpu_update_eptp() would better respect the intended new >>>> setting of SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS). >>> >>> I see. Would not then your scenario be covered by >>> altp2m_vcpu_update_vmfunc_ve() always calling altp2m_vcpu_update_p2m() >>> at the very end (and removing duplicate calls to altp2m_vcpu_update_p2m())? >> >> No - there would then still be a window of time where the bit is set but >> the index was not (yet) written. >> >>> Unless something is very wrong, all calls to >>> altp2m_vcpu_update_vmfunc_ve() _should_ happen within a pause, so no >>> calls to the VM exit handler should occur between them if they become a >>> single block of logic within altp2m_vcpu_update_vmfunc_ve(). >> >> Did I misunderstand your analysis mails then? It looked to me as if >> you were observing exactly such races, because of the vCPU not >> being paused. > > No, you've understood my initial email perfectly, but I had > unfortunately misread the log. I've attempted to clarify things with the > follow-up (which clearly has not been eloquent) (the "apologies, I seem > to have misread the crash timing" email). > > I've misread my log because altp2m_vcpu_update_vmfunc_ve() was called > the first time on handling HVMOP_altp2m_vcpu_enable_notify, and a second > time on altp2m_vcpu_destroy() - and looking at it I've first thought > that the call leading to the crash happened in altp2m_vcpu_destroy() > (which would have indeed implied a nasty race with the VM exit handler). > > But then I've looked closer and found that the crash was happening soon > after the HVMOP_altp2m_vcpu_enable_notify handling, in the second run of > the test application (the first run was writing INVALID_ALTP2M in > EPTP_INDEX on altp2m_vcpu_destroy()). > > The VCPU pausing seem to work as expected. Sorry for the misunderstanding. So assuming that the HVMOPs are properly handled atomically with respect to their corresponding VCPU (which I now believe to be the case), the only possible issue that remains (that I can think of) is the case where EPTP_INDEX has been saved in a previous run of a test application, but is not correct for the current state of the guest. Then HVMOP_altp2m_vcpu_enable_notify is handled, SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, and the VM exit handler is called, which would result in working with the wrong index. This, I believe, can be fixed by either calling altp2m_vcpu_update_p2m() all the time immediately after all altp2m_vcpu_update_vmfunc_ve() calls, or making it a part of altp2m_vcpu_update_vmfunc_ve() (hence my previous proposal). Does that sound reasonable? Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.