Xen project Mailing List

Re: [Xen-devel] [PATCH V5] x86/altp2m: Fix crash with INVALID_ALTP2M EPTP index

On 07/20/2018 06:07 PM, George Dunlap wrote: > On 06/28/2018 03:35 PM, Razvan Cojocaru wrote: >> A VM exit handler executed immediately after enabling #VE might >> find a stale __vmsave()d EPTP_INDEX, stored by calling >> altp2m_vcpu_destroy() when SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS >> had been enabled by altp2m_vcpu_update_vmfunc_ve(). >> >> vmx_vmexit_handler() __vmread()s EPTP_INDEX as soon as >> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, so if an >> application enables altp2m on a domain, succesfully calls >> xc_altp2m_set_vcpu_enable_notify(), then disables altp2m and >> exits, a second run of said application will likely read the >> INVALID_ALTP2M EPTP_INDEX set when disabling altp2m in the first >> run, and crash the host with the BUG_ON(idx >= MAX_ALTP2M), >> between xc_altp2m_set_vcpu_enable_notify() and >> xc_altp2m_set_domain_state(..., false). >> >> The problem is not restricted to an INVALID_ALTP2M EPTP_INDEX >> (which can only sanely happen on altp2m uninit), but applies >> to any stale index previously saved - which means that all >> altp2m_vcpu_update_vmfunc_ve() calls must also call >> altp2m_vcpu_update_p2m() after setting >> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS, in order to make sure >> that the stored EPTP_INDEX is always valid at >> vmx_vmexit_handler() time. > > I'm sorry, this description still doesn't make hardly any sense to me, > nor the solution, even after reading all the previous threads on the > issue. The description doesn't, for instance, mention vcpu_pause() at > all, in spite of the fact that it seems (from the previous discussion) > that this is a critical part of why this solution works; nor is there > any comment in the code about the required discipline regarding > SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS, making it fairly likely that > someone will re-introduce a bug like this in the future. > > My normal template for something like this is > 1. Explain what the current situation is > 2. Explain why that's a problem > 3. Describe what you're changing and how it fixes it. > > I can't help but think the right thing to do here is in vmx.c somewhere > -- it is, after all, code in vmx.c that: > 1. Sets and clears SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS > 2. Writes EPTP_INDEX > 3. Assumes that SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS => EPTP_INDEX is > valid. > > What about something like the attached, instead (compile-tested only)? George, thanks for the review, comments and new patch! You're the third person telling me that the patch description is hard to parse - I'll definitely work on that skill in the future (and sorry for the inconvenience). The vcpu_pause() lead was a red herring in my initial investigation of the issue, and that is the reason why it didn't make it into the patch description. The pausing already done is fine. I've tested your patch on my system (where I can reproduce the crash with a 100% reproduction rate without it), and I've had no crashes - so it does seem to have fixed the problem. Thinking about the crash path, it also makes sense that it would fix the problem - I can't think of any objections to it. Let me try the explanation again: The current situation: when we run twice an altp2m client application which uses altp2m_vcpu_update_vmfunc_ve() (it _has_ to be twice), the following happens: after the first run of the application, altp2m_vcpu_destroy() gets called as part of the cleanup process, and this stores INVALID_ALTP2M EPTP_INDEX in the VMCS. altp2m_vcpu_destroy() is what saves INVALID_ALTP2M after the first run of the client application: 48 void 49 altp2m_vcpu_destroy(struct vcpu *v) 50 { 51 struct p2m_domain *p2m; 52 53 if ( v != current ) 54 vcpu_pause(v); 55 56 if ( (p2m = p2m_get_altp2m(v)) ) 57 atomic_dec(&p2m->active_vcpus); 58 59 altp2m_vcpu_reset(v); 60 61 altp2m_vcpu_update_p2m(v); 62 altp2m_vcpu_update_vmfunc_ve(v); 63 64 if ( v != current ) 65 vcpu_unpause(v); 66 } altp2m_vcpu_reset(v) sets av->p2midx = INVALID_ALTP2M, then altp2m_vcpu_update_p2m(v) saves it. The _second_ run of the application then calls altp2m_vcpu_update_vmfunc_ve() again. At this point, EPTP_INDEX == INVALID_ALTP2M, and vmx_vcpu_update_vmfunc_ve() only sets SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS (but _not_ EPTP_INDEX also, as your patch now does). The only function that updates EPTP_INDEX is vmx_vcpu_update_eptp() - so altp2m_vcpu_update_p2m(v) in my patch. The pausing is fine, but altp2m_vcpu_update_vmfunc_ve() did not save EPTP_INDEX. altp2m_vcpu_update_p2m(v) is called in only 4 places now: p2m_switch_domain_altp2m_by_id(), p2m_switch_vcpu_altp2m_by_id(), altp2m_vcpu_initialise() and altp2m_vcpu_destroy(). So at the time of the second run of the application, EPTP_INDEX is still INVALID_ALTP2M, and vmx_vcpu_update_vmfunc_ve() does nothing about it. At this point, the first call of vmx_vmexit_handler() will find SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS set and will try to work with the stored EPTP_INDEX. So you see, the pausing is fine, but the flow is rather unfortunate. So basically my patch does what your patch also does in essence. I just thought that I should change the code that's _not_ VMX-specific in case altp2m is extended to SVM. So I hope I've been able to finally clarify things - and if not, it should be clear to everybody by now that I'm really trying but failing to be eloquent on this particular topic. :) Long story short, FWIW: Reviewed-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> Tested-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> And, of course, many thanks for your help! Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.