[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V5] x86/altp2m: Fix crash with INVALID_ALTP2M EPTP index



On 07/20/2018 06:07 PM, George Dunlap wrote:
> On 06/28/2018 03:35 PM, Razvan Cojocaru wrote:
>> A VM exit handler executed immediately after enabling #VE might
>> find a stale __vmsave()d EPTP_INDEX, stored by calling
>> altp2m_vcpu_destroy() when SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS
>> had been enabled by altp2m_vcpu_update_vmfunc_ve().
>>
>> vmx_vmexit_handler() __vmread()s EPTP_INDEX as soon as
>> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS is set, so if an
>> application enables altp2m on a domain, succesfully calls
>> xc_altp2m_set_vcpu_enable_notify(), then disables altp2m and
>> exits, a second run of said application will likely read the
>> INVALID_ALTP2M EPTP_INDEX set when disabling altp2m in the first
>> run, and crash the host with the BUG_ON(idx >= MAX_ALTP2M),
>> between xc_altp2m_set_vcpu_enable_notify() and
>> xc_altp2m_set_domain_state(..., false).
>>
>> The problem is not restricted to an INVALID_ALTP2M EPTP_INDEX
>> (which can only sanely happen on altp2m uninit), but applies
>> to any stale index previously saved - which means that all
>> altp2m_vcpu_update_vmfunc_ve() calls must also call
>> altp2m_vcpu_update_p2m() after setting
>> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS, in order to make sure
>> that the stored EPTP_INDEX is always valid at
>> vmx_vmexit_handler() time.
> 
> I'm sorry, this description still doesn't make hardly any sense to me,
> nor the solution, even after reading all the previous threads on the
> issue.  The description doesn't, for instance, mention vcpu_pause() at
> all, in spite of the fact that it seems (from the previous discussion)
> that this is a critical part of why this solution works; nor is there
> any comment in the code about the required discipline regarding
> SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS,  making it fairly likely that
> someone will re-introduce a bug like this in the future.
> 
> My normal template for something like this is
> 1. Explain what the current situation is
> 2. Explain why that's a problem
> 3. Describe what you're changing and how it fixes it.
> 
> I can't help but think the right thing to do here is in vmx.c somewhere
> -- it is, after all, code in vmx.c that:
> 1. Sets and clears SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS
> 2. Writes EPTP_INDEX
> 3. Assumes that SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS => EPTP_INDEX is
> valid.
> 
> What about something like the attached, instead (compile-tested only)?
George, thanks for the review, comments and new patch! You're the third
person telling me that the patch description is hard to parse - I'll
definitely work on that skill in the future (and sorry for the
inconvenience).

The vcpu_pause() lead was a red herring in my initial investigation of
the issue, and that is the reason why it didn't make it into the patch
description. The pausing already done is fine.

I've tested your patch on my system (where I can reproduce the crash
with a 100% reproduction rate without it), and I've had no crashes - so
it does seem to have fixed the problem. Thinking about the crash path,
it also makes sense that it would fix the problem - I can't think of any
objections to it.

Let me try the explanation again:

The current situation: when we run twice an altp2m client application
which uses altp2m_vcpu_update_vmfunc_ve() (it _has_ to be twice), the
following happens: after the first run of the application,
altp2m_vcpu_destroy() gets called as part of the cleanup process, and
this stores INVALID_ALTP2M EPTP_INDEX in the VMCS.

altp2m_vcpu_destroy() is what saves INVALID_ALTP2M after the first run
of the client application:

 48 void
 49 altp2m_vcpu_destroy(struct vcpu *v)
 50 {
 51     struct p2m_domain *p2m;
 52
 53     if ( v != current )
 54         vcpu_pause(v);
 55
 56     if ( (p2m = p2m_get_altp2m(v)) )
 57         atomic_dec(&p2m->active_vcpus);
 58
 59     altp2m_vcpu_reset(v);
 60
 61     altp2m_vcpu_update_p2m(v);
 62     altp2m_vcpu_update_vmfunc_ve(v);
 63
 64     if ( v != current )
 65         vcpu_unpause(v);
 66 }

altp2m_vcpu_reset(v) sets av->p2midx = INVALID_ALTP2M, then
altp2m_vcpu_update_p2m(v) saves it.

The _second_ run of the application then calls
altp2m_vcpu_update_vmfunc_ve() again. At this point, EPTP_INDEX ==
INVALID_ALTP2M, and vmx_vcpu_update_vmfunc_ve() only sets
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS (but _not_ EPTP_INDEX also, as
your patch now does). The only function that updates EPTP_INDEX is
vmx_vcpu_update_eptp() - so altp2m_vcpu_update_p2m(v) in my patch.

The pausing is fine, but altp2m_vcpu_update_vmfunc_ve() did not save
EPTP_INDEX.

altp2m_vcpu_update_p2m(v) is called in only 4 places now:
p2m_switch_domain_altp2m_by_id(), p2m_switch_vcpu_altp2m_by_id(),
altp2m_vcpu_initialise() and altp2m_vcpu_destroy().

So at the time of the second run of the application, EPTP_INDEX is still
INVALID_ALTP2M, and vmx_vcpu_update_vmfunc_ve() does nothing about it.

At this point, the first call of vmx_vmexit_handler() will find
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS set and will try to work with the
stored EPTP_INDEX. So you see, the pausing is fine, but the flow is
rather unfortunate.

So basically my patch does what your patch also does in essence. I just
thought that I should change the code that's _not_ VMX-specific in case
altp2m is extended to SVM.

So I hope I've been able to finally clarify things - and if not, it
should be clear to everybody by now that I'm really trying but failing
to be eloquent on this particular topic. :)

Long story short, FWIW:

Reviewed-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
Tested-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>

And, of course, many thanks for your help!


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.