[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/vmx: Revert "x86/VMX: sanitize rIP before re-entering guest"



On 19.10.2020 18:12, Andrew Cooper wrote:
> On 19/10/2020 10:09, Jan Beulich wrote:
>> On 16.10.2020 17:38, Andrew Cooper wrote:
>>> On 15/10/2020 09:01, Jan Beulich wrote:
>>>> On 14.10.2020 15:57, Andrew Cooper wrote:
>>>>> Running with corrupt state is every bit an XSA as hitting a VMEntry
>>>>> failure if it can be triggered by userspace, but the latter safer and
>>>>> much more obvious.
>>>> I disagree. For CPL > 0 we don't "corrupt" guest state any more
>>>> than reporting a #GP fault when one is going to be reported
>>>> anyway (as long as the VM entry doesn't fail, and hence the
>>>> guest won't get crashed). IOW this raising of #GP actually is a
>>>> precautionary measure to _avoid_ XSAs.
>>> It does not remove any XSAs.  It merely hides them.
>> How that? If we convert the ability of guest user mode to crash
>> the guest into deliver of #GP(0), how is there a hidden XSA then?
> 
> Because userspace being able to triggering this fixup is still an XSA.

How do you know without a specific case at hand? It may well be
that all that's impacted is guest user space, in which case I
don't see why there would need to be an XSA. It's still a bug
then, sure.

>>>>> It was the appropriate security fix (give or take the functional bug in
>>>>> it) at the time, given the complexity of retrofitting zero length
>>>>> instruction fetches to the emulator.
>>>>>
>>>>> However, it is one of a very long list of guest-state-induced VMEntry
>>>>> failures, with non-trivial logic which we assert will pass, on a
>>>>> fastpath, where hardware also performs the same checks and we already
>>>>> have a runtime safe way of dealing with errors.  (Hence not actually
>>>>> using ASSERT_UNREACHABLE() here.)
>>>> "Runtime safe" as far as Xen is concerned, I take it. This isn't safe
>>>> for the guest at all, as vmx_failed_vmentry() results in an
>>>> unconditional domain_crash().
>>> Any VMEntry failure is a bug in Xen.  If userspace can trigger it, it is
>>> an XSA, *irrespective* of whether we crash the domain then and there, or
>>> whether we let it try and limp on with corrupted state.
>> Allowing the guest to continue with corrupted state is not a
>> useful thing to do, I agree. However, what falls under
>> "corrupted" seems to be different for you and me. I'd not call
>> delivery of #GP "corruption" in any way.
> 
> I can only repeat my previous statement:
> 
>> There are legal states where RIP is 0x0000800000000000 and #GP is the
>> wrong thing to do.
> 
> Blindly raising #GP in is not always the right thing to do.

Again - we're in agreement about "blindly". Let's be less blind.

>>  The primary goal ought
>> to be that we don't corrupt the guest kernel view of the world.
>> It may then have the opportunity to kill the offending user
>> mode process.
> 
> By the time we have hit this case, all bets are off, because Xen *is*
> malfunctioning.  We have no idea if kernel context is still intact.  You
> don't even know that current user context is the correct offending
> context to clobber, and might be creating a user=>user DoS vulnerability.
> 
> We definitely have an XSA to find and fix, and we can either make it
> very obvious and likely to be reported, or hidden and liable to go
> unnoticed for a long period of time.

Why would it go unnoticed when we log the incident? I very much
hope people inspect their logs at least every once in a while ...

And as per above - I disagree with your use of "definitely" here.
We have a bug to analyze and fix, yes. Whether it's an XSA-worthy
one isn't known ahead of time, unless we crash the guest in such
a case.

In any event I think it's about time that the VMX maintainers
voice their views here, as they're the ones to approve of
whichever change we end up with. Kevin, Jun?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.