[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks



On 28.10.2020 22:31, Andrew Cooper wrote:
> On 26/10/2020 09:40, Jan Beulich wrote:
>> In the case that no 64-bit SYSCALL callback is registered, the guest
>> will be crashed when 64-bit userspace executes a SYSCALL instruction,
>> which would be a userspace => kernel DoS.  Similarly for 32-bit
>> userspace when no 32-bit SYSCALL callback was registered either.
>>
>> This has been the case ever since the introduction of 64bit PV support,
>> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
>> yield #GP/#UD in userspace before the callback is registered, and are
>> therefore safe by default.
>>
>> This change does constitute a change in the PV ABI, for the corner case
>> of a PV guest kernel not registering a 64-bit callback (which has to be
>> considered a defacto requirement of the unwritten PV ABI, considering
>> there is no PV equivalent of EFER.SCE).
>>
>> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
>> SYSENTER (safe by default, until explicitly enabled).
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <JBeulich@xxxxxxxx>
>> ---
>> v3:
>>  * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
>>    callbacks", to allow the uncontroversial part of that change to go
>>    in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
>>    over just the REX prefix too much trickery even for an unlikely to be
>>    taken code path?)
> 
> I find this submission confusion seeing as my v3 is already suitably
> acked and ready to commit.  What I haven't had yet enough free time to
> do so.

My objection to the other half stands, and hence, I'm afraid, stands
in the way of your patch getting committed. Aiui Roger's ack doesn't
invalidate my objection, sorry.

>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -33,11 +33,27 @@ ENTRY(switch_to_kernel)
>>          cmoveq VCPU_syscall32_addr(%rbx),%rax
>>          testq %rax,%rax
>>          cmovzq VCPU_syscall_addr(%rbx),%rax
>> -        movq  %rax,TRAPBOUNCE_eip(%rdx)
>>          /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
>>          btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
>>          setc  %cl
>>          leal  (,%rcx,TBF_INTERRUPT),%ecx
>> +
>> +        test  %rax, %rax
>> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
>> +        mov   VCPU_trap_ctxt(%rbx), %rdi
>> +        movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
>> +        cmpw  $FLAT_USER_CS32, UREGS_cs(%rsp)
>> +        je    0f
>> +        rex64                           # subl => subq
>> +0:
>> +        subl  $2, UREGS_rip(%rsp)
> 
> There was deliberately not a 32bit sub here (see below).

Funny you should say this when what I've taken as input (v2; I
don't think I had seen a v3, or else I would have called this
one v4) had "subl", not "subq". Roger's comment was regarding
a "mov" with a 32-bit register destination, not this "sub". If
you had also noticed and fixed this one, without you having
posted v3 (or without me having seen the posting) I couldn't
have known.

> As to the construct, I'm having a hard time deciding whether this is an
> excellent idea, or far too clever for its own good.
> 
> Some basic perf testing shows that there is a visible difference in
> execution time of these two paths, which means there is some
> optimisation being missed in the frontend for the 32bit case.  However,
> the difference is tiny in the grand scheme of things (about 0.4%
> difference in aggregate time to execute a loop of this pattern with a
> fixed number of iterations.)
> 
> However, the 32bit case isn't actually interesting here.  A guest can't
> execute a SYSCALL instruction on/across the 4G->0 boundary because the
> M2P is mapped NX up to the 4G boundary, so we can never reach this point
> with %eip < 2.
> 
> Therefore, the 64bit-only form is the appropriate one to use, which
> solves any question of cleverness, or potential decode stalls it causes.

Good point.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.