[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/2] x86/pv: Rewrite segment context switching from scratch



On 04.09.2020 15:52, Andrew Cooper wrote:
> There are multiple bugs with the existing implementation.
> 
> On AMD CPUs prior to Zen2, loading a NUL segment selector doesn't clear the
> segment base, which is a problem for 64bit code which typically expects to use
> a NUL %fs/%gs selector.
> 
> On a context switch from any PV vcpu, to a 64bit PV vcpu with an %fs/%gs
> selector which faults, the fixup logic loads NUL, and the guest is entered at
> the failsafe callback with the stale base.
> 
> Alternatively, a PV context switch sequence of 64 (NUL, non-zero base) =>
> 32 (NUL) => 64 (NUL, zero base) will similarly cause Xen to enter the guest
> with a stale base.
> 
> Both of these corner cases manifest as state corruption in the final vcpu.
> However, damage is limited to to 64bit code expecting to use Thread Local
> Storage with a base pointer of 0, which doesn't occur by default.
> 
> The context switch logic is extremely complicated, and is attempting to
> optimise away loading a NUL selector (which is fast), or writing a 64bit base
> of 0 (which is rare).  Furthermore, it fails to respect Linux's ABI with
> userspace, which manifests as userspace state corruption as far as Linux is
> concerned.
> 
> Always restore all selector and base state, in all cases.
> 
> Leave a large comment explaining hardware behaviour, and the new ABI
> expectations.  Update the comments in the public headers.
> 
> Drop all "segment preloading" to handle the AMD corner case.  It was never
> anything but a waste of time for %ds/%es, and isn't needed now that %fs/%gs
> bases are unconditionally written for 64bit PV guests.  In load_segments(),
> store the result of is_pv_32bit_vcpu() as it is an expensive predicate now,
> and not used in a way which impacts speculative safety.
> 
> Reported-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Reported-by: Sarah Newman <srn@xxxxxxxxx>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Wei Liu <wl@xxxxxxx>
> CC: Andy Lutomirski <luto@xxxxxxxxxx>
> CC: Sarah Newman <srn@xxxxxxxxx>
> 
> v2:
>  * Some save_segments() content pulled out into an earlier patch.
>  * Extra fix in arch_set_info_guest() due to the new ABI adjustments.

Oh, yes, of course.

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.