
Re: [Xen-devel] Patch "x86/entry/64: Remove %ebx handling from error_entry/exit" has been added to the 4.9-stable tree

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
From: Andy Lutomirski <luto@xxxxxxxxxx>
Date: Thu, 6 Dec 2018 10:49:53 -0800
Cc: Juergen Gross <jgross@xxxxxxxx>, Denys Vlasenko <dvlasenk@xxxxxxxxxx>, Sarah Newman <srn@xxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, "M. Vefa Bicakci" <m.v.b@xxxxxxxxxx>, Brian Gerst <brgerst@xxxxxxxxx>, Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, stable <stable@xxxxxxxxxxxxxxx>, Andrew Lutomirski <luto@xxxxxxxxxx>, Josh Poimboeuf <jpoimboe@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, David Woodhouse <dwmw2@xxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>
Delivery-date: Thu, 06 Dec 2018 18:50:15 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

> On Dec 6, 2018, at 9:36 AM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> On 06/12/2018 17:10, David Woodhouse wrote:
>> On Wed, 2018-11-28 at 08:44 -0800, Andy Lutomirski wrote:
>>>> Can we assume it's always from kernel? The Xen code definitely seems to
>>>> handle invoking this from both kernel and userspace contexts.
>>> I learned that my comment here was wrong shortly after the patch landed :(
>> Turns out the only place I see it getting called from is under
>> __context_switch().
>>
>> #7 [ffff8801144a7cf0] new_xen_failsafe_callback at ffffffffa028028a [kmod_ebxfix]
>> #8 [ffff8801144a7d90] xen_hypercall_update_descriptor at ffffffff8100114a
>> #9 [ffff8801144a7db8] xen_hypercall_update_descriptor at ffffffff8100114a
>> #10 [ffff8801144a7df0] xen_mc_flush at ffffffff81006ab9
>> #11 [ffff8801144a7e30] xen_end_context_switch at ffffffff81004e12
>> #12 [ffff8801144a7e48] __switch_to at ffffffff81016582
>> #13 [ffff8801144a7ea0] __schedule at ffffffff815d2b37
>>
>> That …114a in xen_hypercall_update_descriptor is the 'pop' instruction
>> right after the syscall; it's happening when Xen is preempting the
>> domain in the hypercall and then reloads the segment registers to run
>> that vCPU again later.
>>
>> [  44185.225289]   WARN: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000abbd76060
>>
>> The update_descriptor hypercall args (rdi, rsi) were 0xabbd76060 and 0
>> respectively: it was setting the descriptor at that address to zero.
>>
>> Xen then failed to load the selector 0x63 into the %gs register (since
>> that descriptor has just been wiped?), leaving it zero.
>>
>> [  44185.225256]   WARN: xen_failsafe_callback from xen_hypercall_update_descriptor+0xa/0x40
>> [  44185.225263]   WARN: DS: 2b/2b ES: 2b/2b FS: 0/0 GS:0/63
>>
>> This is on context switch from a 32-bit task to idle. So
>> xen_failsafe_callback is returning to the "faulting" instruction, with
>> a comment saying "Retry the IRET", but in fact is just continuing on
>> its merry way with %gs unexpectedly set to zero.
>>
>> In fact I think this is probably fine in practice, since it's about to
>> get explicitly set a few lines further down in __context_switch(). But
>> it's odd enough, and far enough away from what's actually said by the
>> comments, that I'm utterly unsure.
>>
>> In xen_load_tls() we explicitly only do the lazy_load_gs(0) for the
>> 32-bit kernel. Is that really right?
>
> Basically, what is happening is that xen_load_tls() is invalidating the
> %gs selector while %gs is still non-NULL.
>
> If this happens to intersect with a vcpu reschedule, %gs (being non-NULL)
> takes precedence over KERNGSBASE, and faults when Xen tries to reload
> it.  This results in the failsafe callback being invoked.
>
> I think the correct course of action is to use xen_load_gs_index(0)
> (poorly named - it is a hypercall which does swapgs; mov to %gs; swapgs)
> before using update_descriptor() to invalidate the segment.
>
> That will reset %gs to 0 without touching KERNGSBASE, and can be queued
> in the same multicall as the update_descriptor() hypercall.

Sounds good to me as long as we skip it on native.
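
For readers following the thread, the race Andrew describes and the ordering he proposes can be sketched as a toy model in C. Nothing here is real Xen or Linux code: struct toy_vcpu, the toy_* functions, and the descriptor and base values are hypothetical names and placeholders invented for illustration, and toy_reschedule() only approximates what Xen does when it reloads segment registers for a vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name below is hypothetical, not a Xen or Linux API. */
#define TOY_GDT_ENTRIES 16
#define TOY_TLS_INDEX   12          /* selector 0x63 >> 3 == 12 */

struct toy_vcpu {
    uint64_t gdt[TOY_GDT_ENTRIES];  /* 0 stands for "descriptor wiped" */
    uint16_t gs_sel;                /* selector currently in %gs */
    uint64_t kernel_gs_base;        /* stands in for KERNGSBASE */
    bool     failsafe_hit;          /* set when a segment reload fails */
};

static struct toy_vcpu toy_vcpu_init(void)
{
    struct toy_vcpu v = {0};
    v.gdt[TOY_TLS_INDEX] = 0x00cff3000000ffffULL; /* arbitrary non-zero entry */
    v.gs_sel = 0x63;                              /* selector from the log */
    v.kernel_gs_base = 0xffff880000001000ULL;     /* arbitrary placeholder */
    return v;
}

/* Models the hypercall described as "swapgs; mov to %gs; swapgs". */
static void toy_load_gs_index(struct toy_vcpu *v, uint16_t sel)
{
    v->gs_sel = sel;                /* kernel_gs_base is deliberately untouched */
}

/* Models update_descriptor(): overwrite one descriptor table entry. */
static void toy_update_descriptor(struct toy_vcpu *v, unsigned idx, uint64_t desc)
{
    assert(idx < TOY_GDT_ENTRIES);
    v->gdt[idx] = desc;
}

/* Models the hypervisor reloading %gs when the vCPU is scheduled again. */
static void toy_reschedule(struct toy_vcpu *v)
{
    unsigned idx = v->gs_sel >> 3;
    bool is_null = (v->gs_sel & 0xfffc) == 0;

    /* A non-null selector pointing at a wiped entry cannot be reloaded. */
    if (!is_null && (idx >= TOY_GDT_ENTRIES || v->gdt[idx] == 0)) {
        v->gs_sel = 0;
        v->failsafe_hit = true;
    }
}

/* Wipe the TLS descriptor, optionally zeroing %gs first, then assume the
 * worst case: the vCPU is preempted immediately after the update. */
static bool toy_invalidate_tls(struct toy_vcpu *v, bool clear_gs_first)
{
    if (clear_gs_first)
        toy_load_gs_index(v, 0);
    toy_update_descriptor(v, TOY_TLS_INDEX, 0);
    toy_reschedule(v);
    return v->failsafe_hit;
}
```

In this model, toy_invalidate_tls(&v, false) reports a failsafe hit and leaves %gs at zero, which is consistent with the GS:0/63 line in the quoted log. With clear_gs_first set, the reload sees a null selector, nothing faults, and kernel_gs_base is unchanged. The model says nothing about native hardware, where the extra step would be skipped.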

