On Tue, Jul 26, 2011 at 5:10 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> On Tue, Jul 26, 2011 at 4:48 PM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
>> On 26/07/2011 20:08, "Andrew Lutomirski" <luto@xxxxxxx> wrote:
>>
>>> On Tue, Jul 26, 2011 at 11:32 AM, Konrad Rzeszutek Wilk
>>> <konrad.wilk@xxxxxxxxxx> wrote:
>>>> On Mon, Jul 25, 2011 at 09:50:30PM -0400, Andrew Lutomirski wrote:
>>>>> After staring at the Xen assembly code with vague comprehension, I
>>>>> think I can sort of understand what's going on.
>>>>
>>>> Ok.
>>>>>
>>>>> Can you run this little program on a working kernel and tell me what
>>>>> it says (built as 64-bit and as 32-bit (with -m32)):
>>>>
>>>> 32-bit:
>>>> [konrad@f13-x86-build ~]$ ./check
>>>> cs = 73
>>>> [konrad@f13-x86-build ~]$ uname -a
>>>> Linux f13-x86-build.dumpdata.com 3.0.0 #1 SMP PREEMPT Tue Jul 26 09:56:38 EDT
>>>> 2011 i686 i686 i386 GNU/Linux
>>>>
>>>>
>>>> 64-bit:
>>>>
>>>> [konrad@f13-amd64-build ~]$ ./check
>>>> cs = e033
>>>
>>> My best guess is that each task starts out with standard __USER_CS,
>>> but the code in write_stack_trampoline (in the hypervisor) tells the
>>> kernel that CS is 0xe033 and then the next return to userspace makes
>>> it true.
>>
>> Yes, that's right.
>
> But it's still weird, because AFAICT xen_sysret64 already does the
> right thing.  So presumably the failure case only happens when
> something prevents sysret from working, like CONFIG_AUDITSYSCALL.

I lied. I still don't see what's going on.
Xen, in enlighten.c, registers xen_syscall_target as the 64-bit
syscall target (or at least I assume that's what CALLBACKTYPE_syscall
does).
xen_syscall_target does this:

.macro undo_xen_syscall
        mov 0*8(%rsp), %rcx
        mov 1*8(%rsp), %r11
        mov 5*8(%rsp), %rsp
.endm

/* Normal 64-bit system call target */
ENTRY(xen_syscall_target)
        undo_xen_syscall
        jmp system_call_after_swapgs
ENDPROC(xen_syscall_target)

So the 0xe033 that Xen writes is popped back off the kernel stack and ignored.
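
To make that concrete, here is a rough sketch of the frame as I understand it. Only the rcx, r11 and rsp slots (0*8, 1*8, 5*8) are confirmed by undo_xen_syscall itself; the rip/cs/rflags/ss slots are my assumption, based on Xen handing us an iret-like frame, and the struct and helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout of what Xen leaves on the kernel stack at syscall entry,
 * lowest address (where %rsp points) first.  Slots 0, 1 and 5 are taken
 * from undo_xen_syscall; the remaining slots are an assumption.
 */
struct xen_syscall_frame {
        uint64_t rcx;    /* 0*8: reloaded into %rcx */
        uint64_t r11;    /* 1*8: reloaded into %r11 */
        uint64_t rip;    /* 2*8: assumed, never read by the macro */
        uint64_t cs;     /* 3*8: assumed, where 0xe033 would sit */
        uint64_t rflags; /* 4*8: assumed, never read by the macro */
        uint64_t rsp;    /* 5*8: reloaded into %rsp */
        uint64_t ss;     /* 6*8: assumed, never read by the macro */
};

/* Hypothetical helper: does undo_xen_syscall read the slot at this offset? */
static inline int xen_frame_slot_is_read(size_t off)
{
        return off == 0 * 8 || off == 1 * 8 || off == 5 * 8;
}
```

If that layout is right, xen_frame_slot_is_read(offsetof(struct xen_syscall_frame, cs)) is 0, i.e. the CS slot is discarded as soon as %rsp is switched back.
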
xen_sysret64 explicitly pushes __USER_CS as its CS value, so that path looks OK.
If we go into the iret path (via auditing, for example), then the
FIXUP_TOP_OF_STACK macro does movq $__USER_CS,CS+\offset(%rsp), which
(unless it's buggy) writes __USER_CS into the appropriate spot.
So I don't see what part of the entry path needs patching.
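
For anyone comparing the values from the test program, here is a small sketch that splits a selector into its standard x86 fields (index, table indicator, RPL). The 0x73 and 0xe033 values are the ones Konrad reported; 0x33 as the native 64-bit __USER_CS is my assumption from the usual GDT layout. The struct and function names are made up.

```c
#include <stdint.h>

/* Fields of an x86 segment selector (hypothetical struct name). */
struct selector_fields {
        unsigned index; /* bits 15..3: descriptor table index */
        unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
        unsigned rpl;   /* bits 1..0: requested privilege level */
};

/* Split a 16-bit selector into its architectural fields. */
static inline struct selector_fields decode_selector(uint16_t sel)
{
        struct selector_fields f;

        f.index = sel >> 3;
        f.ti = (sel >> 2) & 1;
        f.rpl = sel & 3;
        return f;
}
```

Decoding 0xe033 and 0x33 gives the same TI (GDT) and RPL (3) but a different index, so both are ring-3 GDT code selectors and anything that compares CS against __USER_CS exactly will treat them differently.
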
--Andy