
Re: [Xen-devel] xenstored crashes with SIGSEGV



Hello,

On 06.01.2015 08:19, Philipp Hahn wrote:
> On 19.12.2014 13:36, Philipp Hahn wrote:
>> On 18.12.2014 11:17, Ian Campbell wrote:
>>> On Tue, 2014-12-16 at 16:13 +0000, Frediano Ziglio wrote:
>>>> Do we have a bug in Xen that affects SSE instructions (possibly already
>>>> fixed after Philipp's version)?
>>>
>>> I've had a niggling feeling of Deja Vu over this which I'd been putting
>>> down to an old Xen on ARM bug in the area of FPU register switching.
>>>
>>> But it seems at some point (possibly even still) there was a similar
>>> issue with pvops kernels on x86, see:
>>>         http://bugs.xenproject.org/xen/bug/40
...
>>> Philipp, what kernel are you guys using?
>>
>> The crash "2014-12-06 01:26:21 xenstored[4337]" happened on linux-3.10.46.
> 
> I looked through the changes of v3.10.46..v3.10.63 and found the
> following patches:
> | fb5b6e7 x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
> | b888e3d x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
> 
> They look interesting enough that they may have fixed the bug, which could
> explain the strange bit pattern caused by not restoring the FPU state
> correctly.
...
> we're now working on upgrading the dom0 kernel, which should give us
> usable core dumps again and may also fix the underlying problem. If that
> bug ever happens again I'll keep you informed.

We're now running 3.10.62 and the situation seems to have improved, but
yesterday and today we got two crashes on different hosts, both times
again in vsnprintf():

> [304534.173707] xenstored[3731]: segfault at 2 ip 00007f6da00805ad sp 00007fff544a2b80 error 4 in libc-2.11.3.so[7f6da003b000+158000]

> (gdb) where
> #0  0x00007f6da00805ad in _IO_vfprintf_internal (s=0x7fff544a3230, format=<value optimized out>, ap=0x7fff544a3790) at vfprintf.c:1617
> #1  0x00007f6da00a2452 in _IO_vsnprintf (string=0x7fff544a3390 "%%p 4249828122762082015 03:11:04 9JT\377\177", maxlen=<value optimized out>, format=0x40da48 "%s %p %04d%02d%02d %02d:%02d:%02d %s (", args=0x7fff544a3790) at vsnprintf.c:120
> #2  0x00000000004029ad in trace (fmt=0x40da48 "%s %p %04d%02d%02d %02d:%02d:%02d %s (") at xenstored_core.c:140
> #3  0x0000000000402c67 in trace_io (conn=0xbb51f0, data=0xbf1fe0, out=0) at xenstored_core.c:174
> #4  0x00000000004041cd in handle_input (conn=0xbb51f0) at xenstored_core.c:1307
> #5  0x0000000000405170 in main (argc=<value optimized out>, argv=<value optimized out>) at xenstored_core.c:1964

The SSE registers again contain the 00..ff.. pattern, and accessing
%es:(%rdi)=0x0:0x2, i.e. address 0x2 as in "segfault at 2", looks very
broken.

> (gdb) info all-registers 
> rax            0x0      0
> rbx            0x40da48 4250184
> rcx            0xffffffffffffffff       -1
> rdx            0x7fff544a3890   140734607538320
> rsi            0x40da69 4250217
> rdi            0x2      2
> rbp            0x7fff544a3790   0x7fff544a3790
> rsp            0x7fff544a3390   0x7fff544a3390
> r8             0x1      1
> r9             0x2      2
> r10            0x2      2
> r11            0x10     16
> r12            0x0      0
> r13            0x7fff544a3950   140734607538512
> r14            0x7fff544a39d0   140734607538640
> r15            0xc      12
> rip            0x4029ad 0x4029ad <trace+221>
> eflags         0x10286  [ PF SF IF RF ]
> cs             0xe033   57395
> ss             0xe02b   57387
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x0      0
> st0            0        (raw 0x00000000000000000000)
> st1            0        (raw 0x00000000000000000000)
> st2            0        (raw 0x00000000000000000000)
> st3            0        (raw 0x00000000000000000000)
> st4            0        (raw 0x00000000000000000000)
> st5            0        (raw 0x00000000000000000000)
> st6            0        (raw 0x00000000000000000000)
> st7            0        (raw 0x00000000000000000000)
> fctrl          0x37f    895
> fstat          0x0      0
> ftag           0xffff   65535
> fiseg          0x0      0
> fioff          0x0      0
> foseg          0x0      0
> fooff          0x0      0
> fop            0x0      0
> xmm0           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 
> 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0xff, 0x0, 0xff00, 0x0, 0x0, 0xff, 0x0, 
> 0x0}, v4_int32 = {0xff, 0xff00, 0xff0000, 0x0}, v2_int64 = {0xff00000000ff, 
> 0xff0000}, uint128 = 0x0000000000ff00000000ff00000000ff}
> xmm1           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x25 <repeats 16 times>}, v8_int16 = {0x2525, 0x2525, 0x2525, 
> 0x2525, 0x2525, 0x2525, 0x2525, 0x2525}, v4_int32 = {0x25252525, 0x25252525, 
> 0x25252525, 0x25252525}, v2_int64 = {0x2525252525252525, 0x2525252525252525}, 
> uint128 = 0x25252525252525252525252525252525}
> xmm2           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm3           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 
> 0x8000000000000000}, v16_int8 = {0x0 <repeats 14 times>, 0xff, 0xff}, 
> v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff}, v4_int32 = {0x0, 0x0, 
> 0x0, 0xffff0000}, v2_int64 = {0x0, 0xffff000000000000}, uint128 = 
> 0xffff0000000000000000000000000000}
> xmm4           {v4_float = {0xd34e4f00, 0x0, 0x0, 0x0}, v2_double = {0x0, 
> 0x8000000000000000}, v16_int8 = {0x4f, 0x4e, 0x53, 0x4f, 0x4c, 0x45, 0x3d, 
> 0x2f, 0x64, 0x65, 0x76, 0x2f, 0x63, 0x6f, 0x6e, 0x73}, v8_int16 = {0x4e4f, 
> 0x4f53, 0x454c, 0x2f3d, 0x6564, 0x2f76, 0x6f63, 0x736e}, v4_int32 = 
> {0x4f534e4f, 0x2f3d454c, 0x2f766564, 0x736e6f63}, v2_int64 = 
> {0x2f3d454c4f534e4f, 0x736e6f632f766564}, uint128 = 
> 0x736e6f632f7665642f3d454c4f534e4f}
> xmm5           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm6           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm7           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm8           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm9           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm10          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm11          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm12          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm13          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm14          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> xmm15          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, 
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, 
> uint128 = 0x00000000000000000000000000000000}
> mxcsr          0x1f80   [ IM DM ZM OM UM PM ]

> (gdb) x/20i $pc
> 0x7f6da00805ad <_IO_vfprintf_internal+15357>:   repnz scas %es:(%rdi),%al
> 0x7f6da00805af <_IO_vfprintf_internal+15359>:   xor    %r10d,%r10d
> 0x7f6da00805b2 <_IO_vfprintf_internal+15362>:   not    %rcx
> 0x7f6da00805b5 <_IO_vfprintf_internal+15365>:   lea    -0x1(%rcx),%r8
> 0x7f6da00805b9 <_IO_vfprintf_internal+15369>:   mov    %r8d,%ecx
> 0x7f6da00805bc <_IO_vfprintf_internal+15372>:   jmpq   0x7f6da007e00c <_IO_vfprintf_internal+5724>
> 0x7f6da00805c1 <_IO_vfprintf_internal+15377>:   mov    $0x6,%ecx
> 0x7f6da00805c6 <_IO_vfprintf_internal+15382>:   xor    %r10d,%r10d
> 0x7f6da00805c9 <_IO_vfprintf_internal+15385>:   mov    $0x6,%r8d
> 0x7f6da00805cf <_IO_vfprintf_internal+15391>:   lea    0xdff57(%rip),%r9        # 0x7f6da016052d <null>
> 0x7f6da00805d6 <_IO_vfprintf_internal+15398>:   jmpq   0x7f6da007d546 <_IO_vfprintf_internal+2966>
> 0x7f6da00805db <_IO_vfprintf_internal+15403>:   mov    0x8(%r13),%rax
> 0x7f6da00805df <_IO_vfprintf_internal+15407>:   lea    0x8(%rax),%rdx
> 0x7f6da00805e3 <_IO_vfprintf_internal+15411>:   mov    %rdx,0x8(%r13)
> 0x7f6da00805e7 <_IO_vfprintf_internal+15415>:   jmpq   0x7f6da007eac2 <_IO_vfprintf_internal+8466>
> 0x7f6da00805ec <_IO_vfprintf_internal+15420>:   mov    0x8(%r13),%rax
> 0x7f6da00805f0 <_IO_vfprintf_internal+15424>:   lea    0x8(%rax),%rdx
> 0x7f6da00805f4 <_IO_vfprintf_internal+15428>:   mov    %rdx,0x8(%r13)
> 0x7f6da00805f8 <_IO_vfprintf_internal+15432>:   jmpq   0x7f6da007f91e <_IO_vfprintf_internal+12142>
> 0x7f6da00805fd <_IO_vfprintf_internal+15437>:   mov    0x8(%r13),%rax

> (gdb) x/64x $sp
> 0x7fff544a2b80: 0x544a3260      0x00007fff      0x00000001      0x00000000
> 0x7fff544a2b90: 0x0040da6a      0x00000000      0x0040da6a      0x00000000
> 0x7fff544a2ba0: 0x544a3260      0x00007fff      0xa007cb39      0x00007f6d
> 0x7fff544a2bb0: 0x00000025      0x00000000      0x00000000      0x00000000
> 0x7fff544a2bc0: 0x544a3110      0x00007fff      0x0040d500      0x00000000
> 0x7fff544a2bd0: 0x0040da48      0x00000000      0x00000000      0x00000000
> 0x7fff544a2be0: 0x00000027      0x00000000      0x544a317c      0x00007fff
> 0x7fff544a2bf0: 0x544a31b8      0x00007fff      0x544a3198      0x00007fff
> 0x7fff544a2c00: 0x00000000      0x00000000      0x00000000      0x00000000
> 0x7fff544a2c10: 0x544a2d00      0x00007fff      0x544a31ac      0x00007fff
> 0x7fff544a2c20: 0x544a31e8      0x00007fff      0x544a31c8      0x00000000
> 0x7fff544a2c30: 0x544a3170      0x00007fff      0xffffffff      0xffffffff
> 0x7fff544a2c40: 0x544a2d30      0x00007fff      0x544a30e8      0x00007fff
> 0x7fff544a2c50: 0x0040da70      0x00000000      0x00000000      0x00000000
> 0x7fff544a2c60: 0x00000000      0xffffe938      0xffffff20      0xffffffff
> 0x7fff544a2c70: 0x544a3238      0x00007fff      0x544a3118      0x00007fff

To me it looks like there is still some register/memory corruption
happening in the kernel or Xen hypervisor.

@Oleg:
Have you seen any other corruption, or is one of the following patches of
yours likely to fix an issue like the one above?
> $ git l1 --grep fpu v3.10.. -- arch/x86
> c7b228a Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> dc56c0f x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
> 5e23fee x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
> f185350 x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
> 31d9633 x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()

Philipp

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

