Re: [Xen-devel] xenstored crashes with SIGSEGV
Hello,

On 06.01.2015 08:19, Philipp Hahn wrote:
> On 19.12.2014 13:36, Philipp Hahn wrote:
>> On 18.12.2014 11:17, Ian Campbell wrote:
>>> On Tue, 2014-12-16 at 16:13 +0000, Frediano Ziglio wrote:
>>>> Do we have a bug in Xen that affect SSE instructions (possibly already
>>>> fixed after Philipp version) ?
>>>
>>> I've had a niggling feeling of Deja Vu over this which I'd been putting
>>> down to an old Xen on ARM bug in the area of FPU register switching.
>>>
>>> But it seems at some point (possibly even still) there was a similar
>>> issue with pvops kernels on x86, see:
>>> http://bugs.xenproject.org/xen/bug/40
...
>>> Philipp, what kernel are you guys using?
>>
>> The crash "2014-12-06 01:26:21 xenstored[4337]" happened on linux-3.10.46.
>
> I looked through the changes of v3.10.46..v3.10.63 and found the
> following patches:
> | fb5b6e7 x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
> | b888e3d x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
>
> They look interesting enough that they may have fixed the bug, which could
> explain the strange bit pattern caused by not restoring the FPU state
> correctly.
...
> we're now working on upgrading the dom0 kernel which should give us
> usable core dumps again and may also fix the underlying problem. If that
> bug ever happens again I'll keep you informed.

We're now running 3.10.62 and the situation seems to have improved, but
yesterday and today we got two crashes on different hosts, both times
again in vsnprintf():

> [304534.173707] xenstored[3731]: segfault at 2 ip 00007f6da00805ad sp
> 00007fff544a2b80 error 4 in libc-2.11.3.so[7f6da003b000+158000]
> (gdb) where
> #0 0x00007f6da00805ad in _IO_vfprintf_internal (s=0x7fff544a3230,
> format=<value optimized out>, ap=0x7fff544a3790) at vfprintf.c:1617
> #1 0x00007f6da00a2452 in _IO_vsnprintf (string=0x7fff544a3390 "%%p
> 4249828122762082015 03:11:04 9JT\377\177", maxlen=<value optimized out>,
> format=0x40da48 "%s %p %04d%02d%02d %02d:%02d:%02d %s (",
> args=0x7fff544a3790) at vsnprintf.c:120
> #2 0x00000000004029ad in trace (fmt=0x40da48 "%s %p %04d%02d%02d
> %02d:%02d:%02d %s (") at xenstored_core.c:140
> #3 0x0000000000402c67 in trace_io (conn=0xbb51f0, data=0xbf1fe0, out=0) at
> xenstored_core.c:174
> #4 0x00000000004041cd in handle_input (conn=0xbb51f0) at
> xenstored_core.c:1307
> #5 0x0000000000405170 in main (argc=<value optimized out>, argv=<value
> optimized out>) at xenstored_core.c:1964

The SSE registers again contain the 00..ff.. pattern, but accessing
%es:(%rdi)=0x0:0x2 looks very broken.
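
As an aside for anyone re-reading the kernel log line quoted above: the following is a minimal Python sketch, not part of the original report, of how such a line can be decoded. The function name `decode_segfault` and the dictionary keys are hypothetical. The error-code bits are interpreted as the usual x86 page-fault bits (bit 0: page present, bit 1: write access, bit 2: user mode). The sketch assumes the kernel prints all numbers in hexadecimal.

```python
import re

# Kernel log line copied from the report above.
LINE = ("xenstored[3731]: segfault at 2 ip 00007f6da00805ad sp "
        "00007fff544a2b80 error 4 in libc-2.11.3.so[7f6da003b000+158000]")

# Matches the "segfault at ... ip ... sp ... error ... in obj[base+size]" form.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one segfault log line into a dict (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, err, base, size = (
        int(m.group(k), 16) for k in ("addr", "ip", "err", "base", "size")
    )
    return {
        "fault_addr": addr,                 # address that was accessed
        "object": m.group("obj"),           # mapping containing the ip
        "offset": ip - base,                # ip relative to the mapping start
        "inside_mapping": base <= ip < base + size,
        "page_present": bool(err & 1),      # bit 0 of the error code
        "write": bool(err & 2),             # bit 1: write (else read)
        "user_mode": bool(err & 4),         # bit 2: fault came from user mode
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(hex(info["fault_addr"]), info["object"], hex(info["offset"]))
```

For the line above this gives a fault address of 0x2 and an offset of 0x455ad into libc-2.11.3.so, with error code 4 reading as a user-mode read of a non-present page. That is consistent with the `repnz scas %es:(%rdi),%al` at the faulting ip reading from %rdi = 0x2.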

> (gdb) info all-registers
> rax            0x0                 0
> rbx            0x40da48            4250184
> rcx            0xffffffffffffffff  -1
> rdx            0x7fff544a3890      140734607538320
> rsi            0x40da69            4250217
> rdi            0x2                 2
> rbp            0x7fff544a3790      0x7fff544a3790
> rsp            0x7fff544a3390      0x7fff544a3390
> r8             0x1                 1
> r9             0x2                 2
> r10            0x2                 2
> r11            0x10                16
> r12            0x0                 0
> r13            0x7fff544a3950      140734607538512
> r14            0x7fff544a39d0      140734607538640
> r15            0xc                 12
> rip            0x4029ad            0x4029ad <trace+221>
> eflags         0x10286             [ PF SF IF RF ]
> cs             0xe033              57395
> ss             0xe02b              57387
> ds             0x0                 0
> es             0x0                 0
> fs             0x0                 0
> gs             0x0                 0
> st0            0                   (raw 0x00000000000000000000)
> st1            0                   (raw 0x00000000000000000000)
> st2            0                   (raw 0x00000000000000000000)
> st3            0                   (raw 0x00000000000000000000)
> st4            0                   (raw 0x00000000000000000000)
> st5            0                   (raw 0x00000000000000000000)
> st6            0                   (raw 0x00000000000000000000)
> st7            0                   (raw 0x00000000000000000000)
> fctrl          0x37f               895
> fstat          0x0                 0
> ftag           0xffff              65535
> fiseg          0x0                 0
> fioff          0x0                 0
> foseg          0x0                 0
> fooff          0x0                 0
> fop            0x0                 0
> xmm0           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0,
> 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0xff, 0x0, 0xff00, 0x0, 0x0, 0xff, 0x0,
> 0x0}, v4_int32 = {0xff, 0xff00, 0xff0000, 0x0}, v2_int64 = {0xff00000000ff,
> 0xff0000}, uint128 = 0x0000000000ff00000000ff00000000ff}
> xmm1           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x25 <repeats 16 times>}, v8_int16 = {0x2525, 0x2525, 0x2525,
> 0x2525, 0x2525, 0x2525, 0x2525, 0x2525}, v4_int32 = {0x25252525, 0x25252525,
> 0x25252525, 0x25252525}, v2_int64 = {0x2525252525252525, 0x2525252525252525},
> uint128 = 0x25252525252525252525252525252525}
> xmm2           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm3           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0,
> 0x8000000000000000}, v16_int8 = {0x0 <repeats 14 times>, 0xff, 0xff},
> v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff}, v4_int32 = {0x0, 0x0,
> 0x0, 0xffff0000}, v2_int64 = {0x0, 0xffff000000000000}, uint128 =
> 0xffff0000000000000000000000000000}
> xmm4           {v4_float = {0xd34e4f00, 0x0, 0x0, 0x0}, v2_double = {0x0,
> 0x8000000000000000}, v16_int8 = {0x4f, 0x4e, 0x53, 0x4f, 0x4c, 0x45, 0x3d,
> 0x2f, 0x64, 0x65, 0x76, 0x2f, 0x63, 0x6f, 0x6e, 0x73}, v8_int16 = {0x4e4f,
> 0x4f53, 0x454c, 0x2f3d, 0x6564, 0x2f76, 0x6f63, 0x736e}, v4_int32 =
> {0x4f534e4f, 0x2f3d454c, 0x2f766564, 0x736e6f63}, v2_int64 =
> {0x2f3d454c4f534e4f, 0x736e6f632f766564}, uint128 =
> 0x736e6f632f7665642f3d454c4f534e4f}
> xmm5           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm6           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm7           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm8           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm9           {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm10          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm11          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm12          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm13          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm14          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> xmm15          {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0},
> v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0},
> uint128 = 0x00000000000000000000000000000000}
> mxcsr          0x1f80              [ IM DM ZM OM UM PM ]
> (gdb) x/20i $pc
> 0x7f6da00805ad <_IO_vfprintf_internal+15357>: repnz scas %es:(%rdi),%al
> 0x7f6da00805af <_IO_vfprintf_internal+15359>: xor %r10d,%r10d
> 0x7f6da00805b2 <_IO_vfprintf_internal+15362>: not %rcx
> 0x7f6da00805b5 <_IO_vfprintf_internal+15365>: lea -0x1(%rcx),%r8
> 0x7f6da00805b9 <_IO_vfprintf_internal+15369>: mov %r8d,%ecx
> 0x7f6da00805bc <_IO_vfprintf_internal+15372>: jmpq 0x7f6da007e00c
> <_IO_vfprintf_internal+5724>
> 0x7f6da00805c1 <_IO_vfprintf_internal+15377>: mov $0x6,%ecx
> 0x7f6da00805c6 <_IO_vfprintf_internal+15382>: xor %r10d,%r10d
> 0x7f6da00805c9 <_IO_vfprintf_internal+15385>: mov $0x6,%r8d
> 0x7f6da00805cf <_IO_vfprintf_internal+15391>: lea 0xdff57(%rip),%r9
> # 0x7f6da016052d <null>
> 0x7f6da00805d6 <_IO_vfprintf_internal+15398>: jmpq 0x7f6da007d546
> <_IO_vfprintf_internal+2966>
> 0x7f6da00805db <_IO_vfprintf_internal+15403>: mov 0x8(%r13),%rax
> 0x7f6da00805df <_IO_vfprintf_internal+15407>: lea 0x8(%rax),%rdx
> 0x7f6da00805e3 <_IO_vfprintf_internal+15411>: mov %rdx,0x8(%r13)
> 0x7f6da00805e7 <_IO_vfprintf_internal+15415>: jmpq 0x7f6da007eac2
> <_IO_vfprintf_internal+8466>
> 0x7f6da00805ec <_IO_vfprintf_internal+15420>: mov 0x8(%r13),%rax
> 0x7f6da00805f0 <_IO_vfprintf_internal+15424>: lea 0x8(%rax),%rdx
> 0x7f6da00805f4 <_IO_vfprintf_internal+15428>: mov %rdx,0x8(%r13)
> 0x7f6da00805f8 <_IO_vfprintf_internal+15432>: jmpq 0x7f6da007f91e
> <_IO_vfprintf_internal+12142>
> 0x7f6da00805fd <_IO_vfprintf_internal+15437>: mov 0x8(%r13),%rax
> (gdb) x/64x $sp
> 0x7fff544a2b80: 0x544a3260 0x00007fff 0x00000001 0x00000000
> 0x7fff544a2b90: 0x0040da6a 0x00000000 0x0040da6a 0x00000000
> 0x7fff544a2ba0: 0x544a3260 0x00007fff 0xa007cb39 0x00007f6d
> 0x7fff544a2bb0: 0x00000025 0x00000000 0x00000000 0x00000000
> 0x7fff544a2bc0: 0x544a3110 0x00007fff 0x0040d500 0x00000000
> 0x7fff544a2bd0: 0x0040da48 0x00000000 0x00000000 0x00000000
> 0x7fff544a2be0: 0x00000027 0x00000000 0x544a317c 0x00007fff
> 0x7fff544a2bf0: 0x544a31b8 0x00007fff 0x544a3198 0x00007fff
> 0x7fff544a2c00: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x7fff544a2c10: 0x544a2d00 0x00007fff 0x544a31ac 0x00007fff
> 0x7fff544a2c20: 0x544a31e8 0x00007fff 0x544a31c8 0x00000000
> 0x7fff544a2c30: 0x544a3170 0x00007fff 0xffffffff 0xffffffff
> 0x7fff544a2c40: 0x544a2d30 0x00007fff 0x544a30e8 0x00007fff
> 0x7fff544a2c50: 0x0040da70 0x00000000 0x00000000 0x00000000
> 0x7fff544a2c60: 0x00000000 0xffffe938 0xffffff20 0xffffffff
> 0x7fff544a2c70: 0x544a3238 0x00007fff 0x544a3118 0x00007fff

To me it looks like there is still some register/memory corruption
happening in the kernel or the Xen hypervisor.

@Oleg: Have you seen any other corruption, or is one of your patches
likely to fix something like the issue described above:

> $ git l1 --grep fpu v3.10.. -- arch/x86
> c7b228a Merge branch 'x86-fpu-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> dc56c0f x86, fpu: Shift "fpu_counter = 0" from copy_thread() to
> arch_dup_task_struct()
> 5e23fee x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
> f185350 x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
> 31d9633 x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()

Philipp

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel