Re: [Xen-devel] xenstored crashes with SIGSEGV
Hello Ian,
On 15.12.2014 18:45, Ian Campbell wrote:
> On Mon, 2014-12-15 at 14:50 +0000, Ian Campbell wrote:
>> On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
>>> I just noticed something strange:
>>>
>>>> #3 0x000000000040a684 in tdb_open (name=0xff00000000 <Address
>>>> 0xff00000000 out of bounds>, hash_size=0,
>>>> tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
...
> I'm reasonably convinced now that this is just a weird artefact of
> running gdb on an optimised binary, probably a shortcoming in the debug
> info leading to gdb getting confused.
>
> Unfortunately this also calls into doubt the parameter to talloc_free,
> perhaps in that context 0xff0000000 is a similar artefact.
>
> Please can you print the entire contents of tdb in the second frame
> ("print *tdb" ought to do it). I'm curious whether it is all sane or
> not.
(gdb) print *tdb
$1 = {name = 0x0, map_ptr = 0x0, fd = 47, map_size = 65280, read_only = 16711680,
  locked = 0xff0000000000, ecode = 16711680, header = {
    magic_food = "\000\000\000\000\000\000\000\000\000\377\000\000\000\000\377\000\000\000\000\000\000\000\000\000\000\377\000\000\000\000\377",
    version = 0, hash_size = 0, rwlocks = 65280,
    reserved = {16711680, 0, 0, 65280, 16711680, 0, 0, 65280,
      16711680, 0, 0, 65280, 16711680, 0, 0, 65280,
      16711680, 0, 0, 65280, 16711680, 0, 0, 65280,
      16711680, 0, 0, 65280, 16711680, 0, 0}},
  flags = 0, travlocks = {next = 0xff0000, off = 0, hash = 65280}, next = 0xff0000,
  device = 280375465082880, inode = 16711680, log_fn = 0x4093b0 <null_log_fn>,
  hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2}
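The decimal fields in that structure are easier to compare in hex. A minimal Python sketch, assuming nothing beyond the values printed above (the dictionary and the helper name single_ff_byte are made up for illustration), shows that each of them is a single 0xff byte with every other byte zero:

```python
# Values copied from the "print *tdb" output above.
fields = {
    "map_size": 65280,
    "read_only": 16711680,
    "ecode": 16711680,
    "rwlocks": 65280,
    "device": 280375465082880,
    "inode": 16711680,
}


def single_ff_byte(value, width=8):
    """Return True if value, viewed as `width` bytes, is one 0xff byte and zeros otherwise."""
    raw = value.to_bytes(width, "little")
    return raw.count(0xFF) == 1 and raw.count(0x00) == width - 1


# Hex rendering of every field, keyed by field name.
hexed = {name: hex(value) for name, value in fields.items()}

for name, text in hexed.items():
    print(f"{name:10s} {text:>16s}  single 0xff byte: {single_ff_byte(fields[name])}")
```

This only restates the dump in another base; it says nothing about where the 0xff bytes come from.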
> Please can you also print "info regs" at the point of the segv (in frame
> 0) as well as "disas" at that point.
(gdb) info registers
rax 0x0 0
rbx 0x16bff70 23854960
rcx 0xffffffffffffffff -1
rdx 0x40ecd0 4254928
rsi 0x0 0
rdi 0xff0000000000 280375465082880
rbp 0x7fcaed6c96a8 0x7fcaed6c96a8
rsp 0x7fff9dc86330 0x7fff9dc86330
r8 0x7fcaece54c08 140509534571528
r9 0xff00000000000000 -72057594037927936
r10 0x7fcaed08c14c 140509536895308
r11 0x246 582
r12 0xd 13
r13 0xff0000000000 280375465082880
r14 0x4093b0 4232112
r15 0x167d620 23582240
rip 0x4075c4 0x4075c4 <talloc_chunk_from_ptr+4>
eflags 0x10206 [ PF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fctrl 0x0 0
fstat 0x0 0
ftag 0x0 0
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
mxcsr 0x0 [ ]
(gdb) disassemble
Dump of assembler code for function talloc_chunk_from_ptr:
0x00000000004075c0 <talloc_chunk_from_ptr+0>: sub $0x8,%rsp
0x00000000004075c4 <talloc_chunk_from_ptr+4>: mov -0x8(%rdi),%edx
0x00000000004075c7 <talloc_chunk_from_ptr+7>: lea -0x50(%rdi),%rax
0x00000000004075cb <talloc_chunk_from_ptr+11>: mov %edx,%ecx
0x00000000004075cd <talloc_chunk_from_ptr+13>: and $0xfffffffffffffff0,%ecx
0x00000000004075d0 <talloc_chunk_from_ptr+16>: cmp $0xe814ec70,%ecx
0x00000000004075d6 <talloc_chunk_from_ptr+22>: jne 0x4075e2 <talloc_chunk_from_ptr+34>
0x00000000004075d8 <talloc_chunk_from_ptr+24>: and $0x1,%edx
0x00000000004075db <talloc_chunk_from_ptr+27>: jne 0x4075e2 <talloc_chunk_from_ptr+34>
0x00000000004075dd <talloc_chunk_from_ptr+29>: add $0x8,%rsp
0x00000000004075e1 <talloc_chunk_from_ptr+33>: retq
0x00000000004075e2 <talloc_chunk_from_ptr+34>: nopw 0x0(%rax,%rax,1)
0x00000000004075e8 <talloc_chunk_from_ptr+40>: callq 0x401b98 <abort@plt>
> Can you also "p $_siginfo._sifields._sigfault.si_addr" (in frame 0).
> This ought to be the actual faulting address, which ought to give a hint
> on how much we can trust the parameters in the stack trace.
Hmm, my gdb refused to access $_siginfo:
(gdb) show convenience
$_siginfo = Unable to read siginfo
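Without $_siginfo, the address the faulting instruction tried to read can still be estimated from the register dump and the disassembly: rip points at "mov -0x8(%rdi),%edx", so the access goes to rdi - 8. The sketch below is only that arithmetic, assuming the registers shown for frame 0 are the ones at the time of the fault and assuming 48-bit virtual addresses; the helper is_canonical is illustrative and not part of gdb:

```python
# Register value copied from "info registers" above.
rdi = 0xFF0000000000

# Displacement taken from the instruction at rip: mov -0x8(%rdi),%edx
displacement = -0x8

fault_addr = rdi + displacement
print(hex(fault_addr))


def is_canonical(addr, bits=48):
    """Check whether a 64-bit address is canonical for `bits`-bit virtual addressing.

    Canonical means bits 63..(bits-1) are either all zero or all one.
    """
    top = addr >> (bits - 1)
    return top == 0 or top == (1 << (64 - bits + 1)) - 1


print(is_canonical(fault_addr))      # the computed target of the load
print(is_canonical(0x7FFF9DC86330))  # rsp, for comparison
```

Under those assumptions the target is not a canonical address at all, so it cannot be a pointer that talloc ever handed out.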
> Since I'm asking for the world I may as well ask you to dump the raw
> stack too "x/64x $sp" ought to be a good starting point.
(gdb) x/64x $sp
0x7fff9dc86330: 0xed6c96a8 0x00007fca 0x00407edf 0x00000000
0x7fff9dc86340: 0x00000000 0x00000000 0x016bff70 0x00000000
0x7fff9dc86350: 0xed6c96a8 0x00007fca 0x0000000d 0x00000000
0x7fff9dc86360: 0x00000000 0x00000000 0x004093b0 0x00000000
0x7fff9dc86370: 0x0167d620 0x00000000 0x0040a348 0x00000000
0x7fff9dc86380: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fff9dc86390: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fff9dc863a0: 0x00000011 0x00000000 0x411d4816 0x00000000
0x7fff9dc863b0: 0x00000001 0x00000000 0x000081a0 0x00000000
0x7fff9dc863c0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fff9dc863d0: 0x00096000 0x00000000 0x00001000 0x00000000
0x7fff9dc863e0: 0x000004b0 0x00000000 0x5438ba01 0x00000000
0x7fff9dc863f0: 0x07fd332e 0x00000000 0x5438ba01 0x00000000
0x7fff9dc86400: 0x07fd332e 0x00000000 0x5438ba01 0x00000000
0x7fff9dc86410: 0x07fd332e 0x00000000 0x00000000 0x00000000
0x7fff9dc86420: 0x00000000 0x00000000 0x00000000 0x00000000
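gdb prints the stack as 32-bit words here, so each 64-bit slot is split over two columns. A small Python sketch (the dump string is just the first five rows pasted from above; the function name qwords is made up) joins the little-endian pairs back into 64-bit values, which makes them easier to compare against the register dump:

```python
# First rows of the "x/64x $sp" output above; 32-bit words, lowest address first.
dump = """\
0x7fff9dc86330: 0xed6c96a8 0x00007fca 0x00407edf 0x00000000
0x7fff9dc86340: 0x00000000 0x00000000 0x016bff70 0x00000000
0x7fff9dc86350: 0xed6c96a8 0x00007fca 0x0000000d 0x00000000
0x7fff9dc86360: 0x00000000 0x00000000 0x004093b0 0x00000000
0x7fff9dc86370: 0x0167d620 0x00000000 0x0040a348 0x00000000
"""


def qwords(text):
    """Yield (address, 64-bit value) pairs built from little-endian 32-bit words."""
    for line in text.splitlines():
        addr_text, words_text = line.split(":", 1)
        addr = int(addr_text, 16)
        words = [int(w, 16) for w in words_text.split()]
        for i in range(0, len(words), 2):
            # The low word comes first in memory, the high word second.
            yield addr + 4 * i, words[i] | (words[i + 1] << 32)


stack = dict(qwords(dump))
for addr, value in stack.items():
    print(f"{addr:#x}: {value:#018x}")
```

Several slots equal the current rbx, rbp, r12, r14 and r15 values, and 0x407edf and 0x40a348 lie in the same 0x40xxxx range as the code addresses in the disassembly, so they are candidates for return addresses. That reading is a guess from the numbers alone, not something the debug info confirms.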
> I notice in your bugzilla (for a different occurrence, I think):
>> [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip
>> 000000000045e238 sp 00007ffff68dfa30 error 6 in python2.6[400000+21e000]
>
> Which appears to have faulted accessing 0xff00000000 too. It looks like
> this process is a python thing, it's nothing to do with xenstored I
> assume?
Yes, that's one univention-config, which is completely independent of
xen(stored).
> It seems rather coincidental that it should be accessing the
> same sort of address and be faulting.
Yes, good catch. I'll have another look at those core dumps.
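For reference, the "error 6" in the kernel line quoted above is the x86 page-fault error code. A minimal sketch of decoding its three low bits (the table and function name are illustrative, and the higher bits are ignored):

```python
# Low bits of the x86 page-fault error code: (mask, meaning if set, meaning if clear).
FLAGS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
]


def decode_pf_error(code):
    """Return a human-readable description for each of the three low bits."""
    return [if_set if code & mask else if_clear for mask, if_set, if_clear in FLAGS]


decoded = decode_pf_error(6)
print(decoded)
```

So the python2.6 process faulted on a user-mode write to an unmapped page at ff00000000.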
> Ian.
Thank you for your help.
Philipp Hahn
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel