Re: [Xen-devel] xenstored crashes with SIGSEGV
On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
> Hello Ian,
>
> On 15.12.2014 18:45, Ian Campbell wrote:
> > On Mon, 2014-12-15 at 14:50 +0000, Ian Campbell wrote:
> >> On Mon, 2014-12-15 at 15:19 +0100, Philipp Hahn wrote:
> >>> I just noticed something strange:
> >>>
> >>>> #3 0x000000000040a684 in tdb_open (name=0xff00000000 <Address 0xff00000000 out of bounds>, hash_size=0, tdb_flags=4254928, open_flags=-1, mode=3119127560) at tdb.c:1773
> ...
> > I'm reasonably convinced now that this is just a weird artefact of
> > running gdb on an optimised binary, probably a shortcoming in the debug
> > info leading to gdb getting confused.
> >
> > Unfortunately this also calls into doubt the parameter to talloc_free,
> > perhaps in that context 0xff0000000 is a similar artefact.
> >
> > Please can you print the entire contents of tdb in the second frame
> > ("print *tdb" ought to do it). I'm curious whether it is all sane or
> > not.
>
> (gdb) print *tdb
> $1 = {name = 0x0, map_ptr = 0x0, fd = 47, map_size = 65280, read_only = 16711680,
> locked = 0xff0000000000,
So it really does seem to be 0xff0000000000 in memory.
> flags = 0,
> travlocks = {next = 0xff0000, off = 0, hash = 65280}, next = 0xff0000,
> device = 280375465082880, inode = 16711680, log_fn = 0x4093b0 <null_log_fn>,
> hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2}
And here we can see tdb->{flags,open_flags} == 0 and 2, contrary to the
nonsensical values the stack trace claims we were called with. Since 0
and 2 are sensible and correspond to what the caller passes, I think the
stack trace is just confused.
> (gdb) info registers
> rax 0x0 0
> rbx 0x16bff70 23854960
> rcx 0xffffffffffffffff -1
> rdx 0x40ecd0 4254928
> rsi 0x0 0
> rdi 0xff0000000000 280375465082880
And here it is in the registers.
> rbp 0x7fcaed6c96a8 0x7fcaed6c96a8
> rsp 0x7fff9dc86330 0x7fff9dc86330
> r8 0x7fcaece54c08 140509534571528
> r9 0xff00000000000000 -72057594037927936
> r10 0x7fcaed08c14c 140509536895308
> r11 0x246 582
> r12 0xd 13
> r13 0xff0000000000 280375465082880
And again.
> r14 0x4093b0 4232112
> r15 0x167d620 23582240
> rip 0x4075c4 0x4075c4 <talloc_chunk_from_ptr+4>
This must be the address of the faulting instruction.
> eflags 0x10206 [ PF IF RF ]
> cs 0x33 51
> ss 0x2b 43
> ds 0x0 0
> es 0x0 0
> fs 0x0 0
> gs 0x0 0
> fctrl 0x0 0
> fstat 0x0 0
> ftag 0x0 0
> fiseg 0x0 0
> fioff 0x0 0
> foseg 0x0 0
> fooff 0x0 0
> fop 0x0 0
> mxcsr 0x0 [ ]
>
> (gdb) disassemble
> Dump of assembler code for function talloc_chunk_from_ptr:
> 0x00000000004075c0 <talloc_chunk_from_ptr+0>: sub $0x8,%rsp
> 0x00000000004075c4 <talloc_chunk_from_ptr+4>: mov -0x8(%rdi),%edx
This is the instruction corresponding to %rip above, which does a read via
%rdi, which is 0xff0000000000.
It's reading tc->flags. The code has been optimised: tc = pp - SIZE, so it
loads *(pp - SIZE + offsetof(flags)), which is pp - 8 (flags is the last
field in the struct).
So %rdi contains pp, which is the ptr passed as the argument to the
function; hence ptr was bogus.
So it seems we really do have tdb->locked containing 0xff0000000000.
It is only allocated in one place, midway through tdb_open_ex:
    tdb->locked = talloc_zero_array(tdb, struct tdb_lock_type,
                                    tdb->header.hash_size+1);
It might be worth inserting a check and a log message after this call for
a return value of 0xff, 0xff00, 0xff0000 ... 0xff0000000000 etc.
> 0x00000000004075c7 <talloc_chunk_from_ptr+7>: lea -0x50(%rdi),%rax
This is actually calculating tc, ready for return upon success.
> 0x00000000004075cb <talloc_chunk_from_ptr+11>: mov %edx,%ecx
> 0x00000000004075cd <talloc_chunk_from_ptr+13>: and $0xfffffffffffffff0,%ecx
> 0x00000000004075d0 <talloc_chunk_from_ptr+16>: cmp $0xe814ec70,%ecx
> 0x00000000004075d6 <talloc_chunk_from_ptr+22>: jne 0x4075e2 <talloc_chunk_from_ptr+34>
(tc->flags & ~0xF) != TALLOC_MAGIC
> 0x00000000004075d8 <talloc_chunk_from_ptr+24>: and $0x1,%edx
> 0x00000000004075db <talloc_chunk_from_ptr+27>: jne 0x4075e2 <talloc_chunk_from_ptr+34>
tc->flags & TALLOC_FLAG_FREE
> 0x00000000004075dd <talloc_chunk_from_ptr+29>: add $0x8,%rsp
> 0x00000000004075e1 <talloc_chunk_from_ptr+33>: retq
Success, return.
> 0x00000000004075e2 <talloc_chunk_from_ptr+34>: nopw 0x0(%rax,%rax,1)
> 0x00000000004075e8 <talloc_chunk_from_ptr+40>: callq 0x401b98 <abort@plt>
Both TALLOC_ABORT() calls end up here if either of the checks above fails.
> > Can you also "p $_siginfo._sifields._sigfault.si_addr" (in frame 0).
> > This ought to be the actual faulting address, which ought to give a hint
> > on how much we can trust the parameters in the stack trace.
>
> Hmm, my gdb refused to access $_siginfo:
> (gdb) show convenience
> $_siginfo = Unable to read siginfo
That's OK; I think I've convinced myself above of what the crash is.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel