Re: [Xen-devel] xenstored crashes with SIGSEGV
On Tue, 2014-12-16 at 11:30 +0000, Frediano Ziglio wrote:
> 2014-12-16 11:06 GMT+00:00 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> > On Tue, 2014-12-16 at 10:45 +0000, Ian Campbell wrote:
> >> On Mon, 2014-12-15 at 23:29 +0100, Philipp Hahn wrote:
> >> > > I notice in your bugzilla (for a different occurrence, I think):
> >> > >> [2090451.721705] univention-conf[2512]: segfault at ff00000000 ip
> >> > >> 000000000045e238 sp 00007ffff68dfa30 error 6 in
> >> > >> python2.6[400000+21e000]
> >> > >
> >> > > Which appears to have faulted access 0xff000000000 too. It looks like
> >> > > this process is a python thing, it's nothing to do with xenstored I
> >> > > assume?
> >> >
> >> > Yes, that's one univention-config, which is completely independent of
> >> > xen(stored).
> >> >
> >> > > It seems rather coincidental that it should be accessing the
> >> > > same sort of address and be faulting.
> >> >
> >> > Yes, good catch. I'll have another look at those core dumps.
> >>
> >> With this in mind, please can you confirm what model of machines you've
> >> seen this on, and in particular whether they are all the same class of
> >> machine or whether they are significantly different.
> >>
> >> The reason being that randomly placed 0xff values in a field of 0x00
> >> could possibly indicate hardware (e.g. a GPU) DMAing over the wrong
> >> memory pages.
> >
> > Thanks for giving me access to the core files. This is very suspicious:
> > (gdb) frame 2
> > #2 0x000000000040a348 in tdb_open_ex (name=0x1941fb0
> > "/var/lib/xenstored/tdb.0x1935bb0", hash_size=<value optimized out>,
> > tdb_flags=0, open_flags=<value optimized out>, mode=<value optimized out>,
> > log_fn=0x4093b0 <null_log_fn>, hash_fn=<value optimized out>) at
> > tdb.c:1958
> > 1958 SAFE_FREE(tdb->locked);
> >
> > (gdb) x/96x tdb
> > 0x1921270: 0x00000000 0x00000000 0x00000000 0x00000000
> > 0x1921280: 0x0000001f 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921290: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212a0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212b0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212c0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212d0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212e0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x19212f0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921300: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921310: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921320: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921330: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921340: 0x00000000 0x00000000 0x0000ff00 0x000000ff
> > 0x1921350: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921360: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
> > 0x1921370: 0x004093b0 0x00000000 0x004092f0 0x00000000
> > 0x1921380: 0x00000002 0x00000000 0x00000091 0x00000000
> > 0x1921390: 0x0193de70 0x00000000 0x01963600 0x00000000
> > 0x19213a0: 0x00000000 0x00000000 0x0193fbb0 0x00000000
> > 0x19213b0: 0x00000000 0x00000000 0x00000000 0x00000000
> > 0x19213c0: 0x00405870 0x00000000 0x0040e3e0 0x00000000
> > 0x19213d0: 0x00000038 0x00000000 0xe814ec70 0x6f2f6567
> > 0x19213e0: 0x01963650 0x00000000 0x0193dec0 0x00000000
> >
> > Something has clearly done a number on the RAM of this process.
> > 0x1921270 through 0x192136f is 256 bytes...
> >
> > Since it appears to be happening to other processes too I would hazard
> > that this is not a xenstored issue.
> >
> > Ian.
> >
>
> Good catch Ian!
>
> Strange corruption. Probably not related to xenstored as you
> suggested. I would be curious to see what's before the tdb pointer and
> where the corruption starts.
(gdb) print tdb
$2 = (TDB_CONTEXT *) 0x1921270
(gdb) x/64x 0x1921200
0x1921200: 0x01921174 0x00000000 0x00000000 0x00000000
0x1921210: 0x01921174 0x00000000 0x00000171 0x00000000
0x1921220: 0x00000000 0x00000000 0x00000000 0x00000000
0x1921230: 0x01941f60 0x00000000 0x00000000 0x00000000
0x1921240: 0x00000000 0x00000000 0x00000000 0x6f630065
0x1921250: 0x00000000 0x00000000 0x0040e8a7 0x00000000
0x1921260: 0x00000118 0x00000000 0xe814ec70 0x00000000
0x1921270: 0x00000000 0x00000000 0x00000000 0x00000000
0x1921280: 0x0000001f 0x000000ff 0x0000ff00 0x000000ff
0x1921290: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212a0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212b0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212c0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212d0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212e0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
0x19212f0: 0x00000000 0x000000ff 0x0000ff00 0x000000ff
So it appears to start at 0x1921270 or maybe ...6c.
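For what it's worth, the shape of the corruption is quite regular: each 16-byte row of the dump repeats 0x00000000 0x000000ff 0x0000ff00 0x000000ff, i.e. isolated 0xff bytes at fixed offsets. A minimal sketch decoding one row (little-endian words, as gdb's x/x prints them) to show which byte offsets hold 0xff:

```python
# Decode one 16-byte row of the repeating corruption pattern
# (little-endian words, as in the gdb x/x output above) and list
# which byte offsets within the row hold 0xff.
import struct

row = struct.pack("<4I", 0x00000000, 0x000000ff, 0x0000ff00, 0x000000ff)
ff_offsets = [i for i, b in enumerate(row) if b == 0xff]
print(ff_offsets)  # [4, 9, 12]
```

Sparse single 0xff bytes at a fixed stride look more like stray hardware writes (e.g. misdirected DMA, as suggested earlier in the thread) than a wild memset/memcpy in xenstored, which would tend to leave contiguous runs.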
> I also don't understand where the
> "fd = 47" in a previous mail came from. 0x1f is 31, not 47 (which is
> 0x2f).
I must have been using a different coredump from the original report
(there are several).
In the one which corresponds to the above:
(gdb) print *tdb
$3 = {name = 0x0, map_ptr = 0x0, fd = 31, map_size = 255,
read_only = 65280, locked = 0xff00000000, ecode = 65280, header = {
magic_food =
"\377\000\000\000\000\000\000\000\377\000\000\000\000\377\000\000\377\000\000\000\000\000\000\000\377\000\000\000\000\377\000",
version = 255, hash_size = 0, rwlocks = 255, reserved = {65280,
255, 0, 255, 65280, 255, 0, 255, 65280, 255, 0, 255, 65280,
255, 0, 255, 65280, 255, 0, 255, 65280, 255, 0, 255, 65280,
255, 0, 255, 65280, 255, 0}}, flags = 0, travlocks = {
next = 0xff0000ff00, off = 0, hash = 255}, next = 0xff0000ff00,
device = 1095216660480, inode = 1095216725760,
log_fn = 0x4093b0 <null_log_fn>,
hash_fn = 0x4092f0 <default_tdb_hash>, open_flags = 2}
(gdb) print/x *tdb
$4 = {name = 0x0, map_ptr = 0x0, fd = 0x1f, map_size = 0xff,
read_only = 0xff00, locked = 0xff00000000, ecode = 0xff00,
header = {magic_food = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0},
version = 0xff, hash_size = 0x0, rwlocks = 0xff, reserved = {
0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00,
0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0,
0xff, 0xff00, 0xff, 0x0, 0xff, 0xff00, 0xff, 0x0, 0xff,
0xff00, 0xff, 0x0}}, flags = 0x0, travlocks = {
next = 0xff0000ff00, off = 0x0, hash = 0xff},
next = 0xff0000ff00, device = 0xff00000000, inode = 0xff0000ff00,
log_fn = 0x4093b0, hash_fn = 0x4092f0, open_flags = 0x2}
which is consistent.
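As a sanity check, the same values can be re-derived by hand from the raw dump words at 0x1921270. A small sketch, assuming the x86-64 LP64 field layout implied by the gdb output (char *name; void *map_ptr; int fd; map_size; read_only; padding; void *locked; the real tdb.h may order things slightly differently):

```python
# Re-derive fd, map_size, read_only and locked from the raw little-endian
# dump words at 0x1921270, matching the gdb "print *tdb" output above.
import struct

words = [0x00000000, 0x00000000, 0x00000000, 0x00000000,  # 0x1921270: name, map_ptr
         0x0000001f, 0x000000ff, 0x0000ff00, 0x000000ff,  # 0x1921280: fd, map_size, read_only, pad
         0x00000000, 0x000000ff]                          # 0x1921290: locked
raw = b"".join(struct.pack("<I", w) for w in words)

name, map_ptr = struct.unpack_from("<QQ", raw, 0)
fd, map_size, read_only = struct.unpack_from("<iIi", raw, 16)
locked, = struct.unpack_from("<Q", raw, 32)

print(fd, hex(map_size), hex(read_only), hex(locked))
# 31 0xff 0xff00 0xff00000000 -- the same garbage pointer that
# SAFE_FREE(tdb->locked) then faults on
```

So the hex dump, the gdb struct print, and the faulting address 0xff00000000 all line up.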
> I would not be surprised about a strange bug in libc or the kernel.
Or even Xen itself, or the h/w.
Ian,
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel