
Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)



On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
> yes its 4.2 from pkgsrc.

Thanks, that might be enough.

>  how can i get the changeset id?

That'd be one for the port-xen folks, I think. It might be printed in the
xen dmesg, but that depends on how it was built, and 4.2 may be too old
to have such functionality.

> Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> >On Mon, 2013-11-04 at 22:13 +0000, Mike C. wrote:
> >> On 31.10.13 04:34, Miguel Clara wrote:
> >> 
> >> > I was trying to get a core-dump for a domU with xl and got this
> >error:
> >> >
> >> > # xl dump-core 20 test.core
> >> > Memory fault
> >> >
> >> > GDB shows this:
> >> >
> >> > a# gdb xl xl.core
> >> > GNU gdb (GDB) 7.3.1
> >> > Copyright (C) 2011 Free Software Foundation, Inc.
> >> > License GPLv3+: GNU GPL version 3 or
> >later<http://gnu.org/licenses/gpl.html>
> >> > This is free software: you are free to change and redistribute it.
> >> > There is NO WARRANTY, to the extent permitted by law.  Type "show
> >copying"
> >> > and "show warranty" for details.
> >> > This GDB was configured as "x86_64--netbsd".
> >> > For bug reporting instructions, please see:
> >> > <http://www.gnu.org/software/gdb/bugs/>...
> >> > Reading symbols from /usr/sbin/xl...done.
> >> > [New process 1]
> >> > Core was generated by `xl'.
> >> > Program terminated with signal 11, Segmentation fault.
> >> > #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
> >> > (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
> >> > dump_rtn=0x7f7ff700632c<local_file_dump>)
> >> >      at xc_core.c:860
> >

In 4.2.0 this corresponds to
 memcpy(dump_mem, vaddr, PAGE_SIZE);
which is a plausible source of a segfault.

xc_core.c has only changed in immaterial ways (although ways which
caused all the line numbers to shift) since 4.2.0 AFAICT so it is likely
that this bug is still present.

Can you tell via gdb what the faulting address was and whether it
corresponds to dump_mem or vaddr? gdb's "info locals" might give you at
least some of that? Also you can use disas to identify the precise
instruction at 0x00007f7ff7007b45, which will show you the registers
which might lead you to the faulting address.

vaddr is certainly not NULL; it's checked right before. It could be
non-NULL and still invalid if xc_map_foreign_range were buggy on NetBSD,
but that is surely used elsewhere? I suppose it might have mapped an MFN
which was invalid (or became invalid, but your bug is deterministic,
right?). IIRC NetBSD's privcmd foreign mappings are populated lazily and
not immediately like on Linux? If that were the case (and I'm only
vaguely aware of how NetBSD operates) then it would be plausible that
xc_map_foreign_range would succeed but that a subsequent attempt to
access the region would fault?

dump_mem isn't NULL, it's a pointer into the dump_mem_start array which
has a check for failure when it is allocated. Since dump_mem is just
normal process memory and vaddr is a magic foreign mapping I'd be
inclined to suspect vaddr was not right in some way...

Does "xl -vvv dump-core" give any useful additional logging?

Unfortunately I don't think anyone has done valgrind support for
debugging processes which use Xen hypercalls on *BSD. If you were very
keen you could probably follow what was done for Linux
http://blog.xen.org/index.php/2013/01/18/using-valgrind-to-debug-xen-toolstacks/
and wire up the BSD privcmd ioctl to the generic Xen hypercall code I
added.

I fear this bug is going to take someone on the ground with a NetBSD
system and the ability to dive into BSD kernel internals to get to the
bottom of...

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel