Re: [Xen-devel] Fwd: NetBSD xl core-dump not working... Memory fault (core dumped)

On 08/11/13 11:29, Ian Campbell wrote:
> On Thu, 2013-11-07 at 21:04 +0000, Miguel C. wrote:
>> Yes, it's 4.2 from pkgsrc.
> Thanks, that might be enough.
>> How can I get the changeset id?
> That'd be one for the port-xen folks I think. It might be printed in the
> xen dmesg, but that depends on how it was built and 4.2 may be too old
> to have such functionality.
>> Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
>>> On Mon, 2013-11-04 at 22:13 +0000, Mike C. wrote:
>>>> On 31.10.13 04:34, Miguel Clara wrote:
>>>>> I was trying to get a core-dump for a domU with xl and got this error:
>>>>> # xl dump-core 20 test.core
>>>>> Memory fault
>>>>> GDB shows this:
>>>>> # gdb xl xl.core
>>>>> GNU gdb (GDB) 7.3.1
>>>>> Copyright (C) 2011 Free Software Foundation, Inc.
>>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>>> This is free software: you are free to change and redistribute it.
>>>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>>>> and "show warranty" for details.
>>>>> This GDB was configured as "x86_64--netbsd".
>>>>> For bug reporting instructions, please see:
>>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>>> Reading symbols from /usr/sbin/xl...done.
>>>>> [New process 1]
>>>>> Core was generated by `xl'.
>>>>> Program terminated with signal 11, Segmentation fault.
>>>>> #0  0x00007f7ff7007b45 in xc_domain_dumpcore_via_callback
>>>>> (xch=0x7f7ff7b0d800, domid=20, args=0x7f7fffffdae0,
>>>>> dump_rtn=0x7f7ff700632c<local_file_dump>)
>>>>>      at xc_core.c:860
> In 4.2.0 this corresponds to
>  memcpy(dump_mem, vaddr, PAGE_SIZE);
> which is a plausible source of a segfault.
> xc_core.c has only changed in immaterial ways (although ways which
> caused all the line numbers to shift) since 4.2.0 AFAICT so it is likely
> that this bug is still present.
> Can you tell via gdb what the faulting address was and whether it
> corresponds to dump_mem or vaddr? gdb's "info locals" might give you at
> least some of that? Also you can use disas to identify the precise
> instruction at 0x00007f7ff7007b45, which will show you the registers
> which might lead you to the faulting address.
> vaddr is certainly not NULL, it's checked right before. It could be
> non-NULL and still invalid if xc_map_foreign_range were buggy on NetBSD,
> but that is surely used elsewhere? I suppose it might have mapped an MFN
> which was invalid (or became invalid, but your bug is
> deterministic, right?). IIRC NetBSD's privcmd foreign mappings are
> populated lazily and not immediately like on Linux? If that were the
> case (and I'm only vaguely aware of how NetBSD operates) then it would
> be plausible that xc_map_foreign_range would succeed but that a
> subsequent attempt to access the region would fault?

Yes, NetBSD privcmd maps the region lazily (it does the actual mapping in
the page fault handler for that region). I have not tested it, but could
you give the following patch a try:


It's quite old, but I expect there haven't been many changes in NetBSD
privcmd recently.
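
To make the failure mode discussed above concrete (a map call that
returns a valid-looking pointer whose first access then faults), here is
a minimal sketch in plain POSIX C. It does not touch privcmd or libxc at
all: a PROT_NONE anonymous mapping is only a stand-in for a lazily
populated foreign mapping whose population fails, and the helper names
(page_is_readable, map_inaccessible_page, copy_page_checked) are made up
for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANON and sigaction in strict modes */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Single global jump buffer: this sketch is not thread-safe. */
static sigjmp_buf probe_env;

static void probe_fault_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* unwind out of the faulting read */
}

/* Returns 1 if the first byte at addr can be read, 0 if the read faults. */
static int page_is_readable(const void *addr)
{
    const volatile unsigned char *p = addr;
    struct sigaction sa, old_segv, old_bus;
    int ok;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask=1 so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(probe_env, 1) == 0) {
        unsigned char byte = *p;   /* this is the access that may fault */
        (void)byte;
        ok = 1;
    } else {
        ok = 0;
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}

/* Stand-in (assumption) for a mapping that "succeeds" but cannot be used. */
static void *map_inaccessible_page(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Copy one page only if the source can be touched. Only the first byte
 * is probed, so src is assumed to be page-aligned and len one page.
 * Returns 0 on success, -1 if the source faults.
 */
static int copy_page_checked(void *dst, const void *src, size_t len)
{
    if (!page_is_readable(src))
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

With this, copy_page_checked() on a pointer from map_inaccessible_page()
returns -1 where a bare memcpy() would take the SIGSEGV, even though the
pointer itself is non-NULL. That is the same shape as the vaddr check
followed by the memcpy in xc_core.c, not a proposed fix for it.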


