
Re: [Xen-API] Debugging XAPI daemon crash


  • To: 'Ranjeet R' <rranjeet@xxxxxxxxxxx>, Dave Scott <Dave.Scott@xxxxxxxxxx>
  • From: Rob Hoes <Rob.Hoes@xxxxxxxxxx>
  • Date: Wed, 2 Apr 2014 12:29:22 +0000
  • Accept-language: en-GB, en-US
  • Cc: "xen-api@xxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxx>
  • Delivery-date: Wed, 02 Apr 2014 12:30:03 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>
  • Thread-index: Ac9FeZIEEGyvK3fqS5+yO9kiSTP+fwAhAf+AAFGC/oAAAI9/AABWxcIAABGwNkAAFgq6AAAYbmrwARvJxIAAF5XOAA==
  • Thread-topic: [Xen-API] Debugging XAPI daemon crash

Thanks!
I have merged your patch.

Cheers,
Rob

> -----Original Message-----
> From: Ranjeet R [mailto:rranjeet@xxxxxxxxxxx]
> Sent: 02 April 2014 4:14 AM
> To: Rob Hoes; Dave Scott
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: RE: [Xen-API] Debugging XAPI daemon crash
> 
> Thanks Rob
> 
> I have generated a pull request as you had mentioned.  Let me know if you
> have any review comments.
> 
> -Ranjeet
> 
> -----Original Message-----
> From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx]
> Sent: Thursday, March 27, 2014 3:16 AM
> To: Ranjeet R; Dave Scott
> Cc: xen-api@xxxxxxxxxxxxx
> Subject: RE: [Xen-API] Debugging XAPI daemon crash
> 
> Hi Ranjeet,
> 
> That makes sense.
> 
> I guess the same would apply to the ifa_netmask field. The following lines
> would blow up if it is NULL:
> 
>       netmask = tmp->ifa_netmask;
>       [...]
>               netmaskstr = alloc_addr(netmask);
> 
> because alloc_addr will try to access netmask->sa_family.
> 
> So to be on the safe side, I think we should check for this as well. I
> think defensive coding is the right way (I hate segfaults)!
> 
> For the purpose of the stub_if_getaddr function, I think it is sufficient to
> wrap the existing if-block with "if (sock && netmask)". This assumes that
> we always want both the address and the netmask, and we ignore the
> interface if either is undefined.
> 
> The master branch for this code is here (since we split it off from
> xen-api-libs):
> https://github.com/xapi-project/netdev/blob/master/lib/addr_stubs.c
> If you'd like to submit a pull request there (as well as keeping the fix
> in your development branch on clearwater), that would be great.
> 
> Thanks,
> Rob
> 
> > -----Original Message-----
> > From: Ranjeet R [mailto:rranjeet@xxxxxxxxxxx]
> > Sent: 26 March 2014 11:08 PM
> > To: Rob Hoes; Dave Scott
> > Cc: xen-api@xxxxxxxxxxxxx
> > Subject: RE: [Xen-API] Debugging XAPI daemon crash
> >
> > Hello Rob/Dave
> >
> > Thanks for the pointers. I figured out the issue. The reason my C stub
> > was able to list out all interfaces without crashing is -
> >
> > if (getifaddrs(&ifaddr) == -1) {
> >         perror("getifaddrs failed");
> >         exit(1);
> > }
> >
> >   struct ifaddrs *ifa = ifaddr;
> >   for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
> > --->>>>    if (ifa->ifa_addr != NULL) {    ------>> Check for ifa_addr
> >       int family = ifa->ifa_addr->sa_family;
> >
> > I was looking into the ifaddrs structure only when the interface
> > address is set.
> >
> > In the stub_if_getaddr code, the code is as follows:
> >
> > ret = getifaddrs(&ifaddrs);
> > if (ret < 0)
> >       caml_failwith("cannot get interface address");
> >
> > for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
> >       sock = tmp->ifa_addr;  ------------------------------> Assigned here
> >      netmask = tmp->ifa_netmask;
> >
> >      if (sock->sa_family == AF_INET || sock->sa_family == AF_INET6) {     -------------> Dereferenced here without checking
> >                       name = caml_copy_string(tmp->ifa_name); <snip>
> >
> > In my case, there were two internal interfaces for which the interface
> > address was not setup and while iterating through the list, there was
> > a NULL pointer dereference.
> >
> > It might look like defensive coding, but can we ignore the interfaces
> > for which ifa_addr is not set? I can open a bug and fix it if
> > there is consensus that this needs to be fixed.
> >
> > Ranjeet
> >
> >
> > -----Original Message-----
> > From: Rob Hoes [mailto:Rob.Hoes@xxxxxxxxxx]
> > Sent: Wednesday, March 26, 2014 4:44 AM
> > To: Ranjeet R; Dave Scott
> > Cc: xen-api@xxxxxxxxxxxxx
> > Subject: RE: [Xen-API] Debugging XAPI daemon crash
> >
> > Hi Ranjeet,
> >
> > > It seems to be crashing at the same point you had mentioned.
> > > Please find the SEGV backtrace attached.
> > >
> > > (gdb) c
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x085bc2d6 in stub_if_getaddr ()
> > >  (gdb) bt
> > > #0  0x085cca90 in segv_handler ()
> > > #1  <signal handler called>
> > > #2  0x085bc2d6 in stub_if_getaddr ()
> > > #3  0x0850ef8c in camlNetdev__get_all_ipv4_1325 ()
> > >
> > > You had mentioned that this could be because of a bad C function
> > > binding. I wrote a small C stub to see whether it works for the xenbr0
> > > interface and it seems to be working fine. How should I verify the
> > > binding?
> >
> > The function that is failing seems to be this one:
> > https://github.com/xapi-project/xen-api-libs/blob/clearwater/netdev/addr_stubs.c#L74
> >
> > It has:
> >
> >     int ret;
> >     struct ifaddrs *ifaddrs, *tmp;
> >     [...]
> >     ret = getifaddrs(&ifaddrs);
> >     [...]
> >     for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
> >         sock = tmp->ifa_addr;
> >         netmask = tmp->ifa_netmask;
> >         [...]
> >
> > Could it be that the getifaddrs function does not set ifaddrs correctly?
> > You should be able to test this with a small C program. Or is this
> > what you have already done?
> >
> > Cheers,
> > Rob
> >
> > > Appreciate your help.
> > >
> > > -Ranjeet
> > >
> > > -----Original Message-----
> > > From: David Scott [mailto:dave.scott@xxxxxxxxxxxxx]
> > > Sent: Monday, March 24, 2014 3:46 AM
> > > To: Ranjeet R
> > > Cc: xen-api@xxxxxxxxxxxxx
> > > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> > >
> > > On 24/03/14 10:30, Ranjeet R wrote:
> > > > Hello Dave
> > > >
> > > > The binaries did not have debug symbols but I managed to rebuild
> > > > the binaries with debug enabled.
> > >
> > > Great.
> > >
> > > > I tried starting the xapi process under gdb the same way it is
> > > > started in the init.d scripts. However, in gdb, the xapi process
> > > > forks another process and I am not able to debug it further (I tried
> > > > setting detach-on-fork to off in gdb, but the primary process just
> > > > runs to the end of execution).
> > > >
> > > > I am using the following gdb command to debug
> > > >
> > > > gdb --args /usr/sbin/xapi -daemon -writeinitcomplete /var/run/xapi_init_complete.cookie -writereadyfile /var/run/xapi_startup.cookie -onsystemboot
> > > >
> > > > Can you please help me with the steps that you use to debug the
> > > > XAPI process?
> > >
> > > Ah, I think xapi forks a "watchdog" process near the start -- this
> > > is probably what you're seeing.
> > >
> > > Try adding a "-nowatchdog" option to the command-line.
> > >
> > > Dave
> > >
> > > >
> > > > Thanks for your help,
> > > >
> > > > -Ranjeet
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Dave Scott [mailto:Dave.Scott@xxxxxxxxxx]
> > > > Sent: Saturday, March 22, 2014 12:36 PM
> > > > To: Ranjeet R
> > > > Cc: xen-api@xxxxxxxxxxxxx
> > > > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> > > >
> > > > Hi,
> > > >
> > > > I suspect the segfault is being caused by a bad C function binding.
> > > > I've seen a similar crash before when querying an interface IP via
> > > > getifaddrs (I think that was the function name). Could you run xapi
> > > > in gdb and reproduce the crash? Printing the call stack would help
> > > > to confirm this hypothesis. Provided the xapi binary still has debug
> > > > symbols (i.e. hasn't been stripped), the OCaml functions (with fairly
> > > > obvious mangled names) should be on the stack too.
> > > >
> > > > Cheers,
> > > > Dave
> > > >
> > > >> On Mar 22, 2014, at 3:47 AM, "Ranjeet R" <rranjeet@xxxxxxxxxxx> wrote:
> > > >>
> > > >> Hello all
> > > >>
> > > >> I am trying to bring up a DevCloud setup which has an XCP Kronos
> > > >> based XAPI daemon. I had changed the underlying network
> > > >> implementation (it is not a bridge, but an openvswitch-like network
> > > >> implementation) and the XAPI daemon crashes during bootup. Please
> > > >> find the XAPI logs below.
> > > >>
> > > >>
> > > >> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> > > >> starting up database engine D:72969b3eaf8e|xapi] About to flush database: /var/lib/xcp/state.db
> > > >> starting up database engine D:72969b3eaf8e|redo_log] Flushing database to all active redo-logs
> > > >> starting up database engine D:72969b3eaf8e|xapi] Performing initial DB GC
> > > >> thread_zero|dbsync (update_env) D:fd0aec7399c9|dbsync] Sync: sync_create_localhost
> > > >> dbsync (update_env) D:fd0aec7399c9|dbsync] creating localhost
> > > >>
> > > >> dmesg logs seem to suggest that xapi is crashing during startup.
> > > >>
> > > >> [    9.092377] xapi[2813]: segfault at 0 ip 085bc286 sp bf80ae30 error 4 in xapi[8048000+59f000]
> > > >> [    9.869971] xapi[2943]: segfault at 0 ip 085bc286 sp bf8ec450 error 4 in xapi[8048000+59f000]
> > > >>
> > > >> I looked at the XAPI code to see where it fails and I don't see any
> > > >> logs after the following code point in ocaml/xapi/dbsync_slave.ml:
> > > >>
> > > >> let create_localhost ~__context info =
> > > >>    let ip = get_my_ip_addr ~__context in
> > > >>
> > > >> I confirmed that "ifconfig xenbr0" has a valid management IP
> > > >> address and should not fail.
> > > >>
> > > >> How do I debug this crash further? Is there any way to look at
> > > >> the stack trace where XAPI crashed? Any pointers to debug this
> > > >> further will be very helpful.
> > > >>
> > > >> -Ranjeet
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> Xen-api mailing list
> > > >> Xen-api@xxxxxxxxxxxxx
> > > >> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> 
> 
> 
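For reference, the guard discussed in this thread (skip any interface whose
ifa_addr or ifa_netmask is NULL before touching sa_family) can be sketched
outside the OCaml stub as plain C. This is only an illustration under stated
assumptions, not the merged patch: entry_is_usable and count_usable are
hypothetical helper names, and the OCaml allocation done by the real
stub_if_getaddr is left out.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <ifaddrs.h>

/* Hypothetical helper, not part of addr_stubs.c: an entry is usable only if
 * both pointers are set and the address family is IPv4 or IPv6. */
static int entry_is_usable(const struct ifaddrs *ifa)
{
    const struct sockaddr *sock = ifa->ifa_addr;
    const struct sockaddr *netmask = ifa->ifa_netmask;

    /* The NULL test must come before any access to sock->sa_family. */
    if (sock == NULL || netmask == NULL)
        return 0;
    return sock->sa_family == AF_INET || sock->sa_family == AF_INET6;
}

/* Walk a getifaddrs()-style list. Returns the number of usable entries and
 * stores in *skipped_null how many were skipped because of a NULL pointer. */
static int count_usable(const struct ifaddrs *list, int *skipped_null)
{
    const struct ifaddrs *tmp;
    int usable = 0;

    *skipped_null = 0;
    for (tmp = list; tmp; tmp = tmp->ifa_next) {
        if (tmp->ifa_addr == NULL || tmp->ifa_netmask == NULL)
            (*skipped_null)++;
        else if (entry_is_usable(tmp))
            usable++;
    }
    return usable;
}
```

On a live system the list would come from getifaddrs(&list) and be released
with freeifaddrs(list) afterwards. Interfaces without an address, like the
two internal ones in the report, would show up in the skipped count rather
than causing a NULL dereference.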

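The kernel "segfault at ..." lines from the original report already point at
a NULL dereference: the fault address is 0, and on x86 an error code of 4
means a user-mode read of a page that is not present. The sketch below pulls
those fields out of such a line. The struct and function names are made up
for illustration; the bit meanings are the standard x86 page-fault error
code bits (bit 0 page present, bit 1 write, bit 2 user mode).

```c
#include <stdio.h>
#include <string.h>

/* Fields of one kernel "segfault at ..." line (hypothetical struct). */
struct segv_info {
    unsigned long addr;  /* faulting address */
    unsigned long ip;    /* instruction pointer at the fault */
    unsigned long sp;    /* stack pointer at the fault */
    unsigned long err;   /* x86 page-fault error code */
    unsigned long base;  /* start of the mapping containing ip */
    unsigned long size;  /* size of that mapping */
};

/* Parse a dmesg line; returns 0 on success, -1 if it does not match. */
static int parse_segv(const char *line, struct segv_info *out)
{
    const char *p = strstr(line, "segfault at ");
    int n;

    if (p == NULL)
        return -1;
    /* %*[^[] skips the object name up to the '[' before base+size. */
    n = sscanf(p, "segfault at %lx ip %lx sp %lx error %lx in %*[^[][%lx+%lx]",
               &out->addr, &out->ip, &out->sp, &out->err,
               &out->base, &out->size);
    return n == 6 ? 0 : -1;
}

/* x86 page-fault error code bits. */
static int fault_page_present(unsigned long err) { return (err & 0x1) != 0; }
static int fault_is_write(unsigned long err)     { return (err & 0x2) != 0; }
static int fault_is_user(unsigned long err)      { return (err & 0x4) != 0; }

/* Print a one-line summary of a parsed fault. */
static void print_segv(const struct segv_info *s)
{
    printf("%s-mode %s of %s page at 0x%lx, ip offset 0x%lx into mapping\n",
           fault_is_user(s->err) ? "user" : "kernel",
           fault_is_write(s->err) ? "write" : "read",
           fault_page_present(s->err) ? "protected" : "non-present",
           s->addr, s->ip - s->base);
}
```

For the first line in the report this gives fault address 0, a user-mode
read of a non-present page, and an ip offset of 0x574286 into the xapi
mapping (0x085bc286 - 0x8048000), which is consistent with the unchecked
sock->sa_family access found later in the thread.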



 

