
Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility



Jim Fehlig wrote:
> Ian Jackson wrote:
>   
>> libvirt reaps its children synchronously and has no central pid
>> registry and no dispatch mechanism.  libxl does have a pid registry so
>> can provide a selective reaping facility, but that is not currently
>> exposed.  Here we expose it.
>>
>> Also, libvirt has multiple libxl ctxs.  Prior to this series it is not
>> possible for those to share SIGCHLD: libxl expects either the
>> application, or _one_ libxl ctx, to own SIGCHLD.  In the final patch
>> of this series we relax this restriction by having libxl maintain a
>> process-wide list of the libxl ctxs that are supposed to be interested
>> in SIGCHLD.
>>
>> I have not tested the selective reaping functionality.  The most
>> plausible test environment for that is a suitably modified libvirt.
>>   
>>     
>
> I've been testing this series (plus 1/3 in your "tools: Miscellanous
> fixes for 4.4" series) on a suitably modified libvirt and the results
> look good so far :).
>
> I'm running four scripts concurrently that
>
> - start / stop domA
> - save / restore domB
> - reboot domC
> - get stats on dom{A,B,C}
>
> They have been running for about an hour now, and I haven't noticed any
> problems
>   

I let this run over the weekend and today noticed that libvirtd was
deadlocked:

Thread 5 (Thread 0x7ffff10ea700 (LWP 42142)):
#0  0x00007ffff4d20b7d in read () from /lib64/libpthread.so.0
#1  0x00007fffeb88d028 in libxl__self_pipe_eatall (fd=39) at libxl_event.c:1369
#2  0x00007fffeb88f52c in sigchld_selfpipe_handler (egc=0x7ffff10e9270, ev=0x5555559986e8, fd=39,
    events=1, revents=1) at libxl_fork.c:501
#3  0x00007fffeb88bbf5 in afterpoll_internal (egc=0x7ffff10e9270, poller=0x5555559a2b40, nfds=3,
    fds=0x5555558d96a0, now=...) at libxl_event.c:990
#4  0x00007fffeb88d2d2 in eventloop_iteration (egc=0x7ffff10e9270, poller=0x5555559a2b40)
    at libxl_event.c:1431
#5  0x00007fffeb88de18 in libxl__ao_inprogress (ao=0x5555559beb30,
    file=0x7fffeb8a0a1b "libxl_create.c", line=1356,
    func=0x7fffeb8a1530 <__func__.16339> "do_domain_create") at libxl_event.c:1676
#6  0x00007fffeb86813f in do_domain_create (ctx=0x555555998550, d_config=0x7ffff10e94d0,
    domid=0x7ffff10e944c, restore_fd=-1, checkpointed_stream=0, ao_how=0x0, aop_console_how=0x0)
    at libxl_create.c:1356
#7  0x00007fffeb86820d in libxl_domain_create_new (ctx=0x555555998550, d_config=0x7ffff10e94d0,
    domid=0x7ffff10e944c, ao_how=0x0, aop_console_how=0x0) at libxl_create.c:1377
#8  0x00007fffebad01b6 in libxlVmStart (driver=0x5555558b7be0, vm=0x5555558d3280,
    start_paused=false, restore_fd=-1) at libxl/libxl_driver.c:630
#9  0x00007fffebad7594 in libxlDomainCreateWithFlags (dom=0x5555559b9c00, flags=0)
    at libxl/libxl_driver.c:2924
#...

Thread 1 (Thread 0x7ffff7fc7840 (LWP 42135)):
#0  0x00007ffff4d2089c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff4d1c4f2 in _L_lock_957 () from /lib64/libpthread.so.0
#2  0x00007ffff4d1c35a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fffeb88943a in libxl__ctx_lock (ctx=0x555555998550) at libxl_internal.h:2760
#4  0x00007fffeb88bf3d in libxl_osevent_occurred_fd (ctx=0x555555998550,
    for_libxl=0x5555559953e0, fd=45, events_ign=0, revents_ign=1) at libxl_event.c:1049
#5  0x00007fffebacd56c in libxlDomainObjFDEventCallback (watch=40, fd=45, vir_events=1,
    fd_info=0x5555559b5b80) at libxl/libxl_domain.c:132
#...

It looks like thread 5 is blocked in read() on the SIGCHLD self-pipe
(fd 39) while holding the ctx lock, and thread 1, delivering an
occurred_fd event for fd 45 on the same ctx, is stuck waiting for that
lock.  But it is not clear to me why read() is blocking...
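
For reference, here is a minimal sketch of how a self-pipe is usually
drained without blocking.  This is not libxl's code: set_nonblocking()
and drain_pipe() are hypothetical helpers written only for illustration.
The point is that a drain loop only returns once the pipe is empty if
the read end is in non-blocking mode; on a blocking fd the final read()
sleeps until more data arrives.  Whether that is what happened in the
trace above is not established.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: discard everything currently queued in the pipe.
 * Assumes fd is non-blocking; otherwise the last read() would sleep
 * once the pipe is empty.  Returns the number of bytes discarded, or
 * -1 on an unexpected error. */
static long drain_pipe(int fd)
{
    char buf[256];
    long total = 0;

    for (;;) {
        ssize_t r = read(fd, buf, sizeof(buf));
        if (r > 0) {
            total += r;
            continue;
        }
        if (r == 0)
            return total;           /* all writers closed */
        if (errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;           /* pipe is empty, nothing left to eat */
        return -1;
    }
}
```

With the read end set up this way, a caller that holds a lock while
draining cannot get stuck in read(), so other threads waiting for the
same lock make progress.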

Regards,
Jim



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

