Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility
Jim Fehlig wrote:
> Ian Jackson wrote:
>
>> Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD
>> flexibility"):
>>
>>> I let this run over the weekend and today noticed libvirtd was deadlocked
>>>
>> I have just retested xl with:
>>  * my 3-patch 4.4 fixes series
>>  * v2 of my fork series
>>  * the extra mutex patch "libxl: fork: Fixup SIGCHLD sharing"
>>  * "13/12" and "14/12" just posted
>> and it WFM.
>>
>> Of course I don't have the same setup as Jim.
>>
>> Jim: if it's not too much trouble, I'd appreciate it if you could try
>> that combination.
>>
>> For your convenience you can find a git branch of it at
>>
>> http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=shortlog;h=refs/tags/wip.enumerate-pids-v2.1
>> aka
>> git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1
>>
>
> I've been testing this branch and notice an occasional libvirtd segfault
> that always occurs when calling libxl_domain_create_restore(). By
> occasional, I mean my save/restore script might cause the segfault after
> 2 iterations, or 20 iterations, or ... But the segfault always occurs
> in libxl_domain_create_restore()
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffeef59700 (LWP 12083)]
> 0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
>     at util/virobject.c:362
> 362         return virClassIsDerivedFrom(obj->klass, klass);
> (gdb) bt
> #0  0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
>     at util/virobject.c:362
> #1  0x00007ffff745765b in virObjectLock (anyobj=0x2f302f6e69616d6f) at
> util/virobject.c:314
> #2  0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
>     hndp=0x5555559e5d88, abs_t=...)
>     at libxl/libxl_domain.c:302
> #3  0x00007fffe96f8fed in time_deregister (gc=0x7fffeef58220,
> ev=0x5555559eee48)
>     at libxl_event.c:294
> #4  0x00007fffe96facfd in afterpoll_internal (egc=0x7fffeef58220,
> poller=0x5555559a4c70, nfds=3,
>     fds=0x5555559c09d0, now=...) at libxl_event.c:1008
> #5  0x00007fffe96fc312 in eventloop_iteration (egc=0x7fffeef58220,
> poller=0x5555559a4c70)
>     at libxl_event.c:1455
> #6  0x00007fffe96fce58 in libxl__ao_inprogress (ao=0x5555559e9690,
>     file=0x7fffe970fadb "libxl_create.c", line=1356,
>     func=0x7fffe97105f0 <__func__.16344> "do_domain_create") at
> libxl_event.c:1700
> #7  0x00007fffe96d711f in do_domain_create (ctx=0x5555559d9fa0,
> d_config=0x7fffeef58490,
>     domid=0x7fffeef5840c, restore_fd=89, checkpointed_stream=0,
> ao_how=0x0, aop_console_how=0x0)
>     at libxl_create.c:1356
> #8  0x00007fffe96d7238 in libxl_domain_create_restore
> (ctx=0x5555559d9fa0, d_config=0x7fffeef58490,
>     domid=0x7fffeef5840c, restore_fd=89, params=0x7fffeef58400,
> ao_how=0x0, aop_console_how=0x0)
>     at libxl_create.c:1387
> #...
> (gdb) f 2
> #2  0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
>     hndp=0x5555559e5d88, abs_t=...) at libxl/libxl_domain.c:302
> 302         virObjectLock(info->priv);
> (gdb) p info->priv
> $3 = (libxlDomainObjPrivatePtr) 0x2f302f6e69616d6f
> (gdb) f 9
> #9  0x00007fffe993f2c7 in libxlVmStart (driver=0x5555558c2e50,
> vm=0x5555558e6a50,
>     start_paused=false, restore_fd=89) at libxl/libxl_driver.c:635
> 635         res = libxl_domain_create_restore(priv->ctx, &d_config,
> &domid,
> (gdb) p priv
> $2 = (libxlDomainObjPrivatePtr) 0x5555558fc310
>
> It looks like the libxlDomainObjPrivatePtr, stashed as part of
> for_app_registration_out when registering the timeout, has been
> trampled. Not sure if the problem is in libvirt or libxl, but it is
> late here and I'm calling it a night :).
>

It appears the timeout_modify callback is invoked on a previously
deregistered timeout.
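
One way to sanity-check the "trampled" reading of the gdb output is to decode
the bogus pointer value 0x2f302f6e69616d6f as bytes. The following is a
minimal sketch, not code from libvirt or libxl; decode_ptr_bytes is a
hypothetical helper, and it assumes a little-endian machine such as the
x86-64 host shown in the backtrace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libvirt or libxl).
 * Writes the 8 bytes of `value` into `out` in the order they would sit in
 * memory on a little-endian machine (least significant byte first).
 * Non-printable bytes are shown as '.'. `out` must hold at least 9 chars.
 */
static void decode_ptr_bytes(uint64_t value, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = isprint(c) ? (char)c : '.';
    }
    out[8] = '\0';
}
```

Called with 0x2f302f6e69616d6f, this yields the text "omain/0/". The
"pointer" is therefore ASCII string data, which fits a block that was freed
and then reused to hold a string. Which string that was cannot be determined
from the output above.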
I didn't notice the segfault when running libvirtd under valgrind, but did see

==14653== Invalid read of size 8
==14653==    at 0x134ACD1C: libxlDomainObjTimeoutModifyEventHook (libxl_domain.c:309)
==14653==    by 0x13730FEC: time_deregister (libxl_event.c:294)
==14653==    by 0x13732CFC: afterpoll_internal (libxl_event.c:1008)
==14653==    by 0x13734311: eventloop_iteration (libxl_event.c:1455)
==14653==    by 0x13734E57: libxl__ao_inprogress (libxl_event.c:1700)
==14653==    by 0x1370F11E: do_domain_create (libxl_create.c:1356)
==14653==    by 0x1370F237: libxl_domain_create_restore (libxl_create.c:1387)
==14653==    by 0x134AF332: libxlVmStart (libxl_driver.c:635)
==14653==    by 0x134B382A: libxlDomainRestoreFlags (libxl_driver.c:2047)
==14653==    by 0x134B3975: libxlDomainRestore (libxl_driver.c:2070)
==14653==    by 0x53B5AC7: virDomainRestore (libvirt.c:2678)
==14653==    by 0x130ADC: remoteDispatchDomainRestore (remote_dispatch.h:6657)
==14653==  Address 0x18000178 is 8 bytes inside a block of size 32 free'd
==14653==    at 0x4C28ADC: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14653==    by 0x529B08F: virFree (viralloc.c:580)
==14653==    by 0x134AC578: libxlDomainObjEventHookInfoFree (libxl_domain.c:110)
==14653==    by 0x52BE3DB: virEventPollCleanupTimeouts (vireventpoll.c:535)
==14653==    by 0x52BEA4C: virEventPollRunOnce (vireventpoll.c:651)
==14653==    by 0x52BC960: virEventRunDefaultImpl (virevent.c:306)

which is consistent with the gdb findings. I've audited the timeout
handling code in libvirt and didn't notice any problems. I'll have some
time tomorrow to continue poking.

Regards,
Jim

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
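
The pattern in the valgrind report (hook data freed by the application's
event loop, then read again when the library calls the modify hook) can be
modelled in a few lines. The following is a simplified sketch of that bug
class and of one possible guard, under stated assumptions. Every name in it
(hook_info, timeout_rec, model_timeout_register, and so on) is hypothetical
and is not the libvirt or libxl code, and it does not settle which side is
at fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-timeout data the application stashes
 * at registration time (what the message calls for_app_registration_out). */
struct hook_info {
    void *priv;
};

/* Hypothetical library-side record for one timeout. */
struct timeout_rec {
    bool registered;            /* is the application's handle still live? */
    struct hook_info *for_app;  /* handle handed back by the application */
};

/* Counts modify-hook invocations, for demonstration only. */
static int modify_hook_calls;

/* Register: the application allocates its hook data and the library-side
 * record remembers the handle. Returns 0 on success, -1 on failure. */
static int model_timeout_register(struct timeout_rec *t, void *priv)
{
    struct hook_info *info = calloc(1, sizeof(*info));
    if (!info)
        return -1;
    info->priv = priv;
    t->for_app = info;
    t->registered = true;
    return 0;
}

/* Application's modify hook: it dereferences the stashed data, so running
 * it on an already released handle is a use-after-free. */
static void model_modify_hook(struct hook_info *info)
{
    (void)info->priv;
    modify_hook_calls++;
}

/* The timeout fired and the application released its data. Recording this
 * in the library-side record is what prevents a later stale call. */
static void model_timeout_released(struct timeout_rec *t)
{
    free(t->for_app);
    t->for_app = NULL;
    t->registered = false;
}

/* Library side: call into the application only while the handle is live.
 * Returns true if the hook was invoked. */
static bool model_timeout_modify_guarded(struct timeout_rec *t)
{
    if (!t->registered)
        return false;
    model_modify_hook(t->for_app);
    return true;
}
```

In this model, a modify issued after model_timeout_released() is skipped
rather than touching freed memory. The design choice is that whichever side
releases the handle also records that fact where the other side will check
it; whether the real fix belongs in libvirt or libxl is left open above.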