[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"): > BTW, I only see the crash when the save/restore script is running. I > stopped the other scripts and domains, running only save/restore on a > single domain, and see the crash rather quickly (within 10 iterations). I'll look at the libvirt code, but: With a recurring timeout, how can you ever know it's cancelled ? There might be threads out there, which don't hold any locks, which are in the process of executing a callback for a timeout. That might be arbitrarily delayed from the pov of the rest of the program. E.g.: Thread A Thread B invoke some libxl operation X do some libxl stuff X register timeout (libxl) XV record timeout info X do some more libxl stuff ... X do some more libxl stuff X deregister timeout (libxl internal) X converted to request immediate timeout XV record new timeout info X release libvirt event loop lock entering libvirt event loop V observe timeout is immediate V need to do callback call libxl driver entering libvirt event loop V observe timeout is immediate V need to do callback call libxl driver call libxl X libxl sees timeout is live X libxl does libxl stuff libxl driver deregisters V record lack of timeout free driver's timeout struct call libxl X libxl sees timeout is dead X libxl does nothing libxl driver deregisters V CRASH due to deregistering V already-deregistered timeout If this is how things are, then I think there is no sane way to use libvirt's timeouts (!) In principle I guess the driver could keep its per-timeout structs around forever and remember whether they've been deregistered or not. Each one would have to have a lock in it. But if you think about it, if you have 10 threads all running the event loop and you set a timeout to zero, doesn't that mean that every thread's event loop should do the timeout callback as fast as it can ? That could be a lot of wasted effort. The best solution would appear to be to provide a non-recurring callback. > I'm not so thrilled with the timeout handling code in the libvirt libxl > driver. The driver maintains a list of active timeouts because IIRC, > there were cases when the driver received timeout deregistrations when > calling libxl_ctx_free, at which point some of the associated structures > were freed. The idea was to call libxl_osevent_occurred_timeout on any > active timeouts before freeing libxlDomainObjPrivate and its contents. libxl does deregister fd callbacks in libxl_ctx_free. But libxl doesn't currently "deregister" any timeouts in libxl_ctx_free; indeed it would be a bit daft for it to do so as at libxl_ctx_free there are no aos running so there would be nothing to time out. But there is a difficulty with timeouts which libxl has set to occur immediately but which have not yet actually had the callback. The the application cannot call libxl_ctx_free with such timeouts outstanding, because that would imply later calling back into libxl with a stale ctx. (Looking at the code I see that the "nexi" are never actually freed. Bah.) Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |