Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility



Ian Jackson wrote:
> Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD 
> flexibility"):
>   
>> BTW, I only see the crash when the save/restore script is running.  I
>> stopped the other scripts and domains, running only save/restore on a
>> single domain, and see the crash rather quickly (within 10 iterations).
>>     
>
> I'll look at the libvirt code, but:
>
> With a recurring timeout, how can you ever know it's cancelled ?
> There might be threads out there, which don't hold any locks, which
> are in the process of executing a callback for a timeout.  That might
> be arbitrarily delayed from the pov of the rest of the program.
>
> E.g.:
>
>  Thread A                                             Thread B
>
>    invoke some libxl operation
> X    do some libxl stuff
> X    register timeout (libxl)
> XV     record timeout info
> X    do some more libxl stuff
>      ...
> X    do some more libxl stuff
> X    deregister timeout (libxl internal)
> X     converted to request immediate timeout
> XV     record new timeout info
> X      release libvirt event loop lock
>                                             entering libvirt event loop
>                                        V     observe timeout is immediate
>                                        V      need to do callback
>                                                call libxl driver
>
>       entering libvirt event loop
>  V     observe timeout is immediate
>  V      need to do callback
>          call libxl driver
>            call libxl
>   X          libxl sees timeout is live
>   X          libxl does libxl stuff
>          libxl driver deregisters
>  V         record lack of timeout
>          free driver's timeout struct
>                                                call libxl
>                                       X          libxl sees timeout is dead
>                                       X          libxl does nothing
>                                              libxl driver deregisters
>                                        V       CRASH due to deregistering
>                                        V        already-deregistered timeout
>
> If this is how things are, then I think there is no sane way to use
> libvirt's timeouts (!)
>   

Looking at libvirt's default event loop implementation and the current
libxl driver code, I think this is how things are :-/.  But maybe you have
just described a bug in the libxl driver.  In the timer callback,
libxlDomainObjPrivate is locked, the timeout is disabled in the libvirt
event loop, libxlDomainObjPrivate is unlocked, and
libxl_osevent_occurred_timeout is called.  Could the issue be solved by
checking whether the timeout is still valid in the callback, while holding
the lock on libxlDomainObjPrivate?  The first thread to run the callback
could mark the timeout invalid before releasing the lock and calling
libxl_osevent_occurred_timeout.  Subsequent threads running the callback
would, after acquiring the libxlDomainObjPrivate lock, see that the
timeout is invalid and return.
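
Roughly what I have in mind, as a minimal sketch rather than actual
driver code.  Everything named demo_* is hypothetical: demo_priv stands
in for libxlDomainObjPrivate, and demo_occurred_timeout stands in for the
call to libxl_osevent_occurred_timeout (it only counts invocations here).

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-domain private data. */
struct demo_priv {
    pthread_mutex_t lock;   /* protects timer_valid */
    bool timer_valid;       /* cleared by the first callback that runs */
    int occurred_calls;     /* how often the libxl stand-in was called */
};

/* Stand-in for libxl_osevent_occurred_timeout(); it only counts calls. */
static void demo_occurred_timeout(struct demo_priv *priv)
{
    priv->occurred_calls++;
}

/*
 * Timer callback as the event loop would invoke it.  Returns true if
 * this invocation delivered the timeout, false if another invocation
 * had already claimed it.
 */
static bool demo_timer_callback(struct demo_priv *priv)
{
    pthread_mutex_lock(&priv->lock);
    if (!priv->timer_valid) {
        /* Another thread already handled this timeout: do nothing. */
        pthread_mutex_unlock(&priv->lock);
        return false;
    }
    /* Claim the timeout before dropping the lock. */
    priv->timer_valid = false;
    pthread_mutex_unlock(&priv->lock);

    /* Call into the library without holding the driver lock. */
    demo_occurred_timeout(priv);
    return true;
}
```

Only the invocation that finds timer_valid set gets as far as the library
call; any later invocation returns early.  Note that the sketch assumes
the private data outlives every pending callback.  It does nothing about
the timeout struct being freed while another thread is still about to run
the callback, so that would need separate handling.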

Regards,
Jim

