
Re: [Xen-devel] FW: Cancelling asynchronous operations in libxl

On Fri, 2013-11-08 at 18:38 +0000, Ian Jackson wrote:
> I've been thinking about this some more and looking at the code.
> I have the following sketch of an approach:
>  * Somehow the ao_how API is changed to make it possible to return the
>    ao to the caller (in the case of an asynchronous ao).
>    (NB that there could be exciting races in the application between
>    completing the ao and cancelling it; this means that the
>    application can only use cancellation if it uses the callback
>    mechanism and must make sure that the callback takes a lock and
>    then makes changes to its data structures which prevent the ao being
>    cancelled.  As an alternative we could invent a separate
>    "cancellation handle" which can be detached from the ao but which
>    must always be explicitly destroyed by the application.)

That sounds like the sort of cleanup operation which callers would be
pretty good at forgetting to do ;-) Or maybe it would be handled by
generic code in the callers and it would be fine.
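
To make the completion/cancellation race concrete, here is a minimal sketch of the application side. It assumes the ao_how change gives the application something it can later pass to a cancel call. Every name below (struct app_op, app_op_complete, app_op_cancel) is hypothetical, not libxl API. The only point is that the completion callback and the cancel path take the same lock, so a cancel can never reach an ao whose callback has already run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-operation bookkeeping in the application.
 * Not libxl API: the names only illustrate the locking rule. */
struct app_op {
    pthread_mutex_t lock;
    bool in_flight;       /* true until the completion callback has run */
    int rc;               /* result recorded by the completion callback */
    int cancels_sent;     /* cancels actually forwarded to the library */
};

static void app_op_init(struct app_op *op)
{
    pthread_mutex_init(&op->lock, NULL);
    op->in_flight = true;
    op->rc = 0;
    op->cancels_sent = 0;
}

/* Completion callback: under the lock, mark the ao as finished so that
 * no later cancel request will touch it. */
static void app_op_complete(struct app_op *op, int rc)
{
    pthread_mutex_lock(&op->lock);
    assert(op->in_flight);   /* each ao completes exactly once */
    op->in_flight = false;
    op->rc = rc;
    pthread_mutex_unlock(&op->lock);
}

/* Cancel request: forwarded only while the ao is still in flight.
 * Returns true if a cancel was forwarded. */
static bool app_op_cancel(struct app_op *op)
{
    bool forwarded = false;

    pthread_mutex_lock(&op->lock);
    if (op->in_flight) {
        /* A real application would call the (hypothetical) libxl cancel
         * function here, still holding the lock.  This assumes the library
         * never runs the completion callback from inside that call, or
         * this non-recursive mutex would deadlock. */
        op->cancels_sent++;
        forwarded = true;
    }
    pthread_mutex_unlock(&op->lock);
    return forwarded;
}
```

A cancel issued after completion is simply dropped, which is the property the callback-plus-lock rule is meant to guarantee.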

>    Suggestions for the API welcome.  The most obvious approach simply
>    adds a new field to libxl_asyncop_how but that risks lots of
>    existing code failing to initialise it.

Should we have supplied libxl_asyncop_how_init functions? It's perhaps
not too late to do so.
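
A sketch of what such an init function might look like, in the style of the existing libxl_*_init functions. The struct and its new ao_out field are hypothetical stand-ins for libxl_asyncop_how, not its real layout. The idea is simply that callers who always call the init function pick up safe defaults for any fields added later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl_asyncop_how; not its real layout. */
typedef struct {
    void (*callback)(void *ctx, int rc, void *for_callback); /* NULL: report via event */
    void *for_callback;
    void **ao_out;   /* NEW (hypothetical): where to store the ao handle; NULL means don't */
} example_asyncop_how;

/* Sets every field, including ones added in later versions, to a safe
 * default.  Callers then set only the fields they care about. */
static void example_asyncop_how_init(example_asyncop_how *how)
{
    assert(how != NULL);
    *how = (example_asyncop_how){ 0 };   /* all pointers become NULL */
}
```

Code that calls the init function and then fills in callback and for_callback never sees garbage in ao_out, so adding that field does not silently change the behaviour of existing callers.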

The following 6 things are all internal to libxl, right?

>  * Keep a list of cancellation hooks in the ao.  Anything which is
>    using this ao can add itself to that set of hooks.
>  * Cancellation involves repeatedly taking the front of that list off,
>    and calling the hook.  (After the ao has been cancelled, its
>    completion still needs to be awaited by the application, but it
>    will hopefully complete earlier and return ERROR_CANCEL.)
>  * The timeout registration facility is changed to take an ao and
>    register a cancellation hook.  It is changed to provide an rc value
>    to its callback, which will be FAIL or CANCEL.
>  * The fork machinery is changed to take an ao, and register a
>    cancellation hook.  A suitable-for-default-uses cancellation hook
>    function is provided which sends SIGKILL to the child and makes a
>    note that this has happened.  The child death callback provides an
>    rc value (0 for status==0, or FAIL or CANCEL) for the convenience
>    of the next layer up.
>  * A new version of the xswatch event registration machinery is
>    provided which takes an ao, registers itself as a cancellation hook,
>    and provides an rc value to its callbacks.  This new facility could
>    usefully do an xs_read on a predefined path.  The rc value will be
>    OK or CANCEL.  (We need new versions of this because some xswatch
>    callers are part of the infrastructure or libxl application event
>    generation, not aos.)
>  * Anything which uses the fd machinery directly needs to do
>    cancellation itself (or ensure that it has a timeout, an xs watch,
>    or a child).

All sounds plausible to me.
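
For concreteness, here is a minimal sketch of the hook list and the default fork cancellation hook described above. All names and rc values are hypothetical (libxl has no such API yet), and it assumes a single-threaded caller that already holds whatever lock protects the ao. Each hook is unlinked from the list before it is called, so a hook may free the structure it is embedded in.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical rc values, not libxl's ERROR_* numbers. */
enum { EX_OK = 0, EX_FAIL = -1, EX_CANCEL = -2 };

struct ex_ao;
struct ex_cancel_hook;
typedef void ex_cancel_fn(struct ex_ao *ao, struct ex_cancel_hook *hook);

struct ex_cancel_hook {
    ex_cancel_fn *fn;
    struct ex_cancel_hook *next;
};

struct ex_ao {
    struct ex_cancel_hook *hooks;   /* singly linked; front is newest */
    bool cancelled;
};

static void ex_ao_add_hook(struct ex_ao *ao, struct ex_cancel_hook *hook,
                           ex_cancel_fn *fn)
{
    hook->fn = fn;
    hook->next = ao->hooks;
    ao->hooks = hook;
}

/* Called when the sub-operation finishes normally, so that a later
 * cancel does not call a hook for something that no longer exists. */
static void ex_ao_remove_hook(struct ex_ao *ao, struct ex_cancel_hook *hook)
{
    for (struct ex_cancel_hook **pp = &ao->hooks; *pp; pp = &(*pp)->next) {
        if (*pp == hook) {
            *pp = hook->next;
            hook->next = NULL;
            return;
        }
    }
}

/* Cancellation: repeatedly take the front hook off the list, then call it. */
static void ex_ao_cancel(struct ex_ao *ao)
{
    ao->cancelled = true;
    while (ao->hooks) {
        struct ex_cancel_hook *hook = ao->hooks;
        ao->hooks = hook->next;
        hook->next = NULL;
        hook->fn(ao, hook);
    }
}

/* Hypothetical child record; the hook is the first member so the hook
 * pointer can be converted back to the enclosing struct. */
struct ex_child {
    struct ex_cancel_hook hook;
    pid_t pid;
    bool killed;
};

/* Default hook for forked children: send SIGKILL and note that we did. */
static void ex_child_cancel_default(struct ex_ao *ao, struct ex_cancel_hook *hook)
{
    struct ex_child *ch = (struct ex_child *)hook;

    (void)ao;
    if (kill(ch->pid, SIGKILL) == 0)
        ch->killed = true;
}

/* rc for the child death callback: 0 for status == 0, otherwise
 * CANCEL if we killed the child, else FAIL. */
static int ex_child_rc(const struct ex_child *ch, int status)
{
    if (status == 0)
        return EX_OK;
    return ch->killed ? EX_CANCEL : EX_FAIL;
}
```

The child still has to be reaped as usual; the hook only makes it die sooner and records why, so that the death callback can report CANCEL rather than FAIL.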

> A tricky question arises regarding cleanup: for example, if
> libxl_domain_create_* were cancelled.  It would end up in
> domcreate_complete with rc==CANCEL.
> Should it now run the domain destruction?  How would the caller say
> they wanted that cancelled, if that too was taking too long?  Perhaps
> there should be a progress callback to say "we have finished
> cancelling the first thing and are now cleaning up".
> Or perhaps cancelling the operation should simply skip the destruction
> and return the domid to the caller.  (But also, fiddly edge case:
> consider what happens if a failed creation, which is already being
> destroyed, is cancelled.)

I think either approach could be made to work, the question is how much
complexity we are willing to put into libxl vs the applications for
this. On first glance it seems that by putting a small amount of
complexity into the apps (you must destroy on cancel) we avoid a
potentially large amount in the library.

It seems to also make sense to allow the callers to manage the
destruction as a separate cancellable event which they can manage
independently if they wish.
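
Under the "apps destroy on cancel" policy, the application-side decision might look like the following. The rc values, the invalid-domid sentinel and the function name are all hypothetical. The sketch assumes that an ordinary failed creation is still cleaned up inside libxl, so only a cancelled creation hands back a domid that the application must tear down itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rc values and "no domain" sentinel, not libxl's. */
enum { APP_OK = 0, APP_FAIL = -1, APP_CANCEL = -2 };
#define APP_INVALID_DOMID UINT32_MAX

enum app_next_step { APP_DONE, APP_START_DESTROY, APP_REPORT_ERROR };

/* Decide what to do when a cancellable create completes.  Assumes the
 * library still destroys the domain itself on ordinary failure, but on
 * cancellation skips that and hands the domid back. */
static enum app_next_step app_on_create_done(int rc, uint32_t domid)
{
    assert(rc == APP_OK || rc == APP_FAIL || rc == APP_CANCEL);

    if (rc == APP_OK)
        return APP_DONE;
    if (rc == APP_CANCEL && domid != APP_INVALID_DOMID)
        return APP_START_DESTROY;   /* separate op the app can cancel too */
    return APP_REPORT_ERROR;
}
```

Starting the destroy as its own operation is what lets the application cancel or time it out independently.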

Probably we should ask the existing potential users (xapi & libvirt?)
what they would prefer.

Xen-devel mailing list


