[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 6/9] libxl: Asynchronous/long-running operation infrastructure



On Tue, 2012-01-24 at 17:27 +0000, Ian Jackson wrote:
> Thanks for the thorough review.
> 
> Ian Campbell writes ("Re: [Xen-devel] [PATCH 6/9] libxl: 
> Asynchronous/long-running operation infrastructure"):
> > You've done device removal, do you have a list of other things which
> > should use this? (perhaps with an associated list of people's names...)
> 
> No, I don't have such a list, but offhand:
>   domain creation (bootloader, etc.)
>   qmp
>   device attach
>   domain destruction
>   bootloader
>   vncviewer exec (or vncviewer parameter fetching - api should change)
> (these may overlap).

Thanks.

> > Is there a possibility that libxl might decide that the operation isn't
> > actually going to take all the long and do things synchronously,
> > returning normal success (e.g. 0)? Is that the reason for the separate
> > return code for this "I did what you asked me" case?
> 
> No, even if the operation completes quickly, the same exit path is
> taken.  So if you ask for a callback or an event, you get that even if
> the callback or event happened before your initiating function
> returns.  As I say:
> 
>  * Note that it is possible for an asynchronous operation which is to
>  * result in a callback to complete during its initiating function
>  * call.  In this case the initating function will return
>  * ERROR_ASYNC_INPROGRESS, even though by the time it returns the
>  * operation is complete and the callback has already happened.

Thanks, I think I simply hadn't got to that comment when I wrote the
question.

If this is a direct cut-n-paste then presumably there is a patch
somewhere where "initiating" is spelled "initating" as above. Hah, I've
just spotted where I pointed it out last time and you corrected it
below... At least I'm consistent.

> If ao_how is non-NULL then these functions cannot return 0.
> If it is NULL they cannot return ASYNC_INPROGRESS.
> 
> I chose to use a new exit status because it seemed safer but that's a
> matter of taste and if you prefer I could return 0 for that case.

I'm undecided (plus it seems a bit like bikeshedding). I certainly
prefer either 0 or {LIBXL_}ASYNC_IN_PROGRESS to ERROR_ASYNC_IN_PROGRESS
though.

> > Can we drop the ERROR_ prefix? I know that's inconsistent with the other
> > return codes but those actually are errors.
> 
> I guess we could but isn't this going to become a proper IDL enum at
> some point ?

At which point it would become LIBXL_ERROR_{FOOS} and
LIBXL_ASYNC_IN_PROGRESS?

[...]
> > > + * "disaster", except that libxl calls ao_how->callback instead of
> > > + * libxl_event_hooks.event_occurs.
> > > + *
> > > + * If ao_how->callback==NULL, a libxl_event will be generated which
> > > + * can be obtained from libxl_event_wait or libxl_event_check.
> > 
> > Or be delivered via event_occurs?
> 
> Yes, in principle, although that would be a silly thing to ask for.
> Why would you want your ao completions delivered via some central
> callback function just so that you could split them up again ?

True.

> > > + * call.  In this case the initating function will return
> > 
> >                               initiating
> 
> Fixed.
> 
> 
> > > +typedef struct {
> > > +    void (*callback)(libxl_ctx *ctx, int rc, void *for_callback);
> > > +    union {
> > > +        libxl_ev_user for_event; /* used if callback==NULL */
> > > +        void *for_callback; /* passed to callback */
> > 
> > Why void * for one bit of "closure" but an explicit uint64_t for the
> > other. I nearly commented on the use of uint64_t previously -- void *,
> > or perhaps (u)intptr_t is more normal.
> 
> The context value in an event needs to be marshallable to a foreign
> language or a foreign process.  So it can't be a pointer.  Or at
> least, it has to be a type which can contain integers as well as
> pointers, and an integer type is better for that.

Fair enough.

> 
> The context value for a callback function can be a void* because you
> don't ever need to "marshal" the "callback", or if you do you can wrap
> up the context appropriately.
> 
> > > + * Completion after initiator return (asynch. only):
> ...
> > > + *   * initiator calls ao_complete:               |
> > > + *     - observes event not net done,             |
> > > + *     - returns to caller                        |
> > > + *                                                |
> > > + *                              - ao completes on some thread
> > > + *                              - generate the event or call the callback
> > > + *                              - destroy the ao
> > 
> > Where does ao_inprogress fit into these diagrams?
> 
> There's a mistake in the diagrams: where it says on the left
> "initiator calls ao_complete" it should read "... ao_inprogress", and
> likewise for "ao_complete takes the lock".  I will fix this.

Thanks. While rereading with that substitution (and finding that it made
sense) I noticed:
+ *   * initiator calls ao_complete:               |
+ *     - observes event not net done,             |

You want s/net/yet/.

> > > +void libxl__ao_complete(libxl__egc *egc, libxl__ao *ao, int rc) {
> > > +    assert(ao->magic == LIBXL__AO_MAGIC);
> > > +    assert(!ao->complete);
> > > +    ao->complete = 1;
> > > +    ao->rc = rc;
> > > +
> > > +    if (ao->poller) {
> > > +        assert(ao->in_initiator);
> > > +        libxl__poller_wakeup(egc, ao->poller);
> > > +    } else if (ao->how.callback) {
> > > +        LIBXL_TAILQ_INSERT_TAIL(&egc->aos_for_callback, ao, 
> > > entry_for_callback);
> > > +    } else {
> > > +        libxl_event *ev;
> > > +        ev = NEW_EVENT(egc, OPERATION_COMPLETE, ao->domid);
> > > +        if (ev) {
> > > +            ev->for_user = ao->how.u.for_event;
> > > +            ev->u.operation_complete.rc = ao->rc;
> > > +            libxl__event_occurred(egc, ev);
> > > +        }
> > > +        ao->notified = 1;
> > > +    }
> > > +    if (!ao->in_initiator && ao->notified)
> > > +        libxl__ao__destroy(libxl__gc_owner(&egc->gc), ao);
> > 
> > You added a helper for this libxl__gc_owner(&egc..) construct.
> 
> You mean EGC_GC and CTX.  I don't think that's a good idea here
> because it obscures exactly what's going on.  In particular, there are
> two gcs here - the ao's and the egc's - and one of them may be about
> to evaporate.

OK.

> > > +            rc = eventloop_iteration(&egc,ao->poller);
> > > +            if (rc) {
> > > +                /* Oh dear, this is quite unfortunate. */
> > > +                LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Error waiting for"
> > > +                           " event during long-running operation 
> > > (rc=%d)", rc);
> > > +                sleep(1);
> > > +                /* It's either this or return ERROR_I_DONT_KNOW_WHETHER
> > > +                 * _THE_THING_YOU_ASKED_FOR_WILL_BE_DONE_LATER_WHEN
> > > +                 * _YOU_DIDNT_EXPECT_IT, since we don't have any kind of
> > > +                 * cancellation ability. */
> > 
> > Does this constitute a "disaster" (in the special hook sense)?
> 
> No, disaster just lets us say that some events may be lost and an ao
> completion might not be an event.  disaster doesn't let us randomly
> store up ongoing activity and have it happen when not expected.
> For example, a caller asking a synchronous operation does not expect
> to get an error code and then have the operation continue in the
> background anyway.

OK.

> > > + *   Usually callback function should use GET_CONTAINING_STRUCT
> > 
> > Now called CONTAINER_OF
> 
> Fixed.  Sometimes I wish the compiler could look into comments and
> spot when I've done this kind of thing.

Or read the comments and DWIM so we just need to write the comments ;-)

> > > + *   to obtain its own structure, containing a pointer to the ao,
> > > + *   and then use the gc from that ao.
> > > + */
> > > +
> > > +#define AO_CREATE(ctx, domid, ao_how)                           \
> > > +    libxl__ao *ao = libxl__ao_create(ctx, domid, ao_how);       \
> > > +    if (!ao) return ERROR_NOMEM;                                \
> > > +    AO_GC;                                                      \
> > > +    CTX_LOCK;
> > 
> > Where does the unlock which balances this come from? The only unlock I
> > see in this patch is the temporary drop in libxl__ao_inprogress which is
> > matched by another lock.
> 
> The AO_INPROGRESS macro is supposed to unlock it, but locks it again
> by mistake.  Well spotted.
> 
> > > +#define AO_INPROGRESS do{                                       \
> > > +        libxl_ctx *ao__ctx = libxl__gc_owner(&ao->gc);          \
> > > +        int ao__rc = libxl__ao_inprogress(ao);                  \
> > > +        libxl__ctx_lock(ao__ctx); /* gc is now invalid */       \
> > 
> > Is this supposed to be unlock answering my question above? Likewise in
> > ABORT?
> 
> Yes.  Indeed the comment above agrees:
> 
>  * - If initiation is successful, the initiating function needs
>  *   to run libxl__ao_inprogress right before unlocking and
>  *   returning, and return whatever it returns (AO_INPROGRESS macro).
>  *
>  * - If the initiation is unsuccessful, the initiating function must
>  *   call libxl__ao_abort before unlocking and returning whatever
>  *   error code is appropriate (AO_ABORT macro).

Right.

I think this highlights my concern about some code paths not being run
by the in-xl users...

>  > > +        return ao__rc;                                          \
> > > +   }while(0)
> > 
> > Can we arrange for AO_INPROGRESS and AO_ABORT to return the rc? So it
> > would become
> >         return AO_INPROGRESS;
> 
> That would be possible.  I wasn't sure whether to do it like that.
> Note that AO_CREATE already might return; doing it the way I have it
> now seems more symmetrical.
> 
> But perhaps it would make things clearer to have the return outside
> the macro.

I thought there was a general preference for that sort of thing, but I
suppose it depends on the required macro contortions to make it happen.


Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.