[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 6/9] libxl: Asynchronous/long-running operation infrastructure



Thanks for the thorough review.

Ian Campbell writes ("Re: [Xen-devel] [PATCH 6/9] libxl: 
Asynchronous/long-running operation infrastructure"):
> You've done device removal, do you have a list of other things which
> should use this? (perhaps with an associated list of people's names...)

No, I don't have such a list, but offhand:
  domain creation (bootloader, etc.)
  qmp
  device attach
  domain destruction
  bootloader
  vncviewer exec (or vncviewer parameter fetching - api should change)
(these may overlap).


> > + * If ao_how==NULL, the function will be synchronous.
> > + *
> > + * If ao_how!=NULL, the function will set the operation going, and
> > + * if this is successful will return ERROR_ASYNCH_INPROGRESS.
> 
> There's an extra H here compared with the actual symbol name (I think
> the symbol is right).

Fixed everywhere.


> Is there a possibility that libxl might decide that the operation isn't
> actually going to take all the long and do things synchronously,
> returning normal success (e.g. 0)? Is that the reason for the separate
> return code for this "I did what you asked me" case?

No, even if the operation completes quickly, the same exit path is
taken.  So if you ask for a callback or an event, you get that even if
the callback or event happened before your initiating function
returns.  As I say:

 * Note that it is possible for an asynchronous operation which is to
 * result in a callback to complete during its initiating function
 * call.  In this case the initating function will return
 * ERROR_ASYNC_INPROGRESS, even though by the time it returns the
 * operation is complete and the callback has already happened.

If ao_how is non-NULL then these functions cannot return 0.
If it is NULL they cannot return ASYNC_INPROGRESS.

I chose to use a new exit status because it seemed safer but that's a
matter of taste and if you prefer I could return 0 for that case.


> Can we drop the ERROR_ prefix? I know that's inconsistent with the other
> return codes but those actually are errors.

I guess we could but isn't this going to become a proper IDL enum at
some point ?


> > + *
> > + * If ao_how->callback!=NULL, the callback will be called when the
> > + * operation completes.  The same rules as for libxl_event_hooks
> > + * apply, including the reentrancy rules and the possibility of
> 
>            ^ (see above/below) -- depending on how these comments end up
> relative to each other.

It's actually in a different file.  I'll add a cross-reference.


> > + * "disaster", except that libxl calls ao_how->callback instead of
> > + * libxl_event_hooks.event_occurs.
> > + *
> > + * If ao_how->callback==NULL, a libxl_event will be generated which
> > + * can be obtained from libxl_event_wait or libxl_event_check.
> 
> Or be delivered via event_occurs?

Yes, in principle, although that would be a silly thing to ask for.
Why would you want your ao completions delivered via some central
callback function just so that you could split them up again ?



> > + * call.  In this case the initating function will return
> 
>                               initiating

Fixed.


> > +typedef struct {
> > +    void (*callback)(libxl_ctx *ctx, int rc, void *for_callback);
> > +    union {
> > +        libxl_ev_user for_event; /* used if callback==NULL */
> > +        void *for_callback; /* passed to callback */
> 
> Why void * for one bit of "closure" but an explicit uint64_t for the
> other. I nearly commented on the use of uint64_t previously -- void *,
> or perhaps (u)intptr_t is more normal.

The context value in an event needs to be marshallable to a foreign
language or a foreign process.  So it can't be a pointer.  Or at
least, it has to be a type which can contain integers as well as
pointers, and an integer type is better for that.

The context value for a callback function can be a void* because you
don't ever need to "marshal" the "callback", or if you do you can wrap
up the context appropriately.

> > + * Completion after initiator return (asynch. only):
...
> > + *   * initiator calls ao_complete:               |
> > + *     - observes event not net done,             |
> > + *     - returns to caller                        |
> > + *                                                |
> > + *                              - ao completes on some thread
> > + *                              - generate the event or call the callback
> > + *                              - destroy the ao
> 
> Where does ao_inprogress fit into these diagrams?

There's a mistake in the diagrams: where it says on the left
"initiator calls ao_complete" it should read "... ao_inprogress", and
likewise for "ao_complete takes the lock".  I will fix this.


> > + */
> > +
> > +void libxl__ao__destroy(libxl_ctx *ctx, libxl__ao *ao) {
> 
> CODING_STYLE wants these braces on the next line (a bunch more follow)

Oh yes, will fix.


> > +void libxl__ao_complete(libxl__egc *egc, libxl__ao *ao, int rc) {
> > +    assert(ao->magic == LIBXL__AO_MAGIC);
> > +    assert(!ao->complete);
> > +    ao->complete = 1;
> > +    ao->rc = rc;
> > +
> > +    if (ao->poller) {
> > +        assert(ao->in_initiator);
> > +        libxl__poller_wakeup(egc, ao->poller);
> > +    } else if (ao->how.callback) {
> > +        LIBXL_TAILQ_INSERT_TAIL(&egc->aos_for_callback, ao, 
> > entry_for_callback);
> > +    } else {
> > +        libxl_event *ev;
> > +        ev = NEW_EVENT(egc, OPERATION_COMPLETE, ao->domid);
> > +        if (ev) {
> > +            ev->for_user = ao->how.u.for_event;
> > +            ev->u.operation_complete.rc = ao->rc;
> > +            libxl__event_occurred(egc, ev);
> > +        }
> > +        ao->notified = 1;
> > +    }
> > +    if (!ao->in_initiator && ao->notified)
> > +        libxl__ao__destroy(libxl__gc_owner(&egc->gc), ao);
> 
> You added a helper for this libxl__gc_owner(&egc..) construct.

You mean EGC_GC and CTX.  I don't think that's a good idea here
because it obscures exactly what's going on.  In particular, there are
two gcs here - the ao's and the egc's - and one of them may be about
to evaporate.


> > +    ao = calloc(sizeof(*ao),1);
> 
> calloc is actually (nmemb, size). I'm sure it doesn't really matter
> though.

I'll fix it though.


> > +            rc = eventloop_iteration(&egc,ao->poller);
> > +            if (rc) {
> > +                /* Oh dear, this is quite unfortunate. */
> > +                LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Error waiting for"
> > +                           " event during long-running operation (rc=%d)", 
> > rc);
> > +                sleep(1);
> > +                /* It's either this or return ERROR_I_DONT_KNOW_WHETHER
> > +                 * _THE_THING_YOU_ASKED_FOR_WILL_BE_DONE_LATER_WHEN
> > +                 * _YOU_DIDNT_EXPECT_IT, since we don't have any kind of
> > +                 * cancellation ability. */
> 
> Does this constitute a "disaster" (in the special hook sense)?

No, disaster just lets us say that some events may be lost and an ao
completion might not be an event.  disaster doesn't let us randomly
store up ongoing activity and have it happen when not expected.
For example, a caller asking a synchronous operation does not expect
to get an error code and then have the operation continue in the
background anyway.


> > + *   Usually callback function should use GET_CONTAINING_STRUCT
> 
> Now called CONTAINER_OF

Fixed.  Sometimes I wish the compiler could look into comments and
spot when I've done this kind of thing.


> > + *   to obtain its own structure, containing a pointer to the ao,
> > + *   and then use the gc from that ao.
> > + */
> > +
> > +#define AO_CREATE(ctx, domid, ao_how)                           \
> > +    libxl__ao *ao = libxl__ao_create(ctx, domid, ao_how);       \
> > +    if (!ao) return ERROR_NOMEM;                                \
> > +    AO_GC;                                                      \
> > +    CTX_LOCK;
> 
> Where does the unlock which balances this come from? The only unlock I
> see in this patch is the temporary drop in libxl__ao_inprogress which is
> matched by another lock.

The AO_INPROGRESS macro is supposed to unlock it, but locks it again
by mistake.  Well spotted.

> > +#define AO_INPROGRESS do{                                       \
> > +        libxl_ctx *ao__ctx = libxl__gc_owner(&ao->gc);          \
> > +        int ao__rc = libxl__ao_inprogress(ao);                  \
> > +        libxl__ctx_lock(ao__ctx); /* gc is now invalid */       \
> 
> Is this supposed to be unlock answering my question above? Likewise in
> ABORT?

Yes.  Indeed the comment above agrees:

 * - If initiation is successful, the initiating function needs
 *   to run libxl__ao_inprogress right before unlocking and
 *   returning, and return whatever it returns (AO_INPROGRESS macro).
 *
 * - If the initiation is unsuccessful, the initiating function must
 *   call libxl__ao_abort before unlocking and returning whatever
 *   error code is appropriate (AO_ABORT macro).


 > > +        return ao__rc;                                          \
> > +   }while(0)
> 
> Can we arrange for AO_INPROGRESS and AO_ABORT to return the rc? So it
> would become
>         return AO_INPROGRESS;

That would be possible.  I wasn't sure whether to do it like that.
Note that AO_CREATE already might return; doing it the way I have it
now seems more symmetrical.

But perhaps it would make things clearer to have the return outside
the macro.

> Is the ({stuff,stuff,stuff,val}) syntax a gcc-ism?

Yes.  But I don't think that should stop us if we prefer it.


> > +/* All of these MUST be called with the ctx locked.
> 
> Except libxl__ao_create? at least according to the implementation of
> AO_CREATE which takes the lock after.

libxl_ao__create calls libxl__poller_get which needs to be called with
the lock held.  Well spotted.


Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.