[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 19/24] libxl: provide progress reporting for long-running operations



Ian Campbell writes ("Re: [Xen-devel] [PATCH 19/24] libxl: provide progress 
reporting for long-running operations"):
> On Mon, 2012-04-16 at 18:18 +0100, Ian Jackson wrote:
> > This will be used for reporting, during domain creation, that the
> > console is ready.
> 
> I think of progress as being 1%, 2%, 3%...10% etc type stuff. I guess
> progress in the sense you mean here could be thought of as "milestones"?

It might be used for either of those.

> > @@ -899,12 +899,25 @@ static void egc_run_callbacks(libxl__egc *egc)
> >  {
...
> >      LIBXL_TAILQ_FOREACH_SAFE(ev, &egc->occurred_for_callback, link, 
> > ev_tmp) {
> >          LIBXL_TAILQ_REMOVE(&egc->occurred_for_callback, ev, link);
> >          CTX->event_hooks->event_occurs(CTX->event_hooks_user, ev);
> >      }
> 
> Are the BSD FOREACH_SAFE functions safe against manipulation from other
> threads?

No.  That would mean they'd have to be some kind of RCU thing, which
they are not.

> I only ask because Stefano was surprised that this wasn't the case for
> the Linux macros, in that context "SAFE" just means you can delete the
> element in this thread (like you do here) but that if another thread is
> also manipulating the same list.
...
> By my reading these macros suffer from the same and therefore a bunch
> more locking is needed. I hope I'm mistaken...

The lists in the egc are per-thread data structures.  So that is
safe.  I started writing a reply which contained some proposed
comments for clarifying this situation, and discovered an actual bug -
the ao may be used reentrantly in a way that's not safe.

Please see the patch below, which I hope will clarify things.  This
will go into my series before this progress reporting patch.  I will
update the progress reporting patch with improved comments, but it's
very similar so I think the intent should now be clear.

> > +     * decrememt progress_reports_outstanding, and call us again.
> 
>           decrement

Fixed.

> > +libxl__ao *libxl__ao_create(libxl_ctx *ctx, uint32_t domid,
> > +                            const libxl_asyncop_how *how)
> 
> AFAICT this function has just moved and not changed, right?

Yes, this is pointless code motion.  Well actually it doesn't move but
git-diff didn't manage to find it.  I have shuffled things around to
make these hunks go away.

> [...]
> 
> My only concern is about the locking of the list walks -- but if that is
> an issue it will also be there in existing code so I think it can all be
> fixed in one go later. Hence:
> 
> Acked-by: Ian Campbell <ian.campbell@xxxxxxxxxx>

You failed to spot that libxl__ao_progress_report doesn't ever add the
generated aop to the aops_for_callback list :-).  In general the
callback-style interaction with libxl is very lightly tested.  I
expect the appearance of the first user (libvirt, probably) will mean
we have to do some debugging.

Anyway, I'll not add your ack to this just yet.  I'd rather wait until
I've seen what you say about the comments below, and perhaps about the
revised version.

I intend to repost the whole series today.

Ian.


From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Subject: [PATCH] libxl: Fix an ao completion bug; document locking policy

Document the concurrent access policies for libxl__ao and libxl__egc,
and their corresponding gcs.

Fix a violation of the policy:

If an ao was submitted and a callback requested, and while the
initiating function was still running on the original thread, the ao
is completed on another thread, the completing thread would improperly
concurrently access the ao with the initiating thread.

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

---
 tools/libxl/libxl_event.c    |    7 +++++++
 tools/libxl/libxl_internal.h |   23 ++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 2f559d5..7e8d002 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -897,6 +897,11 @@ void libxl__event_disaster(libxl__egc *egc, const char 
*msg, int errnoval,
 
 static void egc_run_callbacks(libxl__egc *egc)
 {
+    /*
+     * The callbacks must happen with the ctx unlocked.
+     * See the comment near #define EGC_GC in libxl_internal.h and
+     * those in the definitions of libxl__egc and libxl__ao.
+     */
     EGC_GC;
     libxl_event *ev, *ev_tmp;
 
@@ -910,9 +915,11 @@ static void egc_run_callbacks(libxl__egc *egc)
                              entry_for_callback, ao_tmp) {
         LIBXL_TAILQ_REMOVE(&egc->aos_for_callback, ao, entry_for_callback);
         ao->how.callback(CTX, ao->rc, ao->how.u.for_callback);
+        CTX_LOCK;
         ao->notified = 1;
         if (!ao->in_initiator)
             libxl__ao__destroy(CTX, ao);
+        CTX_UNLOCK;
     }
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 72e3fd9..b3657cd 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -368,7 +368,8 @@ struct libxl__gc {
 };
 
 struct libxl__egc {
-    /* for event-generating functions only */
+    /* For event-generating functions only.
+     * The egc and its gc may be accessed only on the creating thread. */
     struct libxl__gc gc;
     struct libxl__event_list occurred_for_callback;
     LIBXL_TAILQ_HEAD(, libxl__ao) aos_for_callback;
@@ -378,6 +379,20 @@ struct libxl__egc {
 #define LIBXL__AO_MAGIC_DESTROYED    0xA0DEAD00ul
 
 struct libxl__ao {
+    /*
+     * An ao and its gc may be accessed only with the ctx lock held.
+     *
+     * Special exception: If an ao has been added to
+     * egc->aos_for_callback, the thread owning the egc may remove the
+     * ao from that list and make the callback without holding the
+     * lock.
+     *
+     * Corresponding restriction: An ao may be added only to one
+     * egc->aos_for_callback, once; rc and how must already have been
+     * set and may not be subsequently modified.  (This restriction is
+     * easily and obviously met since the ao is queued for callback
+     * only in libxl__ao_complete.)
+     */
     uint32_t magic;
     unsigned constructing:1, in_initiator:1, complete:1, notified:1;
     int rc;
@@ -1352,6 +1367,12 @@ libxl__device_model_version_running(libxl__gc *gc, 
uint32_t domid);
  * should in any case not find it necessary to call egc-creators from
  * within libxl.
  *
+ * The callbacks must all take place with the ctx unlocked because
+ * the application is entitled to reenter libxl from them.  This
+ * would be bad not because the lock is not recursive (it is) but
+ * because the application might make blocking libxl calls which
+ * would hold the lock unreasonably long.
+ *
  * For the same reason libxl__egc_cleanup (or EGC_FREE) must be called
  * with the ctx *unlocked*.  So the right pattern has the EGC_...
  * macro calls on the outside of the CTX_... ones.
-- 
tg: (d9d64ad..) t/xen/xl.event.ao-threading-lock-fix (depends on: 
t/xen/xl.event.domain-create)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.