[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2] libxl: fix stale timeout event callback race
Ian Jackson wrote: > Because there is not necessarily any lock held at the point the > application (eg, libvirt) calls libxl_osevent_occurred_timeout, in a > multithreaded program those calls may be arbitrarily delayed in > relation to other activities within the program. > > Specifically this means when ->timeout_deregister returns, libxl does > not know whether it can safely dispose of the for_libxl value or > whether it needs to retain it in case of an in-progress call to > _occurred_timeout. > > The interface could be fixed by requiring the application to make a > new call into libxl to say that the deregistration was complete. > > However that new call would have to be threaded through the > application's event loop; this is complicated and some application > authors are likely not to implement it properly. Furthermore the > easiest way to implement this facility in most event loops is to queue > up a time event for "now". > > Shortcut all of this by having libxl always call timeout_modify > setting abs={0,0} (ie, ASAP) instead of timeout_deregister. This will > cause the application to call _occurred_timeout. When processing this > calldown we see that we were no longer actually interested and simply > throw it away. > > Additionally, there is a race between _occurred_timeout and > ->timeout_modify. If libxl ever adjusts the deadline for a timeout > the application may already be in the process of calling _occurred, in > which case the situation with for_app's lifetime becomes very > complicated. Therefore abolish libxl__ev_time_modify_{abs,rel} (which > have no callers) and promise to the application only ever to call > ->timeout_modify with abs=={0,0}. The application still needs to cope > with ->timeout_modify racing with its internal function which calls > _occurred_timeout. Document this. > > This is a forwards-compatible change for applications using the libxl > API, and will hopefully eliminate these races in callback-supplying > applications (such as libvirt) without the need for corresponding > changes to the application. > I'm not sure how to avoid changing the application (libvirt). Currently, the libvirt libxl driver removes the timeout from the libvirt event loop in the timeout_deregister() function. This function is never called now and hence the timeout is never removed. I had to make the following change in the libxl driver to use your patches diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 302f81c..d8f078e 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -225,16 +225,11 @@ static int libxlTimeoutRegisterEventHook(void *priv, static int libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, - struct timeval abs_t) + struct timeval abs_t ATTRIBUTE_UNUSED) { - struct timeval now; - int timeout; struct libxlOSEventHookTimerInfo *timer_info = *hndp; - gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; - virEventUpdateTimeout(timer_info->id, timeout); + virEventRemoveTimeout(timer_info->id); return 0; } I could also call virEventUpdateTimeout() with a timeout of 0 to make it fire on the next iteration of the event loop, and then remove the timeout in the callback before invoking libxl_osevent_occurred_timeout(). But either way, changes need to be made to the application. > For clarity, fold the body of time_register_finite into its one > remaining call site. This makes the semantics of ev->infinite > slightly clearer. > > Cc: Bamvor Jian Zhang <bjzhang@xxxxxxxx> > Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxx> > Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> > > -- > v3: > - Fix null pointer dereference in case when hooks not supplied. > --- > tools/libxl/libxl_event.c | 89 > +++++++-------------------------------------- > tools/libxl/libxl_event.h | 17 ++++++++- > 2 files changed, 29 insertions(+), 77 deletions(-) > > diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c > index f1fe425..f86c528 100644 > --- a/tools/libxl/libxl_event.c > +++ b/tools/libxl/libxl_event.c > @@ -267,18 +267,11 @@ static int time_rel_to_abs(libxl__gc *gc, int ms, > struct timeval *abs_out) > return 0; > } > > -static void time_insert_finite(libxl__gc *gc, libxl__ev_time *ev) > -{ > - libxl__ev_time *evsearch; > - LIBXL_TAILQ_INSERT_SORTED(&CTX->etimes, entry, ev, evsearch, /*empty*/, > - timercmp(&ev->abs, &evsearch->abs, >)); > - ev->infinite = 0; > -} > - > static int time_register_finite(libxl__gc *gc, libxl__ev_time *ev, > struct timeval absolute) > { > int rc; > + libxl__ev_time *evsearch; > > rc = OSEVENT_HOOK(timeout,register, alloc, &ev->nexus->for_app_reg, > absolute, ev->nexus); > @@ -286,7 +279,8 @@ static int time_register_finite(libxl__gc *gc, > libxl__ev_time *ev, > > ev->infinite = 0; > ev->abs = absolute; > - time_insert_finite(gc, ev); > + LIBXL_TAILQ_INSERT_SORTED(&CTX->etimes, entry, ev, evsearch, /*empty*/, > + timercmp(&ev->abs, &evsearch->abs, >)); > > return 0; > } > @@ -294,7 +288,12 @@ static int time_register_finite(libxl__gc *gc, > libxl__ev_time *ev, > static void time_deregister(libxl__gc *gc, libxl__ev_time *ev) > { > if (!ev->infinite) { > - OSEVENT_HOOK_VOID(timeout,deregister, release, > ev->nexus->for_app_reg); > + struct timeval right_away = { 0, 0 }; > + if (ev->nexus) /* only set if app provided hooks */ > + ev->nexus->ev = 0; > + OSEVENT_HOOK_VOID(timeout,modify, > Nit again about the space here. > + noop /* release nexus in _occurred_ */, > + ev->nexus->for_app_reg, right_away); > I had to change this to &ev->nexus->for_app_reg to get it to work properly. But with this change, and the aforementioned hack to the libvirt libxl driver, I'm not seeing the race any longer. Regards, Jim > LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry); > } > } > @@ -364,70 +363,6 @@ int libxl__ev_time_register_rel(libxl__gc *gc, > libxl__ev_time *ev, > return rc; > } > > -int libxl__ev_time_modify_abs(libxl__gc *gc, libxl__ev_time *ev, > - struct timeval absolute) > -{ > - int rc; > - > - CTX_LOCK; > - > - DBG("ev_time=%p modify abs==%lu.%06lu", > - ev, (unsigned long)absolute.tv_sec, (unsigned long)absolute.tv_usec); > - > - assert(libxl__ev_time_isregistered(ev)); > - > - if (ev->infinite) { > - rc = time_register_finite(gc, ev, absolute); > - if (rc) goto out; > - } else { > - rc = OSEVENT_HOOK(timeout,modify, noop, > - &ev->nexus->for_app_reg, absolute); > - if (rc) goto out; > - > - LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry); > - ev->abs = absolute; > - time_insert_finite(gc, ev); > - } > - > - rc = 0; > - out: > - time_done_debug(gc,__func__,ev,rc); > - CTX_UNLOCK; > - return rc; > -} > - > -int libxl__ev_time_modify_rel(libxl__gc *gc, libxl__ev_time *ev, > - int milliseconds) > -{ > - struct timeval absolute; > - int rc; > - > - CTX_LOCK; > - > - DBG("ev_time=%p modify ms=%d", ev, milliseconds); > - > - assert(libxl__ev_time_isregistered(ev)); > - > - if (milliseconds < 0) { > - time_deregister(gc, ev); > - ev->infinite = 1; > - rc = 0; > - goto out; > - } > - > - rc = time_rel_to_abs(gc, milliseconds, &absolute); > - if (rc) goto out; > - > - rc = libxl__ev_time_modify_abs(gc, ev, absolute); > - if (rc) goto out; > - > - rc = 0; > - out: > - time_done_debug(gc,__func__,ev,rc); > - CTX_UNLOCK; > - return rc; > -} > - > void libxl__ev_time_deregister(libxl__gc *gc, libxl__ev_time *ev) > { > CTX_LOCK; > @@ -1161,7 +1096,11 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, > void *for_libxl) > CTX_LOCK; > assert(!CTX->osevent_in_hook); > > - libxl__ev_time *ev = osevent_ev_from_hook_nexus(ctx, for_libxl); > + libxl__osevent_hook_nexus *nexus = for_libxl; > + libxl__ev_time *ev = osevent_ev_from_hook_nexus(ctx, nexus); > + > + osevent_release_nexus(gc, &CTX->hook_timeout_nexi_idle, nexus); > + > if (!ev) goto out; > assert(!ev->infinite); > > diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h > index 3bcb6d3..51f2721 100644 > --- a/tools/libxl/libxl_event.h > +++ b/tools/libxl/libxl_event.h > @@ -287,8 +287,10 @@ typedef struct libxl_osevent_hooks { > int (*timeout_register)(void *user, void **for_app_registration_out, > struct timeval abs, void *for_libxl); > int (*timeout_modify)(void *user, void **for_app_registration_update, > - struct timeval abs); > - void (*timeout_deregister)(void *user, void *for_app_registration); > + struct timeval abs) > + /* only ever called with abs={0,0}, meaning ASAP */; > + void (*timeout_deregister)(void *user, void *for_app_registration) > + /* will never be called */; > } libxl_osevent_hooks; > > /* The application which calls register_fd_hooks promises to > @@ -337,6 +339,17 @@ typedef struct libxl_osevent_hooks { > * register (or modify), and pass it to subsequent calls to modify > * or deregister. > * > + * Note that the application must cope with a call from libxl to > + * timeout_modify racing with its own call to > + * libxl__osevent_occurred_timeout. libxl guarantees that > + * timeout_modify will only be called with abs={0,0} but the > + * application must still ensure that libxl's attempt to cause the > + * timeout to occur immediately is safely ignored even the timeout is > + * actually already in the process of occurring. > + * > + * timeout_deregister is not used because it forms part of a > + * deprecated unsafe mode of use of the API. > + * > * osevent_register_hooks may be called only once for each libxl_ctx. > * libxl may make calls to register/modify/deregister from within > * any libxl function (indeed, it will usually call register from > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |