
[Xen-devel] [PATCH 4/4] libxl: More reliably detect reentry of destroyed ctx



In libxl_ctx_free:

1. Move the GC_INIT earlier, so that we can:

2. Take the lock around most of the work.  This is technically
   unnecessary, since calling any other libxl entrypoint on a ctx
   being passed to libxl_ctx_free risks undefined behaviour.  But
   taking the lock lets us detect such misuse much more often.

3. Increment osevent_in_hook by 1000.  If we are reentered after
   destroy, this will trip some of our entrypoints' asserts.  It also
   means that if we crash for some other reason relating to reentering
   a destroyed ctx, the cause will be more obviously visible by
   examining ctx->osevent_in_hook (assuming that the memory previously
   used for the ctx hasn't been reused and overwritten); see the
   sketch after this list.

4. Free the lock again.  (pthread_mutex_destroy requires that the
   mutex be unlocked.)
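
For illustration, the pattern relied on in (3) looks roughly like this
(a simplified, hypothetical sketch, not the actual libxl source; the
real check lives in osevent entrypoints such as
libxl_osevent_occurred_fd):

    #include <assert.h>

    /* Hypothetical sketch.  Each osevent entrypoint asserts that it is
     * not being entered while a hook is in progress.  Ordinarily
     * osevent_in_hook is 0 outside a hook callback; once libxl_ctx_free
     * has added 1000 it can never be 0 again, so any later call on the
     * destroyed ctx fails this assert immediately, and the value 1000
     * remains visible in a debugger as a marker of destruction. */
    void some_osevent_entrypoint(libxl_ctx *ctx)
    {
        assert(!ctx->osevent_in_hook);  /* trips if ctx was destroyed */
        /* ... normal event handling on a live ctx ... */
    }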

With this patch, I find that an occasional race previously reported
as:
  libvirtd: libxl_internal.h:3265: libxl__ctx_unlock: Assertion `!r' failed.
is now reported as:
  libvirtd: libxl_event.c:1236: libxl_osevent_occurred_fd: Assertion `!libxl__gc_owner(gc)->osevent_in_hook' failed.

Examining the call trace with gdb shows this:
  (gdb) bt
  #0  0xb773f424 in __kernel_vsyscall ()
  #1  0xb7022871 in raise () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
  #2  0xb7025d62 in abort () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
  #3  0xb701b958 in __assert_fail () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
  #4  0xb6f00390 in libxl_osevent_occurred_fd (ctx=0xb84813a8, for_libxl=0xb84d6848, fd=31, events_ign=0, revents_ign=1) at libxl_event.c:1236
  #5  0xb1b70464 in libxlDomainObjFDEventCallback () from /usr/local/lib/libvirt/connection-driver/libvirt_driver_libxl.so
  #6  0xb72163b1 in virEventPollDispatchHandles () from /usr/local/lib/libvirt.so.0
  #7  0xb7216be5 in virEventPollRunOnce () from /usr/local/lib/libvirt.so.0
  #8  0xb7214a7e in virEventRunDefaultImpl () from /usr/local/lib/libvirt.so.0
  #9  0xb77c7b98 in virNetServerRun ()
  #10 0xb7771c63 in main ()
  (gdb) print ctx->osevent_in_hook
  $2 = 1000
  (gdb)
which IMO demonstrates that libxl_osevent_occurred_fd is being called
on a destroyed ctx.
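
To make the race concrete, the application-side pattern that is now
caught looks roughly like this (a hypothetical illustration, not
libvirt's actual code):

    /* Thread A: tears the ctx down. */
    libxl_ctx_free(ctx);            /* now does osevent_in_hook += 1000 */

    /* Thread B: the application's event loop still holds an fd
     * registration referring to the same ctx, and dispatches it. */
    libxl_osevent_occurred_fd(ctx, for_libxl, fd, 0, POLLIN);
    /* With this patch the call trips assert(!ctx->osevent_in_hook)
     * straight away, instead of quietly operating on a destroyed ctx. */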

This is probably a symptom of the bug in libvirt fixed by these
patches:
   https://www.redhat.com/archives/libvir-list/2015-February/msg00024.html

Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>
CC: Jim Fehlig <jfehlig@xxxxxxxx>
---
 tools/libxl/libxl.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index cd6f42c..de75ac4 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -146,11 +146,13 @@ int libxl_ctx_free(libxl_ctx *ctx)
 {
     if (!ctx) return 0;
 
-    assert(!ctx->osevent_in_hook);
-
     int i;
     GC_INIT(ctx);
 
+    CTX_LOCK;
+    assert(!ctx->osevent_in_hook);
+    CTX->osevent_in_hook += 1000;
+
     /* Deregister all libxl__ev_KINDs: */
 
     free_disable_deaths(gc, &CTX->death_list);
@@ -196,6 +198,7 @@ int libxl_ctx_free(libxl_ctx *ctx)
     libxl__sigchld_notneeded(gc);
     libxl__pipe_close(ctx->sigchld_selfpipe);
 
+    CTX_UNLOCK;
     pthread_mutex_destroy(&ctx->lock);
 
     GC_FREE;
-- 
1.7.10.4

