[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4/4] libxl: More probably detect reentry of destroyed ctx



On Mon, Feb 09, 2015 at 03:51:12PM +0000, Ian Jackson wrote:
> In libxl_ctx_free:
> 
> 1. Move the GC_INIT earlier, so that we can:
> 
> 2. Take the lock around most of the work.  This is technically
>    unnecessary since calling any other libxl entrypoint on a ctx being
>    passed to libxl_ctx_free risks undefined behaviour.  But, taking
>    the lock allows us to much more usually spot this.
> 
> 3. Increment osevent_in_hook by 1000.  If we are reentered after
>    destroy, this will trip some of our entrypoints' asserts.  It also
>    means that if we crash for some other reason relating to reentering
>    a destroyed ctx, the cause will be more obviously visible by
>    examining ctx->osevent_in_hook (assuming that the memory previously
>    used for the ctx hasn't been reused and overwritten).
> 
> 4. Free the lock again.  (pthread_mutex_destroy requires that the
>    mutex be unlocked.)
> 
> With this patch, I find that an occasional race previously reported
> as:
>   libvirtd: libxl_internal.h:3265: libxl__ctx_unlock: Assertion `!r' failed.
> is now reported as:
>   libvirtd: libxl_event.c:1236: libxl_osevent_occurred_fd: Assertion 
> `!libxl__gc_owner(gc)->osevent_in_hook' failed.
> 
> Examining the call trace with gdb shows this:
>   (gdb) bt
>   #0  0xb773f424 in __kernel_vsyscall ()
>   #1  0xb7022871 in raise () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
>   #2  0xb7025d62 in abort () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
>   #3  0xb701b958 in __assert_fail () from 
> /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
>   #4  0xb6f00390 in libxl_osevent_occurred_fd (ctx=0xb84813a8, 
> for_libxl=0xb84d6848, fd=31, events_ign=0, revents_ign=1) at 
> libxl_event.c:1236
>   #5  0xb1b70464 in libxlDomainObjFDEventCallback () from 
> /usr/local/lib/libvirt/connection-driver/libvirt_driver_libxl.so
>   #6  0xb72163b1 in virEventPollDispatchHandles () from 
> /usr/local/lib/libvirt.so.0
>   #7  0xb7216be5 in virEventPollRunOnce () from /usr/local/lib/libvirt.so.0
>   #8  0xb7214a7e in virEventRunDefaultImpl () from /usr/local/lib/libvirt.so.0
>   #9  0xb77c7b98 in virNetServerRun ()
>   #10 0xb7771c63 in main ()
>   (gdb) print ctx->osevent_in_hook
>   $2 = 1000
>   (gdb)
> which IMO demonstrates that libxl_osevent_occurred_fd is being called
> on a destroyed ctx.
> 
> This is probably a symptom of the bug in libvirt fixed by these
> patches:
>    https://www.redhat.com/archives/libvir-list/2015-February/msg00024.html
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> CC: Ian Campbell <ian.campbell@xxxxxxxxxx>
> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
> CC: Jim Fehlig <jfehlig@xxxxxxxx>

Acked-by: Wei Liu <wei.liu2@xxxxxxxxxx>

> ---
>  tools/libxl/libxl.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index cd6f42c..de75ac4 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -146,11 +146,13 @@ int libxl_ctx_free(libxl_ctx *ctx)
>  {
>      if (!ctx) return 0;
>  
> -    assert(!ctx->osevent_in_hook);
> -
>      int i;
>      GC_INIT(ctx);
>  
> +    CTX_LOCK;
> +    assert(!ctx->osevent_in_hook);
> +    CTX->osevent_in_hook += 1000;
> +

It would be good if you add comment here to say this is used to help
debugging.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.