
Re: [Xen-devel] [libvirt] [PATCH 00/10] libxl: switch driver to use a single libxl_ctx



Michal Privoznik wrote:
> On 18.02.2015 04:56, Jim Fehlig wrote:
>   
>> This series is a follow up to
>>
>> https://www.redhat.com/archives/libvir-list/2015-February/msg00024.html
>>
>> It goes a step further and changes the libxl driver to use one,
>> driver-wide libxl_ctx.  Currently the libxl driver has one driver-wide
>> ctx for operations that are not domain-specific and a ctx for each
>> domain.  This approach was necessary back in the old Xen 4.1 libxl days,
>> but with the newer libxl it is more of a hindrance than a benefit.
>> Ian Jackson suggested moving to a single ctx while discussing some
>> deadlocks and assertions encountered in the libxl driver when under
>> load from tests such as OpenStack Tempest.
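To make the structural change concrete, here is a toy model in C. It is not libvirt code: `toy_ctx`, `toy_driver`, `toy_domain`, and `toy_ctx_for()` are hypothetical stand-ins, and the real libxl_ctx is an opaque type created through libxl. The model only shows the shape of the change. After the series, every operation, domain-specific or not, resolves to the same driver-wide context instead of a per-domain one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl_ctx (the real type is opaque). */
typedef struct {
    int generation;
} toy_ctx;

/* Hypothetical per-domain object. Before the series it would also have
 * carried its own ctx; afterwards it holds only domain state. */
typedef struct {
    int domid;
} toy_domain;

/* Hypothetical driver state: a single, driver-wide ctx. */
typedef struct {
    toy_ctx ctx;
} toy_driver;

/* Resolve the ctx to use for an operation. dom may be NULL for
 * operations that are not domain-specific; either way the answer is
 * the same driver-wide ctx. */
static toy_ctx *toy_ctx_for(toy_driver *drv, const toy_domain *dom)
{
    assert(drv != NULL);
    (void)dom; /* the domain no longer selects a context */
    return &drv->ctx;
}
```

The trade-off follows directly: one ctx removes cross-ctx coordination, but it also means one log stream and one shared lock.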
>>
>> Making such a change involves quite a bit of code movement.  I've tried
>> to split that up into a reviewable series, the result of which is the
>> 10 patches that follow.  I've run this through all of my automated tests
>> as well as some hacky tests I created to reproduce failures revealed by
>> Tempest.
>>
>> One downside of moving to a single ctx is losing the per-domain log
>> files.  Currently, only a single log stream can be associated with a ctx,
>> hence all logging from libxl will go to a single file.  Ian is going to
>> investigate possibilities to accommodate per-domain log files in libxl,
>> but in the meantime folks using Xen are accustomed to a single
>> log file from the xend days.
>>
>> I've been testing this series on xen-unstable and Xen 4.4.1 + commits
>> 2ffeb5d7, 4b9143e4, 5a968257, 60ce518a, 66bff9fd, 77a1bf37, f49f9b41,
>> 6b5a5bba, 93699882d, f1335f0d, and 8bc64413.  Results are much better
>> than before applying the series, but I do notice a stuck hypercall
>> after many (hundreds of) concurrent domain create/destroy operations.
>> The single libxl_ctx is locked in the callpath, essentially deadlocking
>> the driver.
>>
>> Thread 1 (Thread 0x7f0649a198c0 (LWP 2235)):
>>  0  0x00007f0645272397 in ioctl () from /lib64/libc.so.6
>>  1  0x00007f0645d8e353 in linux_privcmd_hypercall (xch=<optimized out>,
>>     h=<optimized out>, hypercall=<optimized out>) at xc_linux_osdep.c:134
>>  2  0x00007f0645d854b8 in do_xen_hypercall (xch=xch@entry=0x7f0630039390, 
>>     hypercall=hypercall@entry=0x7fffd53f80e0) at xc_private.c:249
>>  3  0x00007f0645d86aa4 in do_sysctl (sysctl=sysctl@entry=0x7fffd53f8080, 
>>     xch=xch@entry=0x7f0630039390) at xc_private.h:281
>>  4  xc_sysctl (xch=xch@entry=0x7f0630039390,
>>     sysctl=sysctl@entry=0x7fffd53f8170) at xc_private.c:656
>>  5  0x00007f0645d7bfbf in xc_domain_getinfolist (xch=0x7f0630039390, 
>>     first_domain=first_domain@entry=119, max_domains=max_domains@entry=1, 
>>     info=info@entry=0x7fffd53f8260) at xc_domain.c:382
>>  6  0x00007f0645fabca6 in domain_death_xswatch_callback
>>     (egc=0x7fffd53f83f0, w=<optimized out>, wpath=<optimized out>,
>>     epath=<optimized out>) at libxl.c:1041
>>  7  0x00007f0645fd75a8 in watchfd_callback (egc=0x7fffd53f83f0,
>>     ev=<optimized out>, fd=<optimized out>, events=<optimized out>,
>>     revents=<optimized out>) at libxl_event.c:515
>>  8  0x00007f0645fd8ac3 in libxl_osevent_occurred_fd (ctx=<optimized out>, 
>>     for_libxl=<optimized out>, fd=<optimized out>,
>>     events_ign=<optimized out>, revents_ign=<optimized out>) at
>>     libxl_event.c:1259
>>  9  0x00007f063a23402c in libxlFDEventCallback (watch=454, fd=33,
>>     vir_events=1, fd_info=0x7f0608007e70) at libxl/libxl_driver.c:123
>>
>> There is no hint in any logs or dmesg suggesting a cause for the stuck
>> hypercall.  Any further debugging tips would be appreciated.
>>     

FYI, this was not a hung hypercall, but a loop further back in frame 6
(domain_death_xswatch_callback) that I overlooked.  It was fixed in libxl
by the following commit:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=4783c99aab866f470bd59368cfbf5ad5f677b0ec
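Why one stuck caller stalls the whole driver once there is a single ctx can be modeled with one mutex guarding the shared context. This is a hypothetical sketch: `shared_ctx` and `ctx_available()` are invented names, the lock is a plain pthread mutex, and libxl's real internal locking is not reproduced. While some callback (such as the loop in frame 6 above) holds the lock, every other caller that needs the ctx cannot proceed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: one lock guards the single shared ctx. */
typedef struct {
    pthread_mutex_t lock;
} shared_ctx;

/* Probe whether another caller could use the ctx right now.
 * Returns true if the lock was free (it is released again before
 * returning), false if someone else currently holds it. */
static bool ctx_available(shared_ctx *c)
{
    int rc = pthread_mutex_trylock(&c->lock);
    if (rc == 0) {
        pthread_mutex_unlock(&c->lock);
        return true;
    }
    assert(rc == EBUSY); /* held by a (possibly looping) callback */
    return false;
}
```

A real caller would block in `pthread_mutex_lock()` rather than probe with `pthread_mutex_trylock()`; the probe just makes the contention observable from a single thread.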

>> Jim Fehlig (10):
>>   libxl: remove redundant calls to libxl_evdisable_domain_death
>>   libxl: use libxl_ctx passed to libxlConsoleCallback
>>   libxl: use driver-wide ctx in fd and timer event handling
>>   libxl: Move setup of child processing code to driver initialization
>>   libxl: move event registration to driver initialization
>>   libxl: use global libxl_ctx in event handler
>>   libxl: remove unnecessary libxlDomainEventsRegister
>>   libxl: make libxlDomainFreeMem static
>>   libxl: remove per-domain libxl_ctx
>>   libxl: change libxl log stream to ERROR log level
>>
>>  src/libxl/libxl_conf.c      |   2 +-
>>  src/libxl/libxl_domain.c    | 438 ++++++---------------------------------
>>  src/libxl/libxl_domain.h    |  27 +--
>>  src/libxl/libxl_driver.c    | 484 +++++++++++++++++++++++++++++++-------------
>>  src/libxl/libxl_migration.c |  17 +-
>>  5 files changed, 426 insertions(+), 542 deletions(-)
>>
>>     
>
> ACK series
>   

Thanks!  Patches 1 and 2 were pushed earlier as part of this trivial series:

https://www.redhat.com/archives/libvir-list/2015-March/msg00102.html

I've now pushed 3-9, but held off on pushing 10 since it removes any
way to get debug-level messages from libxl.  I think a better
approach would be to introduce /etc/libvirt/libxl.conf with a
'log_level' setting, giving users the ability to change this a bit more
dynamically.  Actually, an even better approach would be to set the libxl
log level based on the level set in /etc/libvirt/libvirtd.conf.  But
AFAIK, the settings of various knobs in libvirtd.conf are generally not
available to the individual drivers.
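A minimal sketch of the two options above, assuming a simple four-level scheme. `toy_log_level`, `level_from_libvirtd()`, `level_from_string()`, and the string values accepted for a libxl.conf `log_level` are hypothetical, not an existing libvirt or libxl interface. Only the numeric mapping (1 = debug, 2 = info, 3 = warning, 4 = error) follows the documented meaning of `log_level` in libvirtd.conf. Unset or unrecognized values fall back to ERROR, the level patch 10 would have hardcoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for libxl's logger levels, in increasing
 * severity; these are not the real xentoollog definitions. */
typedef enum {
    TOY_LOG_DEBUG,
    TOY_LOG_INFO,
    TOY_LOG_WARN,
    TOY_LOG_ERROR
} toy_log_level;

/* Map libvirtd.conf's numeric log_level (1=debug, 2=info, 3=warning,
 * 4=error). Anything else falls back to ERROR. */
static toy_log_level level_from_libvirtd(int log_level)
{
    switch (log_level) {
    case 1: return TOY_LOG_DEBUG;
    case 2: return TOY_LOG_INFO;
    case 3: return TOY_LOG_WARN;
    default: return TOY_LOG_ERROR;
    }
}

/* Parse a hypothetical 'log_level' string from /etc/libvirt/libxl.conf;
 * NULL (setting absent) or an unknown name yields the fallback. */
static toy_log_level level_from_string(const char *s, toy_log_level fallback)
{
    static const struct {
        const char *name;
        toy_log_level level;
    } map[] = {
        { "debug",   TOY_LOG_DEBUG },
        { "info",    TOY_LOG_INFO },
        { "warning", TOY_LOG_WARN },
        { "error",   TOY_LOG_ERROR },
    };

    if (s == NULL)
        return fallback;
    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
        if (strcmp(s, map[i].name) == 0)
            return map[i].level;
    return fallback;
}

/* A message is emitted only if it is at least as severe as the minimum. */
static int should_log(toy_log_level msg, toy_log_level min)
{
    return msg >= min;
}
```

With ERROR as the default minimum, debug output stays off unless a user explicitly asks for it, which is the behavior patch 10 wanted without losing the option.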

Regards,
Jim

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

