[PATCH 0/2] libs/light: fix a race when domains are destroyed and created concurrently



It is possible to encounter a segfault in libxl during concurrent domain
create and destroy operations.

This is because the placement of existing domains on the host's CPUs is
examined when creating a new domain, but the existing logic does not cope
well with a domain disappearing during the examination.

The race is quite difficult to trigger. When it does happen, this is an
example of a backtrace:

#0  libxl_bitmap_dispose (map=map@entry=0x50) at libxl_utils.c:626
#1  0x00007fe72c993a32 in libxl_vcpuinfo_dispose (p=p@entry=0x38) at _libxl_types.c:692
#2  0x00007fe72c94e3c4 in libxl_vcpuinfo_list_free (list=0x0, nr=<optimized out>) at libxl_utils.c:1059
#3  0x00007fe72c9528bf in nr_vcpus_on_nodes (vcpus_on_node=0x7fe71000eb60, suitable_cpumap=0x7fe721df0d38, tinfo_elements=48, tinfo=0x7fe7101b3900, gc=0x7fe7101bbfa0) at libxl_numa.c:258
#4  libxl__get_numa_candidate (gc=gc@entry=0x7fe7100033a0, min_free_memkb=4233216, min_cpus=4, min_nodes=min_nodes@entry=0, max_nodes=max_nodes@entry=0, suitable_cpumap=suitable_cpumap@entry=0x7fe721df0d38, numa_cmpf=0x7fe72c940110 <numa_cmpf>, cndt_out=0x7fe721df0cf0, cndt_found=0x7fe721df0cb4) at libxl_numa.c:394
#5  0x00007fe72c94152b in numa_place_domain (d_config=0x7fe721df11b0, domid=975, gc=0x7fe7100033a0) at libxl_dom.c:209
#6  libxl__build_pre (gc=gc@entry=0x7fe7100033a0, domid=domid@entry=975, d_config=d_config@entry=0x7fe721df11b0, state=state@entry=0x7fe710077700) at libxl_dom.c:436
#7  0x00007fe72c92c4a5 in libxl__domain_build (gc=0x7fe7100033a0, d_config=d_config@entry=0x7fe721df11b0, domid=975, state=0x7fe710077700) at libxl_create.c:444
#8  0x00007fe72c92de8b in domcreate_bootloader_done (egc=0x7fe721df0f60, bl=0x7fe7100778c0, rc=<optimized out>) at libxl_create.c:1222
#9  0x00007fe72c980425 in libxl__bootloader_run (egc=egc@entry=0x7fe721df0f60, bl=bl@entry=0x7fe7100778c0) at libxl_bootloader.c:403
#10 0x00007fe72c92f281 in initiate_domain_create (egc=egc@entry=0x7fe721df0f60, dcs=dcs@entry=0x7fe7100771b0) at libxl_create.c:1159
#11 0x00007fe72c92f456 in do_domain_create (ctx=ctx@entry=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, restore_fd=restore_fd@entry=-1, send_back_fd=send_back_fd@entry=-1, params=params@entry=0x0, ao_how=0x0, aop_console_how=0x7fe721df10f0) at libxl_create.c:1856
#12 0x00007fe72c92f776 in libxl_domain_create_new (ctx=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, ao_how=ao_how@entry=0x0, aop_console_how=aop_console_how@entry=0x7fe721df10f0) at libxl_create.c:2075

Luckily, it is easy to close the race: just make sure that
libxl_list_vcpu() reports 0 as the number of vCPUs of the domain whenever
it returns NULL as the list of them.
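
To illustrate the invariant (a NULL list must always come with a count of
0), here is a minimal standalone sketch. It is not the code from the
patches and it does not use the real libxl types: struct vcpu_info,
list_vcpus() and free_vcpu_list() are hypothetical stand-ins, and
domain_exists simulates the domain having been destroyed in the meantime.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for libxl_vcpuinfo; not the real libxl type. */
struct vcpu_info {
    int *cpumap;                /* owned heap allocation, may be NULL */
};

/*
 * Hypothetical lister. When the domain has vanished (domain_exists == 0)
 * or on allocation failure, it returns NULL and reports 0 entries, so the
 * caller can never pair a NULL list with a stale, non-zero count.
 */
struct vcpu_info *list_vcpus(int domain_exists, int nr_wanted, int *nr_out)
{
    struct vcpu_info *list;
    int i;

    *nr_out = 0;                /* a NULL return always means 0 entries */
    if (!domain_exists || nr_wanted <= 0)
        return NULL;

    list = calloc((size_t)nr_wanted, sizeof(*list));
    if (!list)
        return NULL;

    for (i = 0; i < nr_wanted; i++)
        list[i].cpumap = calloc(1, sizeof(int));

    *nr_out = nr_wanted;        /* only reported once the list is valid */
    return list;
}

/*
 * Dispose of every entry, then free the list itself. Returns the number
 * of entries disposed of. With list == NULL and nr == 0 the loop body
 * never runs, so nothing is dereferenced.
 */
int free_vcpu_list(struct vcpu_info *list, int nr)
{
    int i, disposed = 0;

    assert(list != NULL || nr == 0);    /* the invariant described above */
    for (i = 0; i < nr; i++) {
        free(list[i].cpumap);
        disposed++;
    }
    free(list);
    return disposed;
}
```

In the backtrace above, the free routine was entered with list=0x0 and
apparently a non-zero count, so it walked entries at small offsets from
NULL (p=0x38, map=0x50). With the count forced to 0 the loop is skipped.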

Regards
---
Dario Faggioli (2):
      tools/libs/light: numa placement: don't try to free a NULL list of vcpus
      tools/libs/light: don't touch nr_vcpus_out if listing vcpus and returning NULL

 tools/libs/light/libxl_domain.c | 14 ++++++++------
 tools/libs/light/libxl_numa.c   |  4 +++-
 2 files changed, 11 insertions(+), 7 deletions(-)
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)




 

