
[Xen-devel] Ballooning dom0: insufficient memory (libxl) or CPU soft lockups (libvirt)


  • To: xen-devel@xxxxxxxxxxxxx
  • From: Mike Latimer <mlatimer@xxxxxxxx>
  • Date: Sat, 13 Dec 2014 20:20:49 -0700
  • Delivery-date: Sun, 14 Dec 2014 03:21:45 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hi,

I've recently been testing large memory (64GB - 1TB) domains, and encountering 
CPU soft lockups while dom0 is ballooning down to free memory for the domain. 
The root of the issue also exposes a difference between libxl and libvirt.

When creating a domain using xl, if ballooning is enabled (and required) there 
is a 33 second window for the memory request to be satisfied. If it is not 
satisfied in that time, ERROR_NOMEM is returned and the domain create fails. 
(See tools/libxl/xl_cmdimpl.c:freemem.)

The libvirt code for the same operation 
(src/libxl/libxl_domain.c:libxlDomainFreeMem) is nearly identical, except that 
the function returns the value of 'ret' when the loop ends. The intent seems 
to be the same as libxl, but ret is set to 0 by libxl_wait_for_memory_target 
if memory ballooning is still ongoing at the end of the 33 second loop. The 
end result is that when using libvirt, the process believes the free memory 
call succeeded and continues to create the domain despite the fact that dom0 
has not finished ballooning.
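
As a rough illustration of the difference described above, the following is a 
self-contained simulation, not the actual xl or libvirt code. Everything named 
sim_* or SKETCH_* is hypothetical, the numeric value of the ERROR_NOMEM 
stand-in is an assumption, and the loop shape (three tries of 10 + 1 seconds, 
giving the 33 second window) is assumed rather than copied from either source 
tree. The only thing it tries to show is how returning the last 'ret' turns a 
timeout into an apparent success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for libxl's ERROR_NOMEM; the numeric value is an assumption. */
#define SKETCH_ERROR_NOMEM (-5)

/* Hypothetical model of dom0: frees rate_kb_per_sec while ballooning. */
struct sim_host {
    uint64_t free_kb;
    uint64_t rate_kb_per_sec;
};

/* Simulated wait for free memory: advance time, then check the target. */
static int sim_wait_for_free_memory(struct sim_host *h, uint64_t need_kb,
                                    unsigned secs)
{
    h->free_kb += h->rate_kb_per_sec * secs;
    return h->free_kb >= need_kb ? 0 : SKETCH_ERROR_NOMEM;
}

/* Simulated wait for the memory target: advances time and reports 0 even
 * though ballooning is still in progress, as described in the message. */
static int sim_wait_for_memory_target(struct sim_host *h, unsigned secs)
{
    h->free_kb += h->rate_kb_per_sec * secs;
    return 0;
}

/* return_last_ret == 0: fail with NOMEM after the loop (xl-like).
 * return_last_ret != 0: return whatever 'ret' holds (libvirt-like). */
static int free_mem_loop(struct sim_host *h, uint64_t need_kb,
                         int return_last_ret)
{
    int ret = 0, tries = 3;

    assert(h != NULL);
    do {
        if (h->free_kb >= need_kb)
            return 0;
        ret = sim_wait_for_free_memory(h, need_kb, 10);
        if (ret == 0)
            return 0;
        ret = sim_wait_for_memory_target(h, 1);
        if (ret < 0)
            return ret;
        tries--;
    } while (tries > 0);

    return return_last_ret ? ret : SKETCH_ERROR_NOMEM;
}
```

With a simulated host that frees memory too slowly to reach the target in 33 
seconds, the xl-like variant reports the NOMEM stand-in while the libvirt-like 
variant reports 0, even though the target has not been reached.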

In either case, dom0 continues to balloon in the background. In the case of 
libxl, a second attempt to create the domain will succeed after waiting until 
this ballooning finishes. With libvirt, the original create request encounters 
contention between dom0 ballooning down and that same memory being allocated 
to the starting domain. This contention can cause CPU soft lockups and major 
performance degradation. The issue is more easily seen when using large 
domains (64-128GB+) and slower memory models (such as large NUMA 
configurations).

It is trivial to correct the bug in libvirt and cause it to return ERROR_NOMEM 
if ballooning is not finished by the end of the libxlDomainFreeMem loop. (I've 
tested this, and it does cause libvirt to behave like libxl.) However, it 
seems that a more correct fix would be to continue to wait for free memory if 
the ballooning process is progressing.
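
One possible shape for "keep waiting while ballooning is progressing" is 
sketched below. This is only an illustration under stated assumptions, not a 
proposed patch: the poll callback, the sim_balloon model, the max_stalls 
parameter, and the SKETCH_ERROR_NOMEM value are all hypothetical. The idea is 
to give up only after free memory has failed to grow for several consecutive 
polls, rather than after a fixed total time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for libxl's ERROR_NOMEM; the numeric value is an assumption. */
#define SKETCH_ERROR_NOMEM (-5)

/* Hypothetical callback: wait one interval, then report free memory (kB). */
typedef uint64_t (*poll_free_kb_fn)(void *opaque);

/* Wait until need_kb is free. Give up only when free memory has not grown
 * for max_stalls consecutive polls. */
static int wait_while_progressing(poll_free_kb_fn poll, void *opaque,
                                  uint64_t need_kb, int max_stalls)
{
    uint64_t prev_kb, cur_kb;
    int stalls = 0;

    assert(poll != NULL && max_stalls > 0);

    prev_kb = poll(opaque);
    while (prev_kb < need_kb) {
        cur_kb = poll(opaque);
        if (cur_kb > prev_kb)
            stalls = 0;                 /* still making progress */
        else if (++stalls >= max_stalls)
            return SKETCH_ERROR_NOMEM;  /* ballooning appears stuck */
        prev_kb = cur_kb;
    }
    return 0;
}

/* Simulated dom0 for demonstration: frees step_kb per poll up to limit_kb. */
struct sim_balloon {
    uint64_t free_kb, step_kb, limit_kb;
};

static uint64_t sim_poll(void *opaque)
{
    struct sim_balloon *b = opaque;

    if (b->free_kb + b->step_kb <= b->limit_kb)
        b->free_kb += b->step_kb;
    return b->free_kb;
}
```

A real implementation would probably also want an overall upper bound on the 
wait, so that a very slow but nonzero rate cannot block domain creation 
indefinitely.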

In some tests I've performed, ballooning down 100GB has taken as long as 2.5 
minutes. If users are attempting to create very large domains, the 33 second 
delay to balloon the memory seems rather low. I realize that best practices 
include using a set dom0 size with ballooning disabled, but I'd rather not see 
insufficient memory errors or produce CPU soft lockups if users choose not to 
follow this advice.
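
To put the 33 second window in perspective, the arithmetic can be sketched as 
below. The rate is derived from the single slowest observation quoted above 
(about 100GB in about 2.5 minutes) and is assumed to be constant, which is a 
simplification; the function names are made up for illustration.

```c
#include <assert.h>

/* Slowest rate reported in the message: roughly 100 GB in 150 seconds. */
static double observed_rate_gb_per_s(void)
{
    return 100.0 / 150.0;
}

/* GB freed in 'secs' seconds at a constant rate (an assumption). */
static double gb_freed_in(double secs, double rate_gb_per_s)
{
    assert(rate_gb_per_s > 0.0);
    return secs * rate_gb_per_s;
}

/* Seconds needed to free 'gb' gigabytes at a constant rate. */
static double secs_to_free(double gb, double rate_gb_per_s)
{
    assert(rate_gb_per_s > 0.0);
    return gb / rate_gb_per_s;
}
```

At that rate, gb_freed_in(33, observed_rate_gb_per_s()) comes to roughly 22GB, 
so any request much larger than that would not fit in the 33 second window 
under these assumptions.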

To summarize:

- If using xl, dom0 ballooning has to complete in 33 seconds, or ERROR_NOMEM 
will be encountered.
- If using virsh, the domain can be created while dom0 is still ballooning 
down. This results in CPU soft lockups/performance degradation across the 
entire host. (When creating a very large domain, the soft lockups can be 
severe enough to kill the machine.)

Any thoughts on handling this?

Thanks,
Mike

