[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] libxl: fix dom0 balloon logic



Stefano Stabellini wrote:
> On Mon, 23 Mar 2015, Ian Campbell wrote:
>   
>> (just ccing the other tools maintainers, in particular Stefano who knows
>> what this stuff is supposed to do...)
>>
>> On Fri, 2015-03-20 at 17:10 -0600, Jim Fehlig wrote:
>>     
>>> Recent testing on large memory systems revealed a bug in the Xen xl
>>> tool's freemem() function.  When autoballooning is enabled, freemem()
>>> is used to ensure enough memory is available to start a domain,
>>> ballooning dom0 if necessary.  When ballooning large amounts of memory
>>> from dom0, freemem() would exceed its self-imposed wait time and
>>> return an error.  Meanwhile, dom0 continued to balloon.  Starting the
>>> domain later, after sufficient memory was ballooned from dom0, would
>>> succeed.  The libvirt implementation in libxlDomainFreeMem() suffers
>>> the same bug since it is modeled after freemem().
>>>
>>> In the end, the best place to fix the bug on the Xen side was to
>>> slightly change the behavior of libxl_wait_for_memory_target().
>>> Instead of failing after caller-provided wait_sec, the function now
>>> blocks as long as dom0 memory ballooning is progressing.  It will return
>>> failure only when more memory is needed to reach the target and wait_sec
>>> have expired with no progress being made.  See xen.git commit fd3aa246.
>>> There was a dicussion on how this would affect other libxl apps like
>>> libvirt
>>>
>>> http://lists.xen.org/archives/html/xen-devel/2015-03/msg00739.html
>>>
>>> If libvirt containing this patch was build against a Xen containing
>>> the old libxl_wait_for_memory_target() behavior, libxlDomainFreeMem()
>>> will fail after 30 sec and domain creation will be terminated.
>>> Without this patch and with old libxl_wait_for_memory_target() behavior,
>>> libxlDomainFreeMem() does not succeed after 30 sec, but returns success
>>> anyway.  Domain creation continues resulting in all sorts of fun stuff
>>> like cpu soft lockups in the guest OS.  It was decided to properly fix
>>> libxl_wait_for_memory_target(), and if anything improve the default
>>> behavior of apps using the freemem reference impl in xl.
>>>
>>> xl was patched to accommodate the change in libxl_wait_for_memory_target()
>>> with xen.git commit 883b30a0.  This patch does the same in the libxl
>>> driver.  While at it, I changed the logic to essentially match
>>> freemem() in $xensrc/tools/libxl/xl_cmdimpl.c.  It was a bit cleaner
>>> IMO and will make it easier to spot future, potentially interesting
>>> divergences.
>>>
>>> Signed-off-by: Jim Fehlig <jfehlig@xxxxxxxx>
>>> ---
>>>  src/libxl/libxl_domain.c | 57 
>>> ++++++++++++++++++++++++------------------------
>>>  1 file changed, 28 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c
>>> index 407a9bd..ff78133 100644
>>> --- a/src/libxl/libxl_domain.c
>>> +++ b/src/libxl/libxl_domain.c
>>> @@ -1121,38 +1121,41 @@ libxlDomainFreeMem(libxlDomainObjPrivatePtr priv, 
>>> libxl_domain_config *d_config)
>>>  {
>>>      uint32_t needed_mem;
>>>      uint32_t free_mem;
>>> -    size_t i;
>>> -    int ret = -1;
>>> +    int ret;
>>>      int tries = 3;
>>>      int wait_secs = 10;
>>>  
>>> -    if ((ret = libxl_domain_need_memory(priv->ctx, &d_config->b_info,
>>> -                                        &needed_mem)) >= 0) {
>>> -        for (i = 0; i < tries; ++i) {
>>> -            if ((ret = libxl_get_free_memory(priv->ctx, &free_mem)) < 0)
>>> -                break;
>>> +    ret = libxl_domain_need_memory(priv->ctx, &d_config->b_info,
>>> +                                   &needed_mem);
>>> +    if (ret < 0)
>>> +        goto error;
>>>  
>>> -            if (free_mem >= needed_mem) {
>>> -                ret = 0;
>>> -                break;
>>> -            }
>>> +    do {
>>> +        ret = libxl_get_free_memory(priv->ctx, &free_mem);
>>> +        if (ret < 0)
>>> +            goto error;
>>>  
>>> -            if ((ret = libxl_set_memory_target(priv->ctx, 0,
>>> -                                               free_mem - needed_mem,
>>> -                                               /* relative */ 1, 0)) < 0)
>>> -                break;
>>> +        if (free_mem >= needed_mem)
>>> +            return 0;
>>>  
>>> -            ret = libxl_wait_for_free_memory(priv->ctx, 0, needed_mem,
>>> -                                             wait_secs);
>>> -            if (ret == 0 || ret != ERROR_NOMEM)
>>> -                break;
>>> +        ret = libxl_set_memory_target(priv->ctx, 0,
>>> +                                      free_mem - needed_mem,
>>> +                                      /* relative */ 1, 0);
>>> +        if (ret < 0)
>>> +            goto error;
>>>  
>>> -            if ((ret = libxl_wait_for_memory_target(priv->ctx, 0, 1)) < 0)
>>> -                break;
>>> -        }
>>> -    }
>>> +        ret = libxl_wait_for_free_memory(priv->ctx, 0, needed_mem,
>>> +                                         wait_secs);
>>> +        if (ret < 0)
>>> +            goto error;
>>>       
>
> Shouldn't this be a call to libxl_wait_for_memory_target instead?
>   

Err, right. One would think I would have caught that after all the talk
about libxl_wait_for_memory_target() in the commit msg :-/. V2 on the way...

Regards,
Jim


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.