
Re: [Xen-devel] QEMU bumping memory bug analysis



On Tue, 9 Jun 2015, Ian Campbell wrote:
> On Tue, 2015-06-09 at 11:14 +0100, Stefano Stabellini wrote:
> > I don't think that the current solution is inherently racy. If we are
> > really interested in an honest evaluation of the current solution in
> > terms of races, I would be happy to do so.
> 
> Consider a domain with 1M of RAM (==target and maxmem for the sake of
> argument) and two simultaneous calls to libxl_set_memory_target, both
> with relative=0 and enforce=1, one with target=3 and the other with
> target=5.
> 
> target=3 call                 target=5 call
> 
>                               get ptr.max_memkb, gets 1
> 
> get ptr.max_memkb, gets 1
> start transaction
> do xenstore stuff, seeing target=1, setting target=3
> memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb;
> memorykb = 1 - 1 + 3
> xc_setmaxmem(3)
> transaction commit (success)
> 
> Now target=maxmem=3
> 
>                               start transaction
>                               do xenstore stuff, seeing target=3,
>                                 setting target=5
>                               memorykb = ptr.max_memkb
>                                 - current_target_memkb + new_target_memkb;
>                               memorykb = 1 - 3 + 5
>                               xc_setmaxmem(3)
>                               transaction commit (success)
> 
> Now target=5, maxmem=3.

Yes, I see the issue. Pretty nasty!
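
To make the window concrete, the flow today is roughly the following
(a condensed sketch with simplified names, not the literal
libxl_set_memory_target code):

    #include <errno.h>
    #include <stdint.h>
    #include <xenctrl.h>
    #include <xenstore.h>

    /* Condensed sketch of the current flow, names simplified. */
    static int set_memory_target_sketch(xc_interface *xch,
                                        struct xs_handle *xsh,
                                        uint32_t domid,
                                        uint64_t new_target_memkb)
    {
        xc_domaininfo_t ptr;
        xs_transaction_t t;
        uint64_t max_memkb, current_target_memkb = 0, memorykb;

        /* 1. maxmem is read OUTSIDE the transaction: this is the race */
        if (xc_domain_getinfolist(xch, domid, 1, &ptr) != 1 ||
            ptr.domain != domid)
            return -1;
        max_memkb = ptr.max_pages << 2;     /* 4KiB pages to KiB */

    retry:
        t = xs_transaction_start(xsh);
        /* 2. ... read memory/target from xenstore into
         *    current_target_memkb, write new_target_memkb back ... */
        memorykb = max_memkb - current_target_memkb + new_target_memkb;

        /* 3. absolute setmaxmem, computed from the stale step 1 value */
        if (xc_domain_setmaxmem(xch, domid, memorykb) < 0) {
            xs_transaction_end(xsh, t, 1);  /* abort */
            return -1;
        }

        /* 4. commit; on EAGAIN we loop, but setmaxmem is not undone */
        if (!xs_transaction_end(xsh, t, 0)) {
            if (errno == EAGAIN)
                goto retry;
            return -1;
        }
        return 0;
    }

The two calls in your trace interleave between step 1 and step 3, so
the second caller computes memorykb from a max_memkb that is no longer
the one in the hypervisor.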


> The obvious fix of moving the get ptr.max_memkb inside the transaction
> fails in a different way in the case where the first transaction commit
> fails and is retried after the second one, I think.

The underlying problem is that xc_domain_setmaxmem only takes absolute
values.  Retrieving the old maxmem value and setting the new one cannot
be done atomically, so they need to be protected by a lock or a
transaction.  You are right that moving get ptr.max_memkb inside the
transaction would only fix the problem if we also properly rolled back
the old maxmem value in case of failures.  But I don't think that is
actually possible, because the rollback itself could race against
other libxl_set_memory_target calls.  Am I wrong?  This would also be a
problem before 0c029c4da21.
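
If we wanted the read-modify-write to be safe against concurrent
callers, the shape would have to be something like this (just a
sketch; take_memory_lock/release_memory_lock are placeholders for
whatever cross-process locking we would need to introduce, they do
not exist in libxl today):

    #include <stdint.h>
    #include <xenctrl.h>

    /* Placeholder locking primitives, NOT an existing libxl API. */
    int take_memory_lock(uint32_t domid);
    void release_memory_lock(uint32_t domid);

    /* Sketch: apply a relative maxmem change, serialized against
     * other libxl_set_memory_target callers. */
    static int adjust_maxmem_locked(xc_interface *xch, uint32_t domid,
                                    int64_t delta_kb)
    {
        xc_domaininfo_t info;
        int rc = -1;

        if (take_memory_lock(domid))
            return -1;

        if (xc_domain_getinfolist(xch, domid, 1, &info) != 1 ||
            info.domain != domid)
            goto out;

        /* the read-modify-write of the absolute maxmem is now atomic
         * with respect to anyone else taking the same lock */
        rc = xc_domain_setmaxmem(xch, domid,
                                 (info.max_pages << 2) + delta_kb);
    out:
        release_memory_lock(domid);
        return rc;
    }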


> Prior to 0c029c4da21 "libxl_set_memory_target: retain the same maxmem
> offset on top of the current target" this issue didn't exist because
> memorykb was just memorykb = new_target_memkb + videoram.

It is true that reverting 0c029c4da21 would improve the situation.
However, I think the problem you have found here goes beyond
0c029c4da21 :-(


> BTW, I noticed some other (unrelated) dubious stuff in there while
> looking at this, in particular the setmaxmem is not rolled back if
> set_pod_target fails.

Yes, you are right! It is also not rolled back in case the xenstore
transaction fails with errno != EAGAIN, even before 0c029c4da21.
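
That one at least looks mechanically fixable, along these lines (an
untested sketch, not a patch; saved_memkb would be the maxmem value
retrieved earlier in the function):

    #include <stdint.h>
    #include <xenctrl.h>

    /* Untested sketch: restore the previous maxmem if a later step
     * fails. */
    static int setmaxmem_with_rollback(xc_interface *xch, uint32_t domid,
                                       uint64_t new_memkb,
                                       uint64_t saved_memkb)
    {
        int rc = xc_domain_setmaxmem(xch, domid, new_memkb);
        if (rc)
            return rc;

        /* the PoD target is in 4KiB pages, hence /4 on a KiB value */
        rc = xc_domain_set_pod_target(xch, domid, new_memkb / 4,
                                      NULL, NULL, NULL);
        if (rc)
            /* roll the hypervisor limit back to what it was */
            xc_domain_setmaxmem(xch, domid, saved_memkb);

        return rc;
    }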


> Also the transiently "wrong" maxmem between the
> setmaxmem and a failed transaction getting redone might possibly give
> rise to some interesting cases, especially if anything else fails the
> second time around the loop.

I am thinking that the whole idea of an "enforce" option was bad to
begin with.  Not only should we revert 0c029c4da21, but should we
actually just get rid of "enforce" altogether?  I am suggesting it
because I don't think that reverting 0c029c4da21 alone would fix all
the issues.

In a pre-0c029c4da21 world:

target=3 enforce=1

start transaction
do xenstore stuff, seeing target=1, setting target=3
memorykb = new_target_memkb + videoram;
xc_setmaxmem(3+videoram)
transaction commit (fail errno=ESOMETHING)

Now target=1, maxmem=3+videoram

Unless we say that we don't care about leaving the system in a
consistent state in case of failures with errno != EAGAIN.
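
If we do care, the transaction loop would have to undo the setmaxmem
on any failure it cannot retry, roughly like this (sketch only;
saved_memkb is the maxmem from before the call, and note that on the
EAGAIN path the transiently wrong maxmem Ian mentioned is still
there):

    #include <errno.h>
    #include <stdint.h>
    #include <xenctrl.h>
    #include <xenstore.h>

    /* Sketch only: undo setmaxmem on a hard (non-EAGAIN) failure. */
    static int commit_with_rollback(xc_interface *xch,
                                    struct xs_handle *xsh,
                                    uint32_t domid, uint64_t memorykb,
                                    uint64_t saved_memkb)
    {
        for (;;) {
            xs_transaction_t t = xs_transaction_start(xsh);

            /* ... xenstore reads/writes of memory/target ... */

            if (xc_domain_setmaxmem(xch, domid, memorykb) < 0) {
                xs_transaction_end(xsh, t, 1);  /* abort */
                return -1;
            }
            if (xs_transaction_end(xsh, t, 0))
                return 0;                       /* committed */
            if (errno != EAGAIN) {
                /* hard failure: put maxmem back before bailing out */
                xc_domain_setmaxmem(xch, domid, saved_memkb);
                return -1;
            }
            /* EAGAIN: retry the whole transaction */
        }
    }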
