
Re: [Xen-devel] QEMU bumping memory bug analysis



On Tue, 9 Jun 2015, Ian Campbell wrote:
> On Tue, 2015-06-09 at 11:14 +0100, Stefano Stabellini wrote:
> > I don't think that the current solution is inherently racy. If we are
> > really interested in an honest evaluation of the current solution in
> > terms of races, I would be happy to do so.
> 
> Consider a domain with 1M of RAM (==target and maxmem for the sake of
> argument) and two simultaneous calls to libxl_set_memory_target, both
> with relative=0 and enforce=1, one with target=3 and the other with
> target=5.
> 
> target=3 call                 target=5 call
> 
>                               get ptr.max_memkb, gets 1
> 
> get ptr.max_memkb, gets 1
> start transaction
> do xenstore stuff, seeing target=1, setting target=3
> memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb;
> memorykb = 1 - 1 + 3
> xc_setmaxmem(3)
> transaction commit (success)
> 
> Now target=maxmem=3
> 
>                               start transaction
>                               do xenstore stuff, seeing target=3,
>                                 setting target=5
>                               memorykb = ptr.max_memkb
>                                 - current_target_memkb + new_target_memkb;
>                               memorykb = 1 - 3 + 5
>                               xc_setmaxmem(3)
>                               transaction commit (success)
> 
> Now target=5, maxmem=3.

Yes, I see the issue. Pretty nasty!
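
To make the window concrete, the flow today is roughly the following
(a condensed sketch with simplified names, not the literal
libxl_set_memory_target code):

    #include <errno.h>
    #include <stdint.h>
    #include <xenctrl.h>
    #include <xenstore.h>

    /* Condensed sketch of the current flow, names simplified. */
    static int set_memory_target_sketch(xc_interface *xch,
                                        struct xs_handle *xsh,
                                        uint32_t domid,
                                        uint64_t new_target_memkb)
    {
        xc_domaininfo_t ptr;
        xs_transaction_t t;
        uint64_t max_memkb, current_target_memkb = 0, memorykb;

        /* 1. maxmem is read OUTSIDE the transaction: this is the race */
        if (xc_domain_getinfolist(xch, domid, 1, &ptr) != 1 ||
            ptr.domain != domid)
            return -1;
        max_memkb = ptr.max_pages << 2;     /* 4KiB pages to KiB */

    retry:
        t = xs_transaction_start(xsh);
        /* 2. ... read memory/target from xenstore into
         *    current_target_memkb, write new_target_memkb back ... */
        memorykb = max_memkb - current_target_memkb + new_target_memkb;

        /* 3. absolute setmaxmem, computed from the stale step 1 value */
        if (xc_domain_setmaxmem(xch, domid, memorykb) < 0) {
            xs_transaction_end(xsh, t, 1);  /* abort */
            return -1;
        }

        /* 4. commit; on EAGAIN we loop, but setmaxmem is not undone */
        if (!xs_transaction_end(xsh, t, 0)) {
            if (errno == EAGAIN)
                goto retry;
            return -1;
        }
        return 0;
    }

The two calls in your trace interleave between step 1 and step 3, so
the second caller computes memorykb from a max_memkb that is no longer
the one in the hypervisor.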


> The obvious fix of moving the get ptr.max_memkb inside the transaction
> fails in a different way in the case where the first transaction commit
> fails and is retried after the second one, I think.

The underlying problem is that xc_domain_setmaxmem only takes absolute
values.  Retrieving the old maxmem value and setting the new one cannot
be done atomically, so they need to be protected by a lock or a
transaction.  You are right that moving get ptr.max_memkb inside the
transaction would only fix the problem if we also properly rolled back
the old maxmem value in case of failures.  But I don't think that is
actually possible, because the rollback itself could race against
other libxl_set_memory_target calls.  Am I wrong?  This would also be a
problem before 0c029c4da21.
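
If we wanted the read-modify-write to be safe against concurrent
callers, the shape would have to be something like this (just a
sketch; take_memory_lock/release_memory_lock are placeholders for
whatever cross-process locking we would need to introduce, they do
not exist in libxl today):

    #include <stdint.h>
    #include <xenctrl.h>

    /* Placeholder locking primitives, NOT an existing libxl API. */
    int take_memory_lock(uint32_t domid);
    void release_memory_lock(uint32_t domid);

    /* Sketch: apply a relative maxmem change, serialized against
     * other libxl_set_memory_target callers. */
    static int adjust_maxmem_locked(xc_interface *xch, uint32_t domid,
                                    int64_t delta_kb)
    {
        xc_domaininfo_t info;
        int rc = -1;

        if (take_memory_lock(domid))
            return -1;

        if (xc_domain_getinfolist(xch, domid, 1, &info) != 1 ||
            info.domain != domid)
            goto out;

        /* the read-modify-write of the absolute maxmem is now atomic
         * with respect to anyone else taking the same lock */
        rc = xc_domain_setmaxmem(xch, domid,
                                 (info.max_pages << 2) + delta_kb);
    out:
        release_memory_lock(domid);
        return rc;
    }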


> Prior to 0c029c4da21 "libxl_set_memory_target: retain the same maxmem
> offset on top of the current target" this issue didn't exist because
> memorykb was just memorykb = new_target_memkb + videoram.

It is true that reverting 0c029c4da21 would improve the situation.
However, I think the problem you have found here goes beyond
0c029c4da21 :-(


> BTW, I noticed some other (unrelated) dubious stuff in there while
> looking at this, in particular the setmaxmem is not rolled back if
> set_pod_target fails.

Yes, you are right! It is also not rolled back in case the xenstore
transaction fails with errno != EAGAIN, even before 0c029c4da21.
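
That one at least looks mechanically fixable, along these lines (an
untested sketch, not a patch; saved_memkb would be the maxmem value
retrieved earlier in the function):

    #include <stdint.h>
    #include <xenctrl.h>

    /* Untested sketch: restore the previous maxmem if a later step
     * fails. */
    static int setmaxmem_with_rollback(xc_interface *xch, uint32_t domid,
                                       uint64_t new_memkb,
                                       uint64_t saved_memkb)
    {
        int rc = xc_domain_setmaxmem(xch, domid, new_memkb);
        if (rc)
            return rc;

        /* the PoD target is in 4KiB pages, hence /4 on a KiB value */
        rc = xc_domain_set_pod_target(xch, domid, new_memkb / 4,
                                      NULL, NULL, NULL);
        if (rc)
            /* roll the hypervisor limit back to what it was */
            xc_domain_setmaxmem(xch, domid, saved_memkb);

        return rc;
    }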


> Also the transiently "wrong" maxmem between the
> setmaxmem and a failed transaction getting redone might possibly give
> rise to some interesting cases, especially if anything else fails the
> second time around the loop.

I am thinking that the whole idea of an "enforce" option was bad to
begin with.  Not only should we revert 0c029c4da21, but should we
actually just get rid of "enforce" altogether?  I am suggesting it
because I don't think that reverting 0c029c4da21 alone would fix all
the issues.

In a pre-0c029c4da21 world:

target=3 enforce=1

start transaction
do xenstore stuff, seeing target=1, setting target=3
memorykb = new_target_memkb + videoram;
xc_setmaxmem(3+videoram)
transaction commit (fail errno=ESOMETHING)

Now target=1, maxmem=3+videoram

Unless we say that we don't care about leaving the system in a
consistent state in case of failures with errno != EAGAIN.
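
If we do care, the transaction loop would have to undo the setmaxmem
on any failure it cannot retry, roughly like this (sketch only;
saved_memkb is the maxmem from before the call, and note that on the
EAGAIN path the transiently wrong maxmem Ian mentioned is still
there):

    #include <errno.h>
    #include <stdint.h>
    #include <xenctrl.h>
    #include <xenstore.h>

    /* Sketch only: undo setmaxmem on a hard (non-EAGAIN) failure. */
    static int commit_with_rollback(xc_interface *xch,
                                    struct xs_handle *xsh,
                                    uint32_t domid, uint64_t memorykb,
                                    uint64_t saved_memkb)
    {
        for (;;) {
            xs_transaction_t t = xs_transaction_start(xsh);

            /* ... xenstore reads/writes of memory/target ... */

            if (xc_domain_setmaxmem(xch, domid, memorykb) < 0) {
                xs_transaction_end(xsh, t, 1);  /* abort */
                return -1;
            }
            if (xs_transaction_end(xsh, t, 0))
                return 0;                       /* committed */
            if (errno != EAGAIN) {
                /* hard failure: put maxmem back before bailing out */
                xc_domain_setmaxmem(xch, domid, saved_memkb);
                return -1;
            }
            /* EAGAIN: retry the whole transaction */
        }
    }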
