
Re: [Xen-devel] QEMU bumping memory bug analysis



On Mon, 2015-06-08 at 12:40 +0100, Stefano Stabellini wrote:

> > > I disagree that libxl should be the arbitrator of a property that is
> > > stored, maintained and enforced by Xen. Xen should be the arbitrator.
> > 
> > That's not what "arbitrator" means, an arbitrator decides what the value
> > should be, but that doesn't necessarily imply that it either stores,
> > maintains nor enforces that value. 
> 
> The way you describe it, it looks like some kind of host wide memory
> policy manager,

Not necessarily.

We need to make a distinction between the entity which enforces
maxpages and that (or those) which decide what maxpages should be and
tell Xen.

Xen is the enforcer. It is also therefore by necessity the holder of the
canonical value for the maxpages, since it has to be.

Xen cannot be the second: it simply doesn't have the necessary
information to make a decision about a domain's memory limit. It just
blindly does whatever it is told by the last thing to tell it something.

There are lots of ways the desired value of maxpages can be decided on,
given a bunch of entities all with some level of responsibility for the
maxpages of a given domain:

     1. One single entity which decides on the limit and tells Xen. I
        think this is the xapi model (not squeezed/balloond, they just
        ask xapi to do things AFAIK, so are a red-herring). The unwary
        might think "libxl" was in this category, but it isn't since it
        is in fact multiple instances of the library.
     2. A group of entities which cooperate as a distributed system such
        that any one of them can recalculate (from the shared
        state, or by intra entity communication) the current desired
        value of max pages and propagate it to Xen. Prior to QEMU
        starting to play with the max pages the multiple libxl's on a
        system fell into this case, coordinating via xenstore
        transactions and libxl userdata.
     3. A group of entities which operate in isolation by only ever
        increasing or decreasing the max pages according to their own
        requirements, without reference to anyone else. When QEMU
        entered the fray, and with the various libxl fixes since, you
        might think we are implementing this model, but we aren't
        because the hypervisor interface is only "set", not
        "increment/decrement" and so there is a racy read/modify/write
        cycle in every entity now.
     4. Xen exposes multiple "maxpages" variables corresponding to the
        various entities which have a need to control this (or multiple
        per entity). Xen adds all of those up to come up with the actual
        maxpages.
     5. I thought there was one more but I've forgotten what it was.

We used to have #2, we now have a broken version of #3 (broken because
it is racy due to the xen interface used).
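
To make the race concrete, here is a minimal sketch of the lost update
that a "set"-only interface allows. FakeXen and its get_maxpages /
set_maxpages methods are hypothetical stand-ins for illustration, not
the real hypercall or libxc API, and the page counts are made up.

```python
class FakeXen:
    """Hypothetical stand-in for the hypervisor: get/set only, no delta."""

    def __init__(self, maxpages):
        self._maxpages = maxpages

    def get_maxpages(self):
        return self._maxpages

    def set_maxpages(self, value):
        # Blindly accepts whatever the last caller says.
        self._maxpages = value


def lost_update_demo():
    xen = FakeXen(maxpages=1000)

    # Both entities read the limit before either has written it back.
    qemu_seen = xen.get_maxpages()
    toolstack_seen = xen.get_maxpages()

    # Each writes "what I saw plus what I need".
    xen.set_maxpages(qemu_seen + 64)
    xen.set_maxpages(toolstack_seen + 256)  # silently drops QEMU's 64

    return xen.get_maxpages()


result = lost_update_demo()
print(result)  # 1256, not the 1320 that the two callers intended together
```

The interleaving is written out by hand here so the outcome is
deterministic; in a real system it would depend on timing.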

Fixing #2 would be a case of adding a qmp command to allow libxl to ask
QEMU what its extra memory needs are at any given point (start of day,
post hotplug event etc, or even on every memset).
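
As a rough sketch of what fixing #2 might look like from the libxl
side: whoever changes the limit recomputes the whole value from the
shared state plus QEMU's reported needs, under a lock, rather than
adjusting whatever Xen currently holds. Every name here (SharedState,
query_qemu_extra_pages, set_maxpages) is hypothetical; no such qmp
command exists in this thread, and the lock merely stands in for
xenstore transactions and libxl userdata locking.

```python
import threading


class SharedState:
    """Hypothetical stand-in for the state that libxl instances share."""

    def __init__(self, target_pages):
        self.lock = threading.Lock()
        self.target_pages = target_pages


def recalculate_maxpages(state, query_qemu_extra_pages, set_maxpages):
    """Recompute the desired maxpages from scratch and propagate it.

    query_qemu_extra_pages: callable standing in for the proposed qmp query.
    set_maxpages: callable standing in for telling Xen the new value.
    """
    with state.lock:
        desired = state.target_pages + query_qemu_extra_pages()
        set_maxpages(desired)
        return desired


pushed = []  # records every value "sent to Xen"
state = SharedState(target_pages=1000)
desired = recalculate_maxpages(state, lambda: 64, pushed.append)
print(desired)
```

Because the value is always derived from the shared state and never
from a read of Xen's current value, any cooperating entity can repeat
the calculation and arrive at the same answer.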

Fixing #3 would be a case of adding a new hypercall interface to allow
the amount of memory to be changed by a delta instead of just set, and
having everyone use it. You'd need to decide, on migration, whether to
propagate the current value at stop and copy time and have everyone
migrate their requirements, or have everyone reregister their
requirements on the far end.
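
A delta-based interface of the kind described for #3 could be sketched
as follows. DeltaXen and adjust_maxpages are hypothetical names, not an
existing hypercall; the point is only that the read/modify/write
happens in one place, under one lock, instead of in every caller.

```python
import threading


class DeltaXen:
    """Hypothetical hypervisor stand-in exposing an atomic adjust operation."""

    def __init__(self, maxpages):
        self._lock = threading.Lock()
        self._maxpages = maxpages

    def adjust_maxpages(self, delta):
        # The whole read/modify/write cycle is inside the lock.
        with self._lock:
            new_value = self._maxpages + delta
            if new_value < 0:
                raise ValueError("maxpages cannot go negative")
            self._maxpages = new_value
            return new_value

    def get_maxpages(self):
        with self._lock:
            return self._maxpages


xen = DeltaXen(maxpages=1000)

# Two independent entities apply their own requirements concurrently.
workers = [
    threading.Thread(target=xen.adjust_maxpages, args=(64,)),
    threading.Thread(target=xen.adjust_maxpages, args=(256,)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(xen.get_maxpages())  # 1320 regardless of which thread ran first
```

This does not address the migration question: the sketch keeps only the
running total, so the per-entity requirements would still have to be
carried across or reregistered by some other means.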

#4 is a new hypercall interface, which everyone would have to use. I
happen to think this is a pretty silly design.
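
For comparison, #4 might be sketched like this, with one limit per
entity and Xen summing them. PerEntityXen and its methods are
hypothetical and exist only to illustrate the shape of such an
interface.

```python
class PerEntityXen:
    """Hypothetical stand-in: one maxpages variable per controlling entity."""

    def __init__(self):
        self._limits = {}

    def set_entity_maxpages(self, entity, pages):
        if pages < 0:
            raise ValueError("pages must be non-negative")
        # Each entity only ever overwrites its own slot.
        self._limits[entity] = pages

    def effective_maxpages(self):
        # The enforced limit is the sum of all per-entity values.
        return sum(self._limits.values())


xen = PerEntityXen()
xen.set_entity_maxpages("toolstack", 1000)
xen.set_entity_maxpages("qemu", 64)
xen.set_entity_maxpages("toolstack", 1256)  # does not disturb QEMU's 64

print(xen.effective_maxpages())  # 1320
```

A plain "set" is safe here only because no two entities share a slot,
which is also why every user would have to move to the new interface.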

> the same way as other memory management tools were never accepted into
> xl/libxl but kept to separate daemons, like xenballoond or squeezed.

I think squeezed and friends are a red-herring here, since they are
either a client of one or more entities or in the more distributed ones
are just one party among many.

> Let's step away from this specific issue for a second. If it is not a
> host wide policy manager but a per-VM layer on top of libxc, what value
> is this indirection actually adding?

Given the raciness of libxc -- it is adding correctness.

> > What locking is there around QEMU's read/modify/write of the maxmem
> > value? What happens if someone else also modifies maxmem at the
> > same time?
> 
> It only happens at start of day before the domain is unpaused. At the
> time I couldn't come up with a scenario where it would be an issue,
> unless the admin is purposely trying to shoot himself in the foot.

I don't think there is anything preventing a call to the toolstack to
set the amount of memory for a domain right after the domid exists.

In a system with something like balloond or squeezed I can well imagine
them doing something when a new domain was started resulting in opening
of this race condition. (even if something is "ask libxl" not "call
xc_setmaxpages").

Ian.

