
Re: [Xen-devel] QEMU bumping memory bug analysis



On 06/08/15 10:20, George Dunlap wrote:
> On 06/08/2015 02:22 PM, Stefano Stabellini wrote:
>>>      3. A group of entities which operate in isolation by only ever
>>>         increasing or decreasing the max pages according to their own
>>>         requirements, without reference to anyone else. When QEMU
>>>         entered the fray, and with the various libxl fixes since, you
>>>         might think we are implementing this model, but we aren't
>>>         because the hypervisor interface is only "set", not
>>>         "increment/decrement" and so there is a racy read/modify/write
>>>         cycle in every entity now.
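
To make the race described in point 3 concrete, here is a minimal sketch. It assumes a toy model of the interface: the `Hypervisor` class and its `get_max_pages`/`set_max_pages` methods are hypothetical stand-ins for "read the current limit" and "set the limit", not real Xen or libxc calls, and the numbers are made up.

```python
# Toy model (hypothetical names): the only operations are "get" and "set",
# so every caller has to do its own read/modify/write.
class Hypervisor:
    def __init__(self, max_pages):
        self.max_pages = max_pages

    def get_max_pages(self):
        return self.max_pages

    def set_max_pages(self, value):
        self.max_pages = value


def interleaved_bumps(hv, delta_a, delta_b):
    """Two entities each try to raise max_pages; both reads happen
    before either write, which is the unlucky interleaving."""
    seen_by_a = hv.get_max_pages()          # entity A reads
    seen_by_b = hv.get_max_pages()          # entity B reads the same value
    hv.set_max_pages(seen_by_a + delta_a)   # A writes its result
    hv.set_max_pages(seen_by_b + delta_b)   # B overwrites A's update
    return hv.get_max_pages()


hv = Hypervisor(max_pages=1000)
result = interleaved_bumps(hv, delta_a=64, delta_b=16)
print(result)  # 1016, not the intended 1080
```

With an atomic "increment by N" operation the two updates would compose and the result would be 1080 regardless of ordering.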
>>
>> I don't think this is true: QEMU only sets maxmem at domain creation,
>> before "xl create" even returns. I think that is safe. We had an email
>> exchange at the time, I explained this behaviour and the general opinion
>> was that it is acceptable. I don't understand why it is not anymore.
> 
> Well for one, nobody on the hypervisor side seems to have been brought
> in -- I definitely would have objected, and it sounds like AndyC would
> have objected too.
> 
> I think we need to go back one level further.
> 
> So first, let's make a distinction between *pages*, which are actual
> host RAM assigned to the guest and put in its p2m table, and *memory*,
> which is a virtualization construct (i.e., virtual RAM and video memory
> for virtual graphics cards).
> 
> The hypervisor only cares about *pages*.  It allocates pages to a
> domain, it puts them in the p2m.  That's all it knows.
> 
> The purpose of max_pages in the hypervisor is to make sure that no guest
> can allocate more host memory (pages) than it is allowed to have.
> 
> How many pages is a particular guest allowed to have?
> 
> Well pages are used for a number of purposes:
> * To implement virtual RAM in the guest
> * To implement video ram for virtual devices in qemu
> * To implement virtual ROMs
> * For magic "shared pages" used behind-the-scenes (not visible to the
> guest)
> 
> (Feel free to add anything I missed.)
> 
> max_pages in the hypervisor must be set to the sum of all the pages the
> domain is allowed to have.
> 
> So the first point is this: Xen doesn't have a clue about any of those.
>  It doesn't know how much virtual RAM a guest has, how much video RAM it
> has, how many virtual ROMs, how many magic shared pages, or anything.
> All it knows are what pages are in the p2m table.
> 
> So although Xen certainly *enforces* max_pages, it is (at the moment) in
> no position to *decide* what max_pages should be.
> 
> At the moment, in fact, nobody is.  There is no single place that has a
> clear picture into how virtual RAM, guest devices, guest ROMs, and
> "magic pages" convert into actual number of pages.  I think that's a bug.
> 
> And at the moment, pages in the p2m are allocated by a number of entities:
> * In the libxc domain builder.
> * In the guest balloon driver
> * And now, in qemu, to allocate extra memory for virtual ROMs.

This is not correct.  QEMU and hvmloader both allocate pages for their
use.  LIBXL_MAXMEM_CONSTANT allows QEMU and hvmloader to allocate some
pages.  The QEMU change only comes into play after LIBXL_MAXMEM_CONSTANT
has been reached.
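
A rough sketch of that behaviour, as an illustration only. It assumes 4 KiB pages and that LIBXL_MAXMEM_CONSTANT is 1024 KiB of slack added on top of "memory="; the function names are hypothetical and this is not the actual libxl or QEMU code.

```python
PAGE_SIZE_KIB = 4                  # assumption: 4 KiB pages
LIBXL_MAXMEM_CONSTANT_KIB = 1024   # assumption: slack added by libxl, in KiB


def initial_max_pages(memory_kib):
    """Limit as set at domain creation: guest memory plus the slack."""
    return (memory_kib + LIBXL_MAXMEM_CONSTANT_KIB) // PAGE_SIZE_KIB


def pages_to_bump(max_pages, tot_pages, nr_needed):
    """How far the limit must be raised before allocating nr_needed pages.
    Zero as long as the remaining slack still covers the request."""
    free_pages = max_pages - tot_pages
    return max(0, nr_needed - free_pages)


max_pages = initial_max_pages(1048576)   # a 1 GiB guest
tot_pages = 262144 + 200                 # RAM plus 200 pages already taken from the slack
print(max_pages)                                  # 262400
print(pages_to_bump(max_pages, tot_pages, 16))    # 0: still inside the slack
print(pages_to_bump(max_pages, tot_pages, 64))    # 8: slack exhausted, limit must grow
```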

> 
> Did I miss anything?
> 
> For the first two, it's libxl that sets maxmem, based on its calculation
> of the size of virtual RAM plus various other bits that will be needed.
>  Having qemu *also* set maxmem was always the wrong thing to do, IMHO.
> 

libxl does it for all 3 (4?) because it adds LIBXL_MAXMEM_CONSTANT.

> In theory, from the interface perspective, what libxl promises to
> provide is virtual RAM.  When you say "memory=8192" in a domain config,
> that means (or should mean) 8192MiB of virtual RAM, exclusive of video
> RAM, virtual ROMs, and magic pages.  Then when you say "xl mem-set
> 4096", it should again be aiming at giving the VM the equivalent of
> 4096MiB of virtual RAM, exclusive of video RAM, &c &c.


That is not what is currently done.  Virtual video RAM is subtracted from
"memory=".

> 
> We already have the problem that the balloon driver at the moment
> doesn't actually know how big the guest RAM is, nor , but is being told
> to make a balloon exactly big enough to bring the total RAM down to a
> specific target.
> 
> I think we do need to have some place in the middle that actually knows
> how much memory is actually needed for the different sub-systems, so it
> can calculate and set maxmem appropriately.  libxl is the obvious place.

Maybe.  So you want libxl to know the detail of balloon overhead?  How
about the different sizes of all possible Option ROMs in all QEMU
versions?  What about hvmloader usage of memory?

> 
> What about this:
> * Libxl has a maximum amount of RAM that qemu is *allowed* to use to set
> up virtual ROMs, video ram for virtual devices, &c
> * At start-of-day, it sets maxpages to PAGES(virtual RAM)+PAGES(magic) +
> max_qemu_pages
> * Qemu allocates as many pages as it needs for option ROMS, and writes
> the amount that it actually did use into a special node in xenstore.
> * When the domain is unpaused, libxl will set maxpages to PAGES(virtual
> RAM) + PAGES(magic) + actual_qemu_pages that it gets from xenstore.
> 
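
The arithmetic of the proposal above can be sketched as follows. This is only an illustration: the function names, the 4 KiB page size, and all the example numbers are assumptions, and the xenstore read is replaced by a plain argument.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages(nbytes):
    """PAGES(x): number of pages needed to hold nbytes, rounded up."""
    return -(-nbytes // PAGE_SIZE)


def start_of_day_max_pages(virtual_ram_bytes, magic_pages, max_qemu_pages):
    """Limit while qemu is still setting up: allow the full qemu budget."""
    return pages(virtual_ram_bytes) + magic_pages + max_qemu_pages


def unpause_max_pages(virtual_ram_bytes, magic_pages,
                      actual_qemu_pages, max_qemu_pages):
    """Limit at unpause: replace the budget with what qemu reported using."""
    if actual_qemu_pages > max_qemu_pages:
        raise ValueError("qemu reported more pages than it was allowed")
    return pages(virtual_ram_bytes) + magic_pages + actual_qemu_pages


ram = 8192 * 1024 * 1024   # memory=8192 (MiB)
magic = 8                  # made-up count of magic pages
budget = 1024              # made-up max_qemu_pages
used = 300                 # made-up value qemu would write to xenstore

print(start_of_day_max_pages(ram, magic, budget))    # 2098184
print(unpause_max_pages(ram, magic, used, budget))   # 2097460
```

In this scheme only libxl ever sets the limit, so there is a single writer and no read/modify/write race between entities.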

I think this does match what Wei Liu said:

On 06/05/15 15:06, Wei Liu wrote:
> On Fri, Jun 05, 2015 at 06:13:01PM +0100, Stefano Stabellini wrote:
>> On Fri, 5 Jun 2015, Ian Campbell wrote:
>>> On Fri, 2015-06-05 at 17:43 +0100, Wei Liu wrote:
...
>>>> 5. Add a user configurable field in current libxl JSON structure to
>>>>    record how much more memory this domain needs. Admin is required to
>>>>    fill in that value manually. In the mean time we revert the change in
>>>>    QEMU and declare QEMU with that change buggy.
>>>>
>>>> No response to this so far. But in fact I consider this the most viable
>>>> solution.
>>>
>>> I initially thought that this was just #4 in a silly hat and was
>>> therefore no more acceptable than that.
>>>
>>> But actually I think you are suggesting that users should have to
>>> manually request additional RAM for option roms via some new interface
>>> and that the old thing in qemu should be deprecated and removed?
>>>
>>> How would a user know what value to use here? Just "a bigger one till it
>>> works"? That's, well, not super...
>>
>> This should not be a user configurable field. In fact it only depends on
>> the QEMU version in use.
>
> That field is generic so that we can use it to add some extra pages to
> domain. Using it to cover QEMU option roms would be one use case.  It's
> not very nice, but it's straight-forward.
>
> Wei.

> I think also that probably libxl, rather than setting a target amount of
> memory the balloon driver is supposed to aim at, should set the target
> size of the balloon.  Once qemu tells it how many pages are actually
> being used for virtual devices,
> 
> We could, in theory, expose all this information in xenstore such that
> *either* libxl or qemu would be able to calculate max_pages based on the
> numbers that were written there.  And that would work if we could
> enforce a lock-step between the toolstack and qemu, as we can between
> Xen and the toolstack.  But I think setting anything like this in stone
> is a really bad idea; which unfortunately excludes the idea of putting
> it in qemu.
> 

   -Don Slutz

>  -George
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

