
[Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory

  • To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "George Dunlap" <dunlapg@xxxxxxxxx>
  • Date: Tue, 23 Dec 2008 12:55:10 +0000
  • Delivery-date: Tue, 23 Dec 2008 04:55:36 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

This set of patches introduces a set of mechanisms and interfaces to
implement populate-on-demand memory.  The purpose of populate-on-demand
memory is to allow non-paravirtualized guests (such as Windows or HVM
Linux) to boot in a ballooned state.


When a non-PV domain boots, it typically reads the e820 map to
determine how much memory it has, and assumes that amount of memory
thereafter.  Memory usage can be reduced using a balloon driver, but
it cannot be increased past this initial value.  Currently, this means
that a non-PV domain must be booted with the maximum amount of memory
you ever want that VM to be able to use.

Populate-on-demand allows us to "boot ballooned", in the following manner:
* Mark the entire range of memory (memory_static_max aka maxmem) with
a new p2m type, populate_on_demand, reporting memory_static_max in the
e820 map.  No memory is allocated at this stage.
* Allocate the "memory_dynamic_max" (aka "target") amount of memory
for a "PoD cache".  This memory is kept on a separate list in the
domain struct.
* Boot the guest.
* Populate the p2m table on-demand as it's accessed with pages from
the PoD cache.
* When the balloon driver loads, it inflates the balloon size to
(maxmem - target), giving the memory back to Xen.  Once this is
accomplished, the "populate-on-demand" portion of boot is effectively
complete.

One complication is that many operating systems have start-of-day page
scrubbers, which touch all of memory to zero it.  This scrubber may
run before the balloon driver can return memory to Xen.  These zeroed
pages, however, don't contain any information; we can safely replace
them with PoD entries again.  So when we run out of PoD cache, we do
an "emergency sweep" to look for zero pages we can reclaim for the
populate-on-demand cache.  When we find a page range which is entirely
zero, we mark the gfn range PoD again, and put the memory back into
the PoD cache.

NB that this code is designed to work only in conjunction with a
balloon driver.  If the balloon driver is not loaded, eventually all
pages will be dirtied (non-zero), the emergency sweep will fail, and
there will be no memory to back outstanding PoD pages.  When this
happens, the domain will crash.

The code works for both shadow mode and HAP mode; it has been tested
with NPT/RVI and shadow, but not yet with EPT.  It also attempts to
avoid splintering superpages, to allow HAP to function more
effectively.

To use:
* ensure that you have a functioning balloon driver in the guest
(e.g., xen_balloon.ko for Linux HVM guests).
* Set maxmem/memory_static_max to one value, and
memory/memory_dynamic_max to another when creating the domain; e.g:
 # xm create debian-hvm maxmem=512 memory=256

The patches are as follows:
01 - Add a p2m_query_type to core gfn_to_mfn*() functions.

02 - Change some gfn_to_mfn() calls to gfn_to_mfn_query(), which will
not populate PoD entries.  Specifically, since gfn_to_mfn() may grab
the p2m lock, it must not be called while the shadow lock is held.

03 - Populate-on-demand core.  Introduce new p2m type, PoD cache
structures, and core functionality.  Add PoD checking to audit_p2m().
Add PoD information to the 'q' debug key.

04 - Implement p2m_decrease_reservation.  As the balloon driver
returns gfns to Xen, it handles PoD entries properly; it also "steals"
memory being returned for the PoD cache instead of freeing it, if the
cache has fewer pages than there are outstanding PoD entries.

05 - emergency sweep: Implement emergency sweep for zero memory if the
cache is low.  If it finds pages (or page ranges) entirely zero, it
will replace the entry with a PoD entry again, reclaiming the memory
for the PoD cache.

06 - Deal with splintering both PoD pages (to back singleton PoD
entries) and PoD ranges

07 - Xen interface for populate-on-demand functionality: PoD flag for
populate_physmap, {get,set}_pod_target for interacting with the PoD
cache.  set_pod_target() should be called for any domain that may have
PoD entries.  It will increase the size of the cache if necessary, but
will never decrease the size of the cache.  (This will be done as the
balloon driver balloons down.)

08 - libxc interface.  Add new libxc functions:
+ xc_hvm_build_target_mem(), which accepts memsize and target.  If
these are equal, PoD functionality is not invoked.  Otherwise, memsize
is marked PoD, and the target MiB is allocated to the PoD cache.
+ xc_[sg]et_pod_target(): get / set PoD target.  set_pod_target()
should be called whenever you change the guest target mem on a domain
which may have outstanding PoD entries.  This may increase the size of
the PoD cache up to the number of outstanding PoD entries, but will
not reduce the size of the cache.  (The cache may be reduced as the
balloon driver returns gfn space to Xen.)

09 - xend integration.
+ Always calls xc_hvm_build_target_mem() with memsize=maxmem and
target=memory.  If these are the same, the internal function will not
invoke PoD functionality.
+ Calls xc_set_pod_target() whenever a domain's target is changed.
Also calls balloon.free(), causing dom0 to balloon down itself if
there's not otherwise enough memory.

Things still to do:
* When reduce_reservation() is called with a superpage, keep the
superpage intact.
* Create a hypercall continuation for set_pod_target.

Xen-devel mailing list
