Re: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory


  • To: "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx>
  • From: "George Dunlap" <dunlapg@xxxxxxxxx>
  • Date: Wed, 24 Dec 2008 13:55:20 +0000
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 24 Dec 2008 05:55:48 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Tue, Dec 23, 2008 at 7:06 PM, Dan Magenheimer
<dan.magenheimer@xxxxxxxxxx> wrote:
> Very nice!

Thanks!

> One thing that might be worth adding to the requirements list or
> README is that this approach (or any which depends on ballooning)
> will now almost certainly require any participating hvm domain
> to have an adequately-sized properly-configured swap disk.
> Ballooning is insufficiently responsive to grow memory fast
> enough to handle rapidly growing memory needs of an active domain
> The consequence for a no-swap-disk is application failures
> and the consequence even if a swap disk IS configured is temporarily
> very poor performance.

I don't think this is particular to the PoD patches, or even
ballooning per se.  A swap disk would be required any time you boot
with a small amount of memory, whether it could be increased or not.

But you're right, in that this differs from a typical operating
system's "demand-paging" mechanism, where the goal is to give a
process only the memory it actually needs, so you can use it for other
processes.  You're still allocating a fixed amount of memory to a
guest at start-up.  The un-populated memory is not available for use by
other VMs, and allocating more memory is a (relatively) slow process.
I guess a brief note pointing out the difference between "populate on
demand" and "allocate on demand" would be useful.
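
To illustrate the distinction, here is a toy accounting model in C.  It
is a sketch only: the struct and function names are made up for this
note and are not the ones used in the patches.  The cache is sized once,
at the target; touching a PoD gfn moves a page from the cache to the
guest and never asks the host for more memory.  The guest is only safe
once the cache can back every outstanding PoD entry, which is what
ballooning down to the target achieves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: hypothetical names, not Xen's actual data structures. */
struct pod_domain {
    size_t maxmem_pages;  /* gfn space advertised to the guest (e820) */
    size_t cache_pages;   /* pages allocated up front, not yet in the p2m */
    size_t populated;     /* gfns currently backed by real memory */
    size_t ballooned;     /* unpopulated gfns handed back by the balloon */
};

/* gfns still marked PoD: neither populated nor ballooned out. */
static size_t pod_outstanding(const struct pod_domain *d)
{
    return d->maxmem_pages - d->populated - d->ballooned;
}

/* First touch of a PoD gfn: back it from the fixed cache.
 * Returns false when nothing is left to back it with; the real code
 * would try an emergency sweep at that point and then crash the domain. */
static bool pod_touch(struct pod_domain *d)
{
    if (pod_outstanding(d) == 0 || d->cache_pages == 0)
        return false;
    d->cache_pages--;
    d->populated++;
    return true;
}

/* The balloon driver returns one unpopulated gfn to Xen. */
static void pod_balloon_out(struct pod_domain *d)
{
    assert(pod_outstanding(d) > 0);
    d->ballooned++;
}

/* True once every outstanding PoD entry could be backed from the cache. */
static bool pod_is_backed(const struct pod_domain *d)
{
    return d->cache_pages >= pod_outstanding(d);
}
```

With maxmem=512 and a target of 256, pod_is_backed() is false at boot
and only becomes true after the balloon has returned (maxmem - target)
gfns.  At no point does the model take memory from, or give memory to,
other VMs.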

> So this won't work for any domain that does start-of-day
> scrubbing with a non-zero value?  I suppose that's OK.

Not if the scrubber might win the race against the balloon driver. :-)
 If this really becomes an issue, it should be straightforward to add
functionality to handle it.  It just requires having a simple way of
specifying what "scrubbed" pages look like, an extra p2m type for "PoD
scrubbed" (rather than PoD zero, the default), and a way to convert
entries between scrubbed and zero.
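
As a rough sketch of the first piece (a way of specifying what
"scrubbed" pages look like), the check a sweep uses could compare each
page against a per-domain pattern instead of against zero.  This is
hypothetical: page_matches_pattern(), the 64-bit pattern and the 4 KiB
page size are assumptions for illustration, not part of the posted
patches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed 4 KiB page */

/* Returns true if every 64-bit word in the page equals `pattern`.
 * `page` must point to a suitably aligned buffer of PAGE_SIZE_BYTES. */
static bool page_matches_pattern(const void *page, uint64_t pattern)
{
    const uint64_t *w = page;

    for (size_t i = 0; i < PAGE_SIZE_BYTES / sizeof(*w); i++)
        if (w[i] != pattern)
            return false;  /* page holds real data; cannot be reclaimed */
    return true;
}
```

With a pattern of 0 this reduces to a plain zero-page check, so "PoD
zero" would simply be the default case.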

Did you have a particular system in mind?

-George

>> -----Original Message-----
>> From: George Dunlap [mailto:dunlapg@xxxxxxxxx]
>> Sent: Tuesday, December 23, 2008 5:55 AM
>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: [Xen-devel] [RFC][PATCH] 0/9 Populate-on-demand memory
>>
>>
>> This set of patches introduces a set of mechanisms and interfaces to
>> implement populate-on-demand memory.  The purpose of
>> populate-on-demand memory is to allow non-paravirtualized guests (such
>> as Windows or Linux HVM) to boot in a ballooned state.
>>
>> BACKGROUND
>>
>> When non-PV domains boot, they typically read the e820 map to
>> determine how much memory they have, and then assume that much memory
>> thereafter.  Memory usage can be reduced using a balloon
>> driver, but it cannot be increased past this initial value.
>> Currently, this means that a non-PV domain must be booted with the
>> maximum amount of memory you want that VM ever to be able to use.
>>
>> Populate-on-demand allows us to "boot ballooned", in the
>> following manner:
>> * Mark the entire range of memory (memory_static_max aka maxmem) with
>> a new p2m type, populate_on_demand, reporting memory_static_max in the
>> e820 map.  No memory is allocated at this stage.
>> * Allocate the "memory_dynamic_max" (aka "target") amount of memory
>> for a "PoD cache".  This memory is kept on a separate list in the
>> domain struct.
>> * Boot the guest.
>> * Populate the p2m table on-demand as it's accessed with pages from
>> the PoD cache.
>> * When the balloon driver loads, it inflates the balloon size to
>> (maxmem - target), giving the memory back to Xen.  When this is
>> accomplished, the "populate-on-demand" portion of boot is effectively
>> finished.
>>
>> One complication is that many operating systems have start-of-day page
>> scrubbers, which touch all of memory to zero it.  This scrubber may
>> run before the balloon driver can return memory to Xen.  These zeroed
>> pages, however, don't contain any information; we can safely replace
>> them with PoD entries again.  So when we run out of PoD cache, we do
>> an "emergency sweep" to look for zero pages we can reclaim for the
>> populate-on-demand cache.  When we find a page range which is entirely
>> zero, we mark the gfn range PoD again, and put the memory back into
>> the PoD cache.
>>
>> NB that this code is designed to work only in conjunction with a
>> balloon driver.  If the balloon driver is not loaded, eventually all
>> pages will be dirtied (non-zero), the emergency sweep will fail, and
>> there will be no memory to back outstanding PoD pages.  When this
>> happens, the domain will crash.
>>
>> The code works for both shadow mode and HAP mode; it has been tested
>> with NPT/RVI and shadow, but not yet with EPT.  It also attempts to
>> avoid splintering superpages, to allow HAP to function more
>> effectively.
>>
>> To use:
>> * ensure that you have a functioning balloon driver in the guest
>> (e.g., xen_balloon.ko for Linux HVM guests).
>> * Set maxmem/memory_static_max to one value, and
>> memory/memory_dynamic_max to another when creating the domain; e.g:
>>  # xm create debian-hvm maxmem=512 memory=256
>>
>> The patches are as follows:
>> 01 - Add a p2m_query_type to core gfn_to_mfn*() functions.
>>
>> 02 - Change some gfn_to_mfn() calls to gfn_to_mfn_query(), which will
>> not populate PoD entries.  Specifically, since gfn_to_mfn() may grab
>> the p2m lock, it must not be called while the shadow lock is held.
>>
>> 03 - Populate-on-demand core.  Introduce new p2m type, PoD cache
>> structures, and core functionality.  Add PoD checking to audit_p2m().
>> Add PoD information to the 'q' debug key.
>>
>> 04 - Implement p2m_decrease_reservation.  As the balloon driver
>> returns gfns to Xen, it handles PoD entries properly; it also "steals"
>> memory being returned for the PoD cache instead of freeing it, if
>> necessary.
>>
>> 05 - emergency sweep: Implement emergency sweep for zero memory if the
>> cache is low.  If it finds pages (or page ranges) entirely zero, it
>> will replace the entry with a PoD entry again, reclaiming the memory
>> for the PoD cache.
>>
>> 06 - Deal with splintering both PoD pages (to back singleton PoD
>> entries) and PoD ranges
>>
>> 07 - Xen interface for populate-on-demand functionality: PoD flag for
>> populate_physmap, {get,set}_pod_target for interacting with the PoD
>> cache.  set_pod_target() should be called for any domain that may have
>> PoD entries.  It will increase the size of the cache if necessary, but
>> will never decrease the size of the cache.  (This will be done as the
>> balloon driver balloons down.)
>>
>> 08 - libxc interface.  Add new libxc functions:
>> + xc_hvm_build_target_mem(), which accepts memsize and target.  If
>> these are equal, PoD functionality is not invoked.  Otherwise, memsize
>> is marked PoD, and the target MiB is allocated to the PoD cache.
>> + xc_[sg]et_pod_target(): get / set PoD target.  set_pod_target()
>> should be called whenever you change the guest target mem on a domain
>> which may have outstanding PoD entries.  This may increase the size of
>> the PoD cache up to the number of outstanding PoD entries, but will
>> not reduce the size of the cache.  (The cache may be reduced as the
>> balloon driver returns gfn space to Xen.)
>>
>> 09 - xend integration.
>> + Always calls xc_hvm_build_target_mem() with memsize=maxmem and
>> target=memory.  If these are the same, the internal function will not use
>> PoD.
>> + Calls xc_set_target_mem() whenever a domain's target is changed.
>> Also calls balloon.free(), causing dom0 to balloon down itself if
>> there's not enough memory otherwise.
>>
>> Things still to do:
>> * When reduce_reservation() is called with a superpage, keep the
>> superpage intact.
>> * Create a hypercall continuation for set_pod_target.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
>>
>
