Re: [Xen-devel] balloon driver broken in 3.12+ after save+restore



On 22/05/14 02:31, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> I have a problem with balloon driver after/during restoring a saved domain.
> There are two symptoms:
> 1. When the domain was shrunk with 'xl mem-set <some size smaller than
> initial>' just before save, it still needs the initial memory size to
> restore. Details below.
> 
> 2. A restored domain sometimes (most of the time) does not balloon down.
> For example, when the domain has 3300MB and I mem-set it to 2800MB, nothing
> changes immediately (only "target" in sysfs) - both 'xl list' and 'free'
> inside the VM report the same size (and plenty of free memory in the VM).
> After some time it gets ballooned down to ~3000MB, still not 2800MB. I
> haven't found any pattern here.
> 
> Both of the above were working perfectly in 3.11.
> 
> I'm running Xen 4.1.6.1.
> 
> Details for the first problem:
> Preparation:
> I start the VM with the config at the end of this email (memory=400,
> maxmem=4000), wait some time, then 'xl mem-set' it to just about the memory
> really in use (about 200MB in most cases). Then 'sleep 1' and 'xl save'.
> When I want to restore that domain, I take the initial config file, replace
> the memory setting with the size used in 'xl mem-set' above and call
> 'xl restore' with that config. It fails with this error:
> ---
> Loading new save file /var/run/qubes/current-savefile (new xl fmt info
> 0x0/0x0/849)
>  Savefile contains xl domain config
> xc: detail: xc_domain_restore start: p2m_size = fa800
> xc: detail: Failed allocation for dom 51: 1024 extents of order 0
> xc: error: Failed to allocate memory for batch.!: Internal error
> xc: detail: Restore exit with rc=1
> libxl: error: libxl_dom.c:313:libxl__domain_restore_common restoring domain:
> Resource temporarily unavailable
> cannot (re-)build domain: -3
> libxl: error: libxl.c:713:libxl_domain_destroy non-existant domain 51
> ---
> When memory is set back to 400 (or slightly lower, like 380), the restore
> succeeds, but the second problem still happens.
> 
> I've bisected the first problem down to this commit:
> commit cd9151e26d31048b2b5e00fd02e110e07d2200c9
>     xen/balloon: set a mapping for ballooned out pages

Sorry for the delay. I somehow missed this.

This is likely caused by the balloon driver creating multiple entries
in the p2m all pointing to the MFNs of the scratch pages. These
duplicates are de-duplicated on save/restore: each entry is saved and
then repopulated with its own frame.

I suspect your 2nd issue may also be caused by this.

Can you try this patch, please?

8<----------------------------------------------
xen/balloon: set ballooned out pages as invalid in p2m

Since cd9151e26d31048b2b5e00fd02e110e07d2200c9 (xen/balloon: set a
mapping for ballooned out pages), a ballooned out page had its entry
in the p2m set to the MFN of one of the scratch pages.  This means that
the p2m will contain many entries pointing to the same MFN.

During a domain save, these many-to-one entries are not considered and
the scratch page is saved multiple times. On restore the ballooned
pages are populated with new frames and the domain may use up its
allocation before all pages can be restored.

Set ballooned out pages as INVALID_P2M_ENTRY in the p2m (as they
were before), preventing them from being saved and re-populated on
restore.

Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
---
 drivers/xen/balloon.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b7a506f..5c660c7 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -426,20 +426,18 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
                 * p2m are consistent.
                 */
                if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-                       unsigned long p;
-                       struct page   *scratch_page = get_balloon_scratch_page();
-
                        if (!PageHighMem(page)) {
+                               struct page *scratch_page = get_balloon_scratch_page();
+
                                ret = HYPERVISOR_update_va_mapping(
                                                (unsigned long)__va(pfn << PAGE_SHIFT),
                                                pfn_pte(page_to_pfn(scratch_page),
                                                        PAGE_KERNEL_RO), 0);
                                BUG_ON(ret);
-                       }
-                       p = page_to_pfn(scratch_page);
-                       __set_phys_to_machine(pfn, pfn_to_mfn(p));
 
-                       put_balloon_scratch_page();
+                               put_balloon_scratch_page();
+                       }
+                       __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
                }
 #endif
 
-- 
1.7.10.4
