Re: [Xen-devel] balloon driver broken in 3.12+ after save+restore
On 27.06.2014 11:51, David Vrabel wrote:
> On 22/05/14 02:31, Marek Marczykowski-Górecki wrote:
>> Hi,
>>
>> I have a problem with the balloon driver after/during restoring a saved domain.
>> There are two symptoms:
>> 1. When the domain was 'xl mem-set <some size smaller than initial>' just before
>> save, it still needs the initial memory size to restore. Details below.
>>
>> 2. The restored domain sometimes (most of the time) does not want to balloon down.
>> For example when the domain has 3300MB and I mem-set it to 2800MB, nothing
>> changes immediately (only "target" in sysfs) - both 'xl list' and 'free'
>> inside report the same size (and plenty of free memory in the VM). After some
>> time it gets ballooned down to ~3000, still not 2800. I haven't found any
>> pattern here.
>>
>> Both of the above were working perfectly in 3.11.
>>
>> I'm running Xen 4.1.6.1.
>>
>> Details for the first problem:
>> Preparation:
>> I start the VM as in the config at the end of the email (memory=400, maxmem=4000),
>> wait some time, then 'xl mem-set' to a size just about the really used memory (about
>> 200MB in most cases). Then 'sleep 1' and 'xl save'.
>> When I want to restore that domain, I get the initial config file, replace the memory
>> setting with the size used in 'xl mem-set' above and call 'xl restore' providing
>> that config.
>> It fails with this error:
>> ---
>> Loading new save file /var/run/qubes/current-savefile (new xl fmt info 0x0/0x0/849)
>> Savefile contains xl domain config
>> xc: detail: xc_domain_restore start: p2m_size = fa800
>> xc: detail: Failed allocation for dom 51: 1024 extents of order 0
>> xc: error: Failed to allocate memory for batch.!: Internal error
>> xc: detail: Restore exit with rc=1
>> libxl: error: libxl_dom.c:313:libxl__domain_restore_common restoring domain: Resource temporarily unavailable
>> cannot (re-)build domain: -3
>> libxl: error: libxl.c:713:libxl_domain_destroy non-existant domain 51
>> ---
>> When memory is set back to 400 (or slightly lower, like 380), the restore succeeds,
>> but the second problem still happens.
>>
>> I've bisected the first problem down to this commit:
>> commit cd9151e26d31048b2b5e00fd02e110e07d2200c9
>> xen/balloon: set a mapping for ballooned out pages
>
> Sorry for the delay. I somehow missed this.
>
> This is likely caused by the balloon driver creating multiple entries
> in the p2m all pointing to the MFNs of the scratch pages. These
> duplicates are de-duped on save/restore.
>
> I suspect your 2nd issue may also be caused by this.
>
> Can you try this patch, please?

Looks to be the right fix, thanks!

> 8<----------------------------------------------
> xen/balloon: set ballooned out pages as invalid in p2m
>
> Since cd9151e26d31048b2b5e00fd02e110e07d2200c9 (xen/balloon: set a
> mapping for ballooned out pages), a ballooned out page had its entry
> in the p2m set to the MFN of one of the scratch pages. This means that
> the p2m will contain many entries pointing to the same MFN.
>
> During a domain save, these many-to-one entries are not considered and
> the scratch page is saved multiple times. On restore the ballooned
> pages are populated with new frames and the domain may use up its
> allocation before all pages can be restored.
>
> Set ballooned out pages as INVALID_P2M_ENTRY in the p2m (as they
> were before), preventing them from being saved and re-populated on
> restore.
>
> Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  drivers/xen/balloon.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index b7a506f..5c660c7 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -426,20 +426,18 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  		 * p2m are consistent.
>  		 */
>  		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -			unsigned long p;
> -			struct page *scratch_page = get_balloon_scratch_page();
> -
>  			if (!PageHighMem(page)) {
> +				struct page *scratch_page = get_balloon_scratch_page();
> +
>  				ret = HYPERVISOR_update_va_mapping(
>  						(unsigned long)__va(pfn << PAGE_SHIFT),
>  						pfn_pte(page_to_pfn(scratch_page),
>  							PAGE_KERNEL_RO), 0);
>  				BUG_ON(ret);
> -			}
> -			p = page_to_pfn(scratch_page);
> -			__set_phys_to_machine(pfn, pfn_to_mfn(p));
>
> -			put_balloon_scratch_page();
> +				put_balloon_scratch_page();
> +			}
> +			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>  		}
>  #endif
>
>

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment:
signature.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
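
To make the failure mode described in the patch concrete, here is a toy model in plain C. It is only a sketch under stated assumptions: it does not use the kernel's or libxc's real data structures, and every name in it (TOY_INVALID_P2M_ENTRY, TOY_SCRATCH_MFN, toy_frames_saved, toy_scenario) is hypothetical. It assumes a saver that writes out one frame for every valid p2m entry, which is the behaviour the commit message describes, and compares pointing ballooned-out PFNs at a shared scratch MFN with marking them invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
#define TOY_INVALID_P2M_ENTRY UINT64_MAX
#define TOY_SCRATCH_MFN       UINT64_C(0x1000)
#define TOY_MAX_PFNS          64

/*
 * Assumed saver behaviour: one frame is written (and later re-populated on
 * restore) for every valid p2m entry, even if several entries share an MFN.
 */
static size_t toy_frames_saved(const uint64_t *p2m, size_t nr_pfns)
{
    size_t saved = 0;

    for (size_t pfn = 0; pfn < nr_pfns; pfn++) {
        if (p2m[pfn] != TOY_INVALID_P2M_ENTRY)
            saved++;
    }
    return saved;
}

/*
 * Build a toy p2m with nr_pfns entries, balloon out the last `ballooned`
 * PFNs by writing `entry` into their slots, and return how many frames the
 * assumed saver would emit.
 */
static size_t toy_scenario(size_t nr_pfns, size_t ballooned, uint64_t entry)
{
    uint64_t p2m[TOY_MAX_PFNS];

    assert(nr_pfns <= TOY_MAX_PFNS && ballooned <= nr_pfns);

    /* Give every PFN its own distinct, made-up MFN. */
    for (size_t pfn = 0; pfn < nr_pfns; pfn++)
        p2m[pfn] = pfn + 1;

    /* Balloon out the tail of the address space. */
    for (size_t pfn = nr_pfns - ballooned; pfn < nr_pfns; pfn++)
        p2m[pfn] = entry;

    return toy_frames_saved(p2m, nr_pfns);
}
```

In this model, toy_scenario(8, 5, TOY_SCRATCH_MFN) returns 8: all eight entries are valid, so restore would need eight frames although the domain only holds three pages of its own. toy_scenario(8, 5, TOY_INVALID_P2M_ENTRY) returns 3, matching the reduced allocation, which is the effect the patch aims for.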