
Re: [Xen-devel] Claim mode and HVM PoD interact badly



On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
> On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
> > > create ^
> > > owner Wei Liu <wei.liu2@xxxxxxxxxx>
> > > thanks
> > > 
> > > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
> > > > When I have following configuration in HVM config file:
> > > >   memory=128
> > > >   maxmem=256
> > > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
> > > > 
> > > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
> > > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
> > > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
> > > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
> > > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
> > > > 
> > > > With claim_mode=0, I can successfully create the HVM guest.
> > > 
> > > Is it trying to claim 256M instead of 128M? (although the likelihood
> > 
> > No. 128MB actually.
> > 
> > > that you only have 128-255M free is quite low, or are you
> > > autoballooning?)
> > 
> > This patch fixes it for me. For PoD it bases the number of pages
> > claimed on 'target_pages' (the PoD target) rather than on 'nr_pages'.
> > 
> > I don't know PoD very well,
> 
> Me neither, this might have to wait for George to get back.

<nods>
> 
> We should also consider flipping the default claim setting to off in xl
> for 4.4, since that is likely to be a lower impact change than fixing
> the issue (and one which we all understand!).

<unwraps the Xen 4.4 duct-tape roll>

> 
> >  and this claim is only valid during the
> > allocation of the guest's memory - so the 'target_pages' value might be
> > the wrong one. However, looking at the hypervisor's
> > 'p2m_pod_set_mem_target' I see this comment:
> > 
> >  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
> >  317  *   entries.  The balloon driver will deflate the balloon to give back
> >  318  *   the remainder of the ram to the guest OS.
> > 
> > Which implies to me that we _need_ the 'maxmem' amount of memory at boot 
> > time.
> > And then it is the responsibility of the balloon driver to give the memory
> > back (and this is where the 'static-max' et al come in play to tell the
> > balloon driver to balloon out).
> 
> PoD exists purely so that we don't need the 'maxmem' amount of memory at
> boot time. It is basically there in order to let the guest get booted
> far enough to load the balloon driver to give the memory back.
> 
> It's basically a boot time zero-page sharing mechanism AIUI.

But it does look like it gulps up hypervisor memory and returns it during
the allocation of memory for the guest.
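
Roughly, the build-time sequence for a PoD guest looks like this to me - a
paraphrased sketch, not the real xc_hvm_build_x86.c code; 'pod_build_sketch'
and its arguments are made up, only the two libxc calls are real:

    #include <xenctrl.h>

    /*
     * Paraphrased sketch of the PoD build sequence - not the real builder.
     * nr_pages corresponds to 'maxmem', target_pages to 'memory' (4K pages).
     */
    static int pod_build_sketch(xc_interface *xch, uint32_t dom,
                                unsigned long nr_pages,
                                unsigned long target_pages,
                                xen_pfn_t *page_array)
    {
        int rc;

        /* 1. Ask Xen to fill the per-domain PoD cache with ~'memory' worth
         *    of real pages (0x20 pages are left out for the VGA hole). */
        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
                                      NULL, NULL, NULL);
        if ( rc != 0 )
            return rc;

        /* 2. Install 'maxmem' worth of p2m entries marked populate-on-demand;
         *    no further real pages are allocated here.  When the guest first
         *    touches such a gfn, Xen backs it with a zeroed page taken from
         *    the PoD cache. */
        rc = xc_domain_populate_physmap_exact(xch, dom, nr_pages, 0,
                                              XENMEMF_populate_on_demand,
                                              page_array);

        /* 3. Later the guest's balloon driver balloons down to 'memory', so
         *    the outstanding PoD entries and the cache balance out. */
        return rc;
    }

(The real code also special-cases the low pages and the VGA hole and uses
superpages where it can - the point is just where the real pages come from.)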

Digging into the hypervisor, I see this in 'p2m_pod_set_cache_target' (where
pod_target is, for this case, maxmem - memory, and pod.count is zero, so for
Wei's case it would be 128MB):

 216     /* Increasing the target */
 217     while ( pod_target > p2m->pod.count )
 218     {
 222         if ( (pod_target - p2m->pod.count) >= SUPERPAGE_PAGES )
 223             order = PAGE_ORDER_2M;
 224         else
 225             order = PAGE_ORDER_4K;
 226     retry:
 227         page = alloc_domheap_pages(d, order, PAGE_ORDER_4K);

So this allocates 64 2MB superpages (128MB) for Wei's case.

 243         p2m_pod_cache_add(p2m, page, order);

These get added to the PoD cache list.

251 
 252     /* Decreasing the target */
 253     /* We hold the pod lock here, so we don't need to worry about
 254      * cache disappearing under our feet. */
 255     while ( pod_target < p2m->pod.count )
 256     {
..
 266         page = p2m_pod_cache_get(p2m, order);

Get the page (from the list)
..
 287             put_page(page+i);

And then free it.


From reading the code, the patch seems correct - we will _need_ that extra
128MB of 'claim' to cover allocating (and later freeing) the 128MB of extra
pages. They are only temporary, as we do free them.
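
To put numbers on it, here is the rough page arithmetic for Wei's config
(memory=128, maxmem=256). This is just a standalone illustration; the macros
and the little program are made up, only the 0x20 constant comes from the
builder code quoted below:

    #include <stdio.h>

    #define PAGE_SHIFT     12
    #define VGA_HOLE_PAGES 0x20  /* the 0x20 subtracted from target_pages */

    int main(void)
    {
        unsigned long nr_pages     = (256UL << 20) >> PAGE_SHIFT; /* maxmem: 65536 */
        unsigned long target_pages = (128UL << 20) >> PAGE_SHIFT; /* memory: 32768 */

        /* What xc_domain_set_pod_target() asks Xen to keep in the PoD cache. */
        unsigned long pod_cache = target_pages - VGA_HOLE_PAGES; /* 32736 */

        printf("PoD cache to allocate:  %lu pages (~%lu MB)\n",
               pod_cache, pod_cache >> (20 - PAGE_SHIFT));
        printf("PoD p2m entries:        %lu pages (~%lu MB)\n",
               nr_pages, nr_pages >> (20 - PAGE_SHIFT));
        printf("Claim needs to cover:   %lu pages (the cache, not maxmem)\n",
               pod_cache);
        return 0;
    }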

> 
> > diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> > index 77bd365..65e9577 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
> >  
> >      /* try to claim pages for early warning of insufficient memory available */
> >      if ( claim_enabled ) {
> > -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> > +        unsigned long nr = nr_pages - cur_pages;
> > +
> > +        if ( pod_mode )
> > +            nr = target_pages - 0x20;
> 
> 0x20?

Yup. From earlier:

305     if ( pod_mode )
306     {
307         /*
308          * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
309          * adjust the PoD cache size so that domain tot_pages will be
310          * target_pages - 0x20 after this call.
311          */
312         rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
313                                       NULL, NULL, NULL);
314         if ( rc != 0 )
315         {
316             PERROR("Could not set PoD target for HVM guest.\n");
317             goto error_out;
318         }
319     }

Maybe a nice little 'pod_delta' or 'pod_pages' variable should be used instead
of copying this expression around (a rough sketch follows at the end of this
mail).

> 
> > +
> > +        rc = xc_domain_claim_pages(xch, dom, nr);
> >          if ( rc != 0 )
> >          {
> >              PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
> 
> 
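
For what it's worth, the 'pod_pages' idea might look something like this - an
untested sketch against the two hunks quoted above, not a proper patch:

        /* Hypothetical: compute the PoD page count once and reuse it. */
        unsigned long pod_pages = target_pages - 0x20; /* 0x20 == VGA hole */

        if ( pod_mode )
        {
            /*
             * Xen will adjust the PoD cache size so that domain tot_pages
             * will be pod_pages after this call.
             */
            rc = xc_domain_set_pod_target(xch, dom, pod_pages,
                                          NULL, NULL, NULL);
            if ( rc != 0 )
            {
                PERROR("Could not set PoD target for HVM guest.\n");
                goto error_out;
            }
        }

        /* try to claim pages for early warning of insufficient memory available */
        if ( claim_enabled ) {
            rc = xc_domain_claim_pages(xch, dom,
                                       pod_mode ? pod_pages : nr_pages - cur_pages);
            if ( rc != 0 )
            {
                PERROR("Could not allocate memory for HVM guest as we cannot claim memory!");
                goto error_out;
            }
        }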
