
Re: [Xen-devel] Claim mode and HVM PoD interact badly



On Mon, Jan 20, 2014 at 4:29 PM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> On Fri, Jan 10, 2014 at 09:58:07AM -0500, Konrad Rzeszutek Wilk wrote:
>> On Fri, Jan 10, 2014 at 11:59:42AM +0000, Ian Campbell wrote:
>> > create ^
>> > owner Wei Liu <wei.liu2@xxxxxxxxxx>
>> > thanks
>> >
>> > On Fri, 2014-01-10 at 11:56 +0000, Wei Liu wrote:
>> > > When I have following configuration in HVM config file:
>> > >   memory=128
>> > >   maxmem=256
>> > > and have claim_mode=1 in /etc/xen/xl.conf, xl create fails with
>> > >
>> > > xc: error: Could not allocate memory for HVM guest as we cannot claim memory! (22 = Invalid argument): Internal error
>> > > libxl: error: libxl_dom.c:647:libxl__build_hvm: hvm building failed
>> > > libxl: error: libxl_create.c:1000:domcreate_rebuild_done: cannot (re-)build domain: -3
>> > > libxl: error: libxl_dm.c:1467:kill_device_model: unable to find device model pid in /local/domain/82/image/device-model-pid
>> > > libxl: error: libxl.c:1425:libxl__destroy_domid: libxl__destroy_device_model failed for 82
>> > >
>> > > With claim_mode=0, I can successfully create an HVM guest.
>> >
>> > Is it trying to claim 256M instead of 128M? (although the likelihood
>>
>> No. 128MB actually.
>>
>> > that you only have 128-255M free is quite low, or are you
>> > autoballooning?)
>>
>> This patch fixes it for me. It basically sets the number of pages
>> claimed to 'maxmem' instead of 'memory' for PoD.
>>
>> I don't know PoD very well, and this claim is only valid during the
>> allocation of the guest's memory - so the 'target_pages' value might be
>> the wrong one. However, looking at the hypervisor's
>> 'p2m_pod_set_mem_target' I see this comment:
>>
>>  316  *     B <T': Set the PoD cache size equal to the number of outstanding PoD
>>  317  *   entries.  The balloon driver will deflate the balloon to give back
>>  318  *   the remainder of the ram to the guest OS.
>>
>> Which implies to me that we _need_ the 'maxmem' amount of memory at boot 
>> time.
>> And then it is the responsibility of the balloon driver to give the memory
>> back (and this is where the 'static-max' et al come in play to tell the
>> balloon driver to balloon out).
>>
>>
>> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>> index 77bd365..65e9577 100644
>> --- a/tools/libxc/xc_hvm_build_x86.c
>> +++ b/tools/libxc/xc_hvm_build_x86.c
>> @@ -335,7 +335,12 @@ static int setup_guest(xc_interface *xch,
>>
>>      /* try to claim pages for early warning of insufficient memory available */
>>      if ( claim_enabled ) {
>> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
>> +        unsigned long nr = nr_pages - cur_pages;
>> +
>> +        if ( pod_mode )
>> +            nr = target_pages - 0x20;
>> +
>
> I'm a bit confused, did this work for you? At this point d->tot_pages
> should be (target_pages - 0x20). However, in the hypervisor logic, if you
> try to claim exactly as many pages as d->tot_pages, it should return
> EINVAL.
>
> Furthermore, the original logic doesn't look right. In PV guest
> creation, xc tries to claim "memory=" pages, while in HVM guest creation
> it tries to claim "maxmem=" pages. I think the HVM code is wrong.
>
> And George shed some light on PoD for me this morning: the "cache" in PoD
> should be the pool of pages that is used to populate guest physical memory.
> In that sense it should be the size of mem_target ("memory=").
>
> So I came up with a fix like this. Any ideas?
>
> Wei.
>
> ---8<---
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 77bd365..472f1df 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -49,6 +49,8 @@
>  #define NR_SPECIAL_PAGES     8
>  #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
>
> +#define POD_VGA_HOLE_SIZE (0x20)
> +
>  static int modules_init(struct xc_hvm_build_args *args,
>                          uint64_t vend, struct elf_binary *elf,
>                          uint64_t *mstart_out, uint64_t *mend_out)
> @@ -305,11 +307,13 @@ static int setup_guest(xc_interface *xch,
>      if ( pod_mode )
>      {
>          /*
> -         * Subtract 0x20 from target_pages for the VGA "hole".  Xen will
> -         * adjust the PoD cache size so that domain tot_pages will be
> -         * target_pages - 0x20 after this call.
> +         * Subtract POD_VGA_HOLE_SIZE from target_pages for the VGA
> +         * "hole".  Xen will adjust the PoD cache size so that domain
> +         * tot_pages will be target_pages - POD_VGA_HOLE_SIZE after
> +         * this call.
>           */
> -        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
> +        rc = xc_domain_set_pod_target(xch, dom,
> +                                      target_pages - POD_VGA_HOLE_SIZE,
>                                        NULL, NULL, NULL);
>          if ( rc != 0 )
>          {
> @@ -335,7 +339,12 @@ static int setup_guest(xc_interface *xch,
>
>      /* try to claim pages for early warning of insufficient memory available */
>      if ( claim_enabled ) {
> -        rc = xc_domain_claim_pages(xch, dom, nr_pages - cur_pages);
> +        unsigned long nr = target_pages;
> +
> +        if ( pod_mode )
> +            nr -= POD_VGA_HOLE_SIZE;
> +
> +        rc = xc_domain_claim_pages(xch, dom, nr);

Two things:

1. This is broken because it doesn't claim pages for the PoD "cache".
The PoD "cache" amounts to *all the pages that the domain will have
allocated* -- there will be basically no pages allocated after this
point.

Claim mode is trying to make creation of large guests fail early or be
guaranteed to succeed.  For large guests, it's set_pod_target() that
may take a long time, and it's there that things will fail if
there's not enough memory.  By the time you get to actually setting up
the p2m, you've already made it.

2. I think the VGA_HOLE doesn't have anything to do with PoD.

Actually, it looks like the original code was wrong: correct me if I'm
wrong, but xc_domain_claim_pages() wants the *total number of pages*,
whereas nr_pages-cur_pages would give you the *number of additional
pages*.

I think you need to claim nr_pages-VGA_HOLE_SIZE regardless of whether
you're in PoD mode or not.  The initial code would claim 0xa0 pages
too few; the new code will claim 0x20 pages too many in the non-PoD
case.
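
Concretely, something like this untested sketch is what I mean (note
that VGA_HOLE_SIZE is just a name for the 0x20-page hole here, not an
existing constant in xc_hvm_build_x86.c):

#include <xenctrl.h>

#define VGA_HOLE_SIZE 0x20UL  /* assumed size of the VGA hole, in pages */

/* Claim the domain's *total* page count (minus the VGA hole), which is
 * what xc_domain_claim_pages() expects, rather than the number of pages
 * still to be populated (nr_pages - cur_pages). */
static int claim_guest_pages(xc_interface *xch, uint32_t dom,
                             unsigned long nr_pages)
{
    return xc_domain_claim_pages(xch, dom, nr_pages - VGA_HOLE_SIZE);
}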

> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 5f484a2..1e44ba3 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -339,8 +339,8 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
>          goto out;
>      }
>
> -    /* disallow a claim not exceeding current tot_pages or above max_pages */
> -    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
> +    /* disallow a claim below current tot_pages or above max_pages */
> +    if ( (pages < d->tot_pages) || (pages > d->max_pages) )
>      {

This seems like a good interface change in any case -- there's no
particular reason not to allow multiple calls with the same claim
amount.  (Interface-wise, I don't see a good reason we couldn't allow
the claim to be reduced as well; but that's probably more than we want
to do at this point.)

So it seems like we should at least make the two fixes above:
* Use nr_pages-VGA_HOLE_SIZE for the claim, regardless of whether PoD is enabled
* Allow a claim equal to tot_pages

That will allow PoD guests to boot with claim mode enabled, although
it will effectively be a no-op.

The next question is whether we should try to make claim mode actually
do the claim properly for PoD mode for 4.4.  It seems like just moving
the claim call up before the xc_domain_set_pod_target() call should work;
I'm inclined to say that if that works, we should just do it.
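
Roughly, an untested sketch of that ordering (variable names follow the
quoted setup_guest() code; the 0x20 is the VGA-hole subtraction from the
hunk above, and the right amount to claim is still the open question
discussed earlier):

#include <xenctrl.h>

/* Sketch only: issue the claim before the PoD target is set, so that the
 * bulk allocation done by xc_domain_set_pod_target() is covered by (and
 * checked against) the claim. */
static int claim_then_set_pod_target(xc_interface *xch, uint32_t dom,
                                     unsigned long target_pages,
                                     int pod_mode, int claim_enabled)
{
    int rc;

    if ( claim_enabled )
    {
        /* Claim first, while the domain still holds almost no pages. */
        rc = xc_domain_claim_pages(xch, dom, target_pages - 0x20);
        if ( rc != 0 )
            return rc;
    }

    if ( pod_mode )
    {
        /* The PoD cache allocation now happens under the claim. */
        rc = xc_domain_set_pod_target(xch, dom, target_pages - 0x20,
                                      NULL, NULL, NULL);
        if ( rc != 0 )
            return rc;
    }

    return 0;
}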

Thoughts?

 -George
