
Re: [Xen-devel] domain creation vs querying free memory (xend and xl)



On Oct 4, 2012, at 1:18 PM, Dan Magenheimer wrote:

>> From: Andres Lagar-Cavilla [mailto:andreslc@xxxxxxxxxxxxxx]
>> Subject: Re: [Xen-devel] domain creation vs querying free memory (xend and 
>> xl)
>> 
>> 
>> On Oct 4, 2012, at 12:59 PM, Dan Magenheimer wrote:
>> 
>>>> From: Andres Lagar-Cavilla [mailto:andreslc@xxxxxxxxxxxxxx]
>>>> Subject: Re: [Xen-devel] domain creation vs querying free memory (xend and 
>>>> xl)
>>>> 
>>>> On Oct 4, 2012, at 6:06 AM, Tim Deegan wrote:
>>>> 
>>>>> At 14:56 -0700 on 02 Oct (1349189817), Dan Magenheimer wrote:
>>>>>> Tmem argues that doing "memory capacity transfers" at a page granularity
>>>>>> can only be done efficiently in the hypervisor.  This is true for
>>>>>> page-sharing when it breaks a "share" also... it can't go ask the
>>>>>> toolstack to approve allocation of a new page every time a write to a 
>>>>>> shared
>>>>>> page occurs.
>>>>>> 
>>>>>> Does that make sense?
>>>>> 
>>>>> Yes.  The page-sharing version can be handled by having a pool of
>>>>> dedicated memory for breaking shares, and the toolstack asynchronously
>>>>> replenish that, rather than allowing CoW to use up all memory in the
>>>>> system.
>>>> 
>>>> That is doable. One benefit is that it would minimize the chance of a VM 
>>>> hitting a CoW ENOMEM. I don't see how it would altogether avoid it.
>>> 
>>> Agreed, so it doesn't really solve the problem.  (See longer reply
>>> to Tim.)
>>> 
>>>> If the objective is trying to put a cap on the unpredictable growth of 
>>>> memory allocations via CoW unsharing, two observations: (1) it will never 
>>>> grow past the nominal VM footprint; (2) one can put a cap today by tweaking 
>>>> d->max_pages -- CoW will fail, the faulting vcpu will sleep, and things can 
>>>> be kicked back into action at a later point.
>>> 
>>> But IIRC isn't it (2) that has given VMware memory overcommit a bad name?
>>> Any significant memory pressure due to overcommit leads to double-swapping,
>>> which leads to horrible performance?
>> 
>> The little that I've been able to read from their published results is that 
>> a "lot" of CPU is consumed
>> scanning memory and fingerprinting, which leads to a massive assault on 
>> micro-architectural caches.
>> 
>> I don't know if that equates to a "bad name", but I don't think that is a 
>> productive discussion
>> either.
> 
> Sorry, I wasn't intending that to be snarky, but on re-read I guess it
> did sound snarky.  What I meant is: Is this just a manual version of what
> VMware does automatically? Or is there something I am misunderstanding?
> (I think you answered that below.)
> 
>> (2) doesn't mean swapping. Note that d->max_pages can be set artificially 
>> low by an admin, raised again later, etc. It's just a mechanism to keep a VM 
>> at bay while corrective measures of any kind are taken.
>> It's really up to a higher level controller whether you accept allocations 
>> and later reach a point of
>> thrashing.
>> 
>> I understand this is partly where your discussion is headed, but certainly 
>> fixing the primary issue of
>> nominal vanilla allocations preempting each other looks fairly critical to 
>> begin with.
> 
> OK.  I _think_ the design I proposed helps in systems that are using
> page-sharing/host-swapping as well... I assume share-breaking just
> calls the normal hypervisor allocator interface to allocate a
> new page (if available)?  If you could review and comment on
> the design from a page-sharing/host-swapping perspective, I would
> appreciate it.

I think you will need to refine your notion of reservation. If a VM has nominal 
RAM N and current RAM C, with N >= C, it makes no sense to reserve N so the VM 
later has room to grow into by swapping in, unsharing or whatever -- then you 
are not over-committing memory at all.

To the extent that you want to facilitate VM creation, it does make sense to 
reserve C and guarantee that.
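
To make "reserve C" concrete, here is a rough sketch of what the toolstack-side 
check looks like today with plain libxc (xc_physinfo(), xc_interface_open() and 
xc_interface_close() are real calls; the helper and the policy around it are 
made up for illustration). The point is that it is only a snapshot -- nothing 
prevents free memory from shrinking between this check and the actual 
allocation, which is exactly the race a real reservation would close:

/* Sketch only: query-then-allocate, i.e. the racy pattern under
 * discussion.  A hypervisor-side reservation would make this atomic. */
#include <stdio.h>
#include <stdint.h>
#include <xenctrl.h>

/* Hypothetical helper: can the host back 'target_kb' of RAM for a new
 * domain right now?  Racy against tmem, CoW unsharing and parallel
 * domain builds. */
static int enough_free_memory(xc_interface *xch, uint64_t target_kb)
{
    xc_physinfo_t info;

    if (xc_physinfo(xch, &info))
        return 0;

    /* free_pages is a snapshot of the free heap; it can shrink before
     * the domain build actually allocates anything. */
    return ((uint64_t)info.free_pages * XC_PAGE_SIZE) / 1024 >= target_kb;
}

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);

    if (!xch)
        return 1;
    printf("room for a 1GB guest right now: %s\n",
           enough_free_memory(xch, 1024 * 1024) ? "yes" : "no");
    xc_interface_close(xch);
    return 0;
}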

Then it gets mm-specific. PoD has one way of dealing with the allocation 
growth. xenpaging tries to stick to the watermark -- if something swaps in, 
something else swaps out. And uncooperative balloons can be stymied by xapi 
using d->max_pages.
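
For what it's worth, the d->max_pages clamp is already driveable from the 
toolstack; something along these lines (xc_domain_getinfo() and 
xc_domain_setmaxmem() are the real libxc calls, the policy around them is 
invented for illustration) is roughly what xapi does to keep a guest at bay and 
later let it grow again:

/* Sketch: clamp a guest's d->max_pages from the toolstack and raise it
 * again later.  Only the two libxc calls below are real. */
#include <stdint.h>
#include <xenctrl.h>

/* Freeze the domain at (roughly) its current footprint.  Further
 * allocations -- balloon growth, CoW unsharing, swap-in -- will fail
 * until the cap is raised again. */
static int clamp_to_current(xc_interface *xch, uint32_t domid)
{
    xc_dominfo_t info;

    if (xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
        info.domid != domid)
        return -1;

    /* nr_pages is the domain's current allocation; express it in KiB
     * as xc_domain_setmaxmem() expects. */
    return xc_domain_setmaxmem(xch, domid,
                               ((uint64_t)info.nr_pages * XC_PAGE_SIZE) / 1024);
}

/* Corrective measures done (pages paged out, other VMs shrunk, ...):
 * give the domain room to grow again. */
static int unclamp(xc_interface *xch, uint32_t domid, uint64_t max_kb)
{
    return xc_domain_setmaxmem(xch, domid, max_kb);
}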

This is why I believe you need to solve the problem of initial reservation, and 
the problem of handing off to the right actor. And then xl need not care any 
further.
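
If it helps, the shape I have in mind is something like the pseudo-code below. 
xc_domain_reserve_pages() does not exist -- it is only a stand-in for whatever 
hypervisor-side claim ends up being proposed -- so this is a sketch of the 
flow, not an interface suggestion:

/* Purely hypothetical flow: atomic initial reservation, then hand-off.
 * xc_domain_reserve_pages() is NOT a real libxc call; claiming 0 pages
 * is assumed to drop the claim. */
#include <stdint.h>
#include <xenctrl.h>

int xc_domain_reserve_pages(xc_interface *xch, uint32_t domid,
                            unsigned long nr_pages);   /* hypothetical */

int build_with_reservation(xc_interface *xch, uint32_t domid,
                           unsigned long current_pages)
{
    /* 1. Reserve C (current footprint), not N (nominal), atomically in
     *    the hypervisor, so tmem, CoW unsharing and parallel builds
     *    cannot eat it between the check and the allocation. */
    if (xc_domain_reserve_pages(xch, domid, current_pages))
        return -1;

    /* 2. Build/populate the domain; allocations draw from the claim.
     *    (Normal libxl/libxc domain build elided.) */

    /* 3. Hand off: drop the remaining claim and let the mm actor in
     *    charge (PoD, xenpaging, the balloon controller) manage growth
     *    up to d->max_pages from here on; xl need not care further. */
    xc_domain_reserve_pages(xch, domid, 0);
    return 0;
}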

Andres

> 
> Thanks,
> Dan

