
Re: [Xen-devel] Linux Xen Balloon Driver Improvement (Draft 2)



On 27/10/14 19:14, Wei Liu wrote:
> On Mon, Oct 27, 2014 at 05:29:16PM +0000, David Vrabel wrote:
>> On 27/10/14 16:29, Wei Liu wrote:
>>> On Mon, Oct 27, 2014 at 02:23:22PM +0000, David Vrabel wrote:
>>>> On 27/10/14 12:33, Wei Liu wrote:
>>>>>
>>>>> Changes in this version:
>>>>>
>>>>> 1. Style, grammar and typo fixes.
>>>>> 2. Make this document Linux-centric.
>>>>> 3. Add a new section for NUMA-aware ballooning.
>>>>
>>>> You've not included the required changes to the toolstack and
>>>> autoballoon driver to always use 2M multiples when creating VMs and
>>>> setting targets.
>>>>
>>>
>>> When creating a VM, the toolstack already tries to use as many huge
>>> pages as possible.
>>>
>>> Setting the target doesn't use 2M multiples.  But I don't think this
>>> is necessary. To balloon in / out X MB of memory:
>>>
>>>   nr_2m = X / 2M
>>>   nr_4k = (X % 2M) / 4k
>>>
>>> The remainder just goes to the 4K queue.
>>
>> I understand that it will work with 4K multiples but it is not
>> /optimal/ to do so, since it will result in more fragmentation.
>>
> 
> The fragmentation should be less than 2M, right? Is that terrible?

I think it will increase fragmentation every time the target is set. Or
perhaps more correctly, I can't prove that it does not increase
fragmentation each time.
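
To make the concern concrete, the split above works out to something
like this, and it is the remainder -- up to 511 4K pages per target
change -- that can pile up on the 4K queue (a sketch only; the helper
is illustrative, not code from the driver):

  #include <stdint.h>

  #define SIZE_4K (4096ULL)
  #define SIZE_2M (2ULL * 1024 * 1024)

  /*
   * Split a balloon target change of 'bytes' into 2M and 4K page
   * counts.  The 2M-unaligned remainder goes to the 4K queue, so
   * every target change can strand up to 511 4K pages there --
   * which is where the extra fragmentation would come from.
   */
  static void split_target(uint64_t bytes, uint64_t *nr_2m,
                           uint64_t *nr_4k)
  {
          *nr_2m = bytes / SIZE_2M;
          *nr_4k = (bytes % SIZE_2M) / SIZE_4K;
  }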

> Because the basic requirement for this design is to not rely on
> hypervisor-side features, so that it works on older hypervisors as
> well. And so far the proposed design seems to stick to that principle
> well.

Even with the requirement for no hypervisor changes, I do not see how
you can produce a good design without considering the hypervisor
behaviour (both current and possible future changes).

Having said this, after thinking some more: in this case it is
sufficient to show that every step in the guest balloon driver never
increases fragmentation, regardless of the underlying hypervisor
behaviour.
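
To illustrate what would need to be shown (the metric below is mine,
purely for illustration; it is not part of the design): define
fragmentation as the number of ballooned 4K pages that are not part of
a fully-ballooned, 2M-aligned group, and prove no operation increases
it.

  #include <stddef.h>
  #include <stdint.h>

  #define PAGES_PER_2M 512ULL

  /*
   * One candidate measure: the number of ballooned 4K PFNs that do
   * not belong to a fully-ballooned, 2M-aligned frame.  'pfns' must
   * be sorted and duplicate-free.  Each balloon operation would then
   * need to be shown never to increase this value.
   */
  static size_t frag_count(const uint64_t *pfns, size_t n)
  {
          size_t i = 0, frag = 0;

          while (i < n) {
                  uint64_t frame = pfns[i] / PAGES_PER_2M;
                  size_t cnt = 0;

                  /* Count ballooned 4K pages within this 2M frame. */
                  while (i < n && pfns[i] / PAGES_PER_2M == frame) {
                          cnt++;
                          i++;
                  }
                  /* Partially-ballooned frames are the fragmentation. */
                  if (cnt != PAGES_PER_2M)
                          frag += cnt;
          }
          return frag;
  }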

>>>>> ### Periodically exchange normal size pages with huge pages
>>>>>
>>>>> A worker thread wakes up periodically to check whether there are
>>>>> enough pages in the normal size page queue to coalesce into a huge
>>>>> page. If so, it will try to exchange that huge page for a number of
>>>>> normal size pages with the XENMEM\_exchange hypercall.
>>>>
>>>> I don't see what this is supposed to achieve.  This is going to take a
>>>> (potentially) non-fragmented superpage and fragment it.
>>>>
>>>
>>> Let's look at this from start of day.
>>>
>>> The guest always tries to balloon in / out as many 2M pages as
>>> possible. So if we have a long list of 4K pages, it means the
>>> underlying host super frames are fragmented already.
>>>
>>> So if 1) there are enough 4K pages in the ballooned-out list, and 2)
>>> there is a spare 2M page, it means that the 2M page comes from
>>> balloon page compaction, which means the underlying host super frame
>>> is fragmented.
>>
>> This assumption is only true because your page migration isn't trying
>> hard enough to defragment super frames,
> 
> However hard it tries, if the hypervisor is not defragmenting, this
> assumption still stands. As long as you get the 2M page as a result of
> balloon compaction, the underlying host frame is fragmented. Note that
> we're no worse than before.
> 
>> and it is assuming that Xen does
>> nothing to address host super frame fragmentation.  This highlights
>> the importance of looking at designs at the system level, IMO.
>>
> 
> What would make this design different when Xen knows how to defragment
> frames?
> 
> We end up ballooning out a 2M host frame if the underlying huge frame
> is defragmented (instead of a bunch of 4K frames). We're giving a huge
> frame back to Xen, so it's OK; then we exchange in 512 consecutive 4K
> pages (or a 2M page if we merge them) with a 2M frame backing them.
> Xen is not harmed; the guest now has a huge frame. It's only making
> things better if Xen knows how to defragment.

Um.  Yes, of course if you use a contiguous and aligned set of 4K
ballooned pages then it will work well -- this is exactly the point I
am making. But this is not what your design says.
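
For reference, the exchange in that contiguous-and-aligned case would
look roughly like the sketch below -- modelled on the XENMEM_exchange
use in Linux's xen_create_contiguous_region(); the wrapper name and
error handling are mine:

  #include <linux/errno.h>
  #include <xen/interface/memory.h>
  #include <asm/xen/hypercall.h>

  /*
   * Sketch: trade 512 ballooned, guest-contiguous, 2M-aligned 4K
   * extents for a single 2M extent, so the guest gets memory backed
   * by one host super frame.  Caller fills in_gpfns[0..511] with the
   * sorted, aligned GPFNs and reads the result from out_gpfn[0].
   */
  static int exchange_4k_for_2m(xen_pfn_t *in_gpfns, xen_pfn_t *out_gpfn)
  {
          struct xen_memory_exchange exchange = {
                  .in = {
                          .nr_extents   = 512,
                          .extent_order = 0,          /* 512 x 4K in */
                          .domid        = DOMID_SELF
                  },
                  .out = {
                          .nr_extents   = 1,
                          .extent_order = 9,          /* one 2M out */
                          .domid        = DOMID_SELF
                  }
          };
          int rc;

          set_xen_guest_handle(exchange.in.extent_start, in_gpfns);
          set_xen_guest_handle(exchange.out.extent_start, out_gpfn);

          rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
          /* Only a complete exchange counts as success. */
          if (rc || exchange.nr_exchanged != 512)
                  return rc ? rc : -EBUSY;
          return 0;
  }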

Also, to quote you from the thread on draft 1:

"As for moving contiguous ballooned pages from 4K list to 2M list,
unfortunately I see a problem with this proposal: The 4K pages list is
not sorted. Sorting it requires hooking into core balloon driver -- that
is, to grab multiple locks to avoid racing with page migration thread,
which is prone to error."

I don't understand the comments about multiple locks etc., though,
since I think the core balloon driver should be responsible for
maintaining the order of the list in the first place.
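
That is, keep the list sorted at insertion time, along these lines (a
sketch only, not the current code):

  #include <linux/list.h>
  #include <linux/mm.h>

  static LIST_HEAD(ballooned_pages);

  /*
   * Sketch: insert a ballooned page keeping the list sorted by PFN,
   * instead of a plain list_add().  With the list ordered, finding a
   * contiguous, 2M-aligned run of 4K pages is a single linear scan.
   * Caller must already serialise against the page migration path,
   * so no extra locks are needed here.
   */
  static void balloon_append_sorted(struct page *page)
  {
          unsigned long pfn = page_to_pfn(page);
          struct page *p;

          list_for_each_entry(p, &ballooned_pages, lru) {
                  if (page_to_pfn(p) > pfn) {
                          /* Insert before the first larger PFN. */
                          list_add_tail(&page->lru, &p->lru);
                          return;
                  }
          }
          list_add_tail(&page->lru, &ballooned_pages);
  }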

David
