
Re: [Xen-devel] [Hackathon minutes] PV block improvements



On 27/06/13 16:21, Ian Campbell wrote:
> On Thu, 2013-06-27 at 14:58 +0100, George Dunlap wrote:
>> On 26/06/13 12:37, Ian Campbell wrote:
>>> On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
>>>> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
>>>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>>>> On Tue, 25 Jun 2013, Ian Campbell wrote:
>>>>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>>>>>>> On 21/06/13 20:07, Matt Wilson wrote:
>>>>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> While working on further block improvements I've found an issue with
>>>>>>>>> persistent grants in blkfront.
>>>>>>>>>
>>>>>>>>> Persistent grants are basically grants that are allocated once and never
>>>>>>>>> released, so both blkfront and blkback keep using the same memory
>>>>>>>>> pages for all the transactions.
>>>>>>>>>
>>>>>>>>> This is not a problem in blkback, because we can dynamically choose
>>>>>>>>> how many grants we want to map. On the other hand, blkfront cannot
>>>>>>>>> remove access to those grants at any point, because blkfront doesn't
>>>>>>>>> know whether blkback has these grants mapped persistently or not.
>>>>>>>>>
>>>>>>>>> So if, for example, we start expanding the number of segments in
>>>>>>>>> indirect requests to a value like 512 segments per request, blkfront
>>>>>>>>> will probably try to persistently map 512*32+512 = 16896 grants per
>>>>>>>>> device, which is many more grants than the current default of 32*256 =
>>>>>>>>> 8192 (if using grant tables v2). This can cause serious problems for
>>>>>>>>> other interfaces inside the DomU, since blkfront basically starts
>>>>>>>>> hoarding all possible grants, leaving other interfaces completely
>>>>>>>>> starved.
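For reference, a minimal sketch of the arithmetic behind those figures
(the constants and function name below are invented for illustration,
they are not the real blkfront macros; the counts themselves come from
the message quoted above):

/* Worst-case persistent grants a single device could pin. */
#define RING_REQUESTS          32      /* requests in flight on one ring     */
#define SEGS_PER_IND_REQUEST   512     /* proposed indirect segments/request */
#define GNTTAB_V2_DEFAULT      8192    /* 32 frames * 256 entries per frame  */

unsigned int persistent_grants_per_device(void)
{
        /* 32 * 512 = 16384 data segments, plus the additional 512 grants
         * counted above, gives 16896: roughly twice the 8192 grant
         * references available by default with grant tables v2. */
        return RING_REQUESTS * SEGS_PER_IND_REQUEST + SEGS_PER_IND_REQUEST;
}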
>>>>>>>> Yikes.
>>>>>>>>
>>>>>>>>> I've been thinking about different ways to solve this, but so far I
>>>>>>>>> haven't been able to find a nice solution:
>>>>>>>>>
>>>>>>>>> 1. Limit the number of persistent grants a blkfront instance can use:
>>>>>>>>> let's say that only the first X used grants will be persistently mapped
>>>>>>>>> by both blkfront and blkback, and if more grants are needed the previous
>>>>>>>>> map/unmap scheme will be used.
>>>>>>>> I'm not thrilled with this option. It would likely introduce some
>>>>>>>> significant performance variability, wouldn't it?
>>>>>>> Probably, and it will also be hard to distribute the number of available
>>>>>>> grants across the different interfaces in a way that makes sense for
>>>>>>> performance, especially given that once a grant is assigned to an
>>>>>>> interface it cannot be returned to the pool of grants.
>>>>>>>
>>>>>>> So if we had two interfaces with very different usage (one very busy and
>>>>>>> the other almost idle) and distributed the grants equally between them,
>>>>>>> one would have a lot of unused grants while the other would suffer from
>>>>>>> starvation.
>>>>>> I do think we need to implement some sort of reclaim scheme, which
>>>>>> probably does mean a specific request (per your #4). We simply can't
>>>>>> have a device which once upon a time had high throughput but is now
>>>>>> mostly idle continue to tie up all those grants.
>>>>>>
>>>>>> If you make the reuse of grants use an MRU scheme and reclaim the
>>>>>> currently unused tail fairly infrequently and in large batches then the
>>>>>> perf overhead should be minimal, I think.
>>>>>>
>>>>>> I also don't think I would discount the idea of using ephemeral grants
>>>>>> to cover bursts so easily either; in fact it might fall out quite
>>>>>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap
>>>>>> since a grant map is relatively inexpensive, and recovering from the burst
>>>>>> shouldn't be too expensive if you batch it. If it turns out to be not a
>>>>>> burst but a sustained level of I/O then the MRU scheme would mean you
>>>>>> wouldn't be recovering them.
>>>>>>
>>>>>> I also think there probably needs to be some tunable per-device limit on
>>>>>> the maximum number of persistent grants, perhaps with minimum and maximum
>>>>>> pool sizes tied in with an MRU scheme? If nothing else it gives the admin the
>>>>>> ability to prioritise devices.
>>>>> If we introduce a reclaim call we have to be careful not to fall back
>>>>> to a map/unmap scheme like we had before.
>>>>>
>>>>> The way I see it, these additional grants are either useful or not.
>>>>> In the first case we could just limit the maximum number of persistent
>>>>> grants and be done with it.
>>>>> If they are not useful (they have been allocated for one very large
>>>>> request and not used much after that), could we find a way to identify
>>>>> unusually large requests and avoid using persistent grants for those?
>>>> Isn't it possible that these grants are useful for some periods of
>>>> time, but not for others?  You wouldn't say, "Caching the disk data in
>>>> main memory is either useful or not; if it is not useful (if it was
>>>> allocated for one very large request and not used much after that), we
>>>> should find a way to identify unusually large requests and avoid
>>>> caching it."  If you're playing a movie, sure; but in most cases, the
>>>> cache was useful for a time, then stopped being useful.  Treating the
>>>> persistent grants the same way makes sense to me.
>>> Right, this is what I was trying to suggest with the MRU scheme. If you
>>> are using lots of grants and you keep on reusing them then they remain
>>> persistent and don't get reclaimed. If you are not reusing them for a
>>> while then they get reclaimed. If you make "for a while" big enough then
>>> you should find you aren't unintentionally falling back to a map/unmap
>>> scheme.
>>
>> And I was trying to say that I agreed with you. :-)
> 
> Excellent ;-)

I also agree that this is the best solution; I will start looking at
implementing it.
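
To make that concrete, here is a rough sketch of what an LRU-based,
batched reclaim could look like on the blkfront side. All structure and
function names are invented for illustration (revoke_persistent_gnt()
merely stands in for ending foreign access and freeing the grant
reference); this is a sketch of the idea, not the eventual patch:

#include <linux/list.h>
#include <linux/jiffies.h>
#include <xen/grant_table.h>

/* Sketch only: not the real blkfront data structures. */
struct persistent_gnt {
        grant_ref_t gref;
        unsigned long last_used;        /* jiffies of the last reuse        */
        struct list_head node;          /* position in the LRU list         */
};

/* Most recently used grants at the head, least recently used at the tail. */
static LIST_HEAD(persistent_gnt_lru);

/* Stand-in: end foreign access and release the grant reference. */
static void revoke_persistent_gnt(struct persistent_gnt *gnt);

/* Called whenever a cached grant is reused for a new request. */
static void persistent_gnt_touch(struct persistent_gnt *gnt)
{
        gnt->last_used = jiffies;
        list_move(&gnt->node, &persistent_gnt_lru);
}

/*
 * Called infrequently (e.g. from delayed work): walk the idle tail and
 * revoke at most 'batch' grants in one go, so the hot path never pays
 * for the reclaim and sustained I/O keeps its grants persistent.
 */
static void persistent_gnt_reclaim(unsigned long idle_timeout, unsigned int batch)
{
        struct persistent_gnt *gnt, *tmp;

        list_for_each_entry_safe_reverse(gnt, tmp, &persistent_gnt_lru, node) {
                if (batch == 0)
                        break;
                if (time_before(jiffies, gnt->last_used + idle_timeout))
                        break;  /* everything closer to the head is newer */
                list_del(&gnt->node);
                revoke_persistent_gnt(gnt);
                batch--;
        }
}

With a long enough idle_timeout and a generous batch size this should
behave as described above: devices under sustained I/O keep reusing
(and therefore keeping) their grants, while a device that has gone
quiet gradually gives them back.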

>> BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep 
>> the most recently used"; is there a practical difference between that 
>> and "LRU" ("Discard the Least Recently Used")?
> 
> I started off with LRU and then got myself confused and changed it
> everywhere. Yes, I mean keep Most Recently Used == discard Least Recently
> Used.

This will help if the disk is only doing intermittent bursts of data,
but if the disk is under high I/O for a long time we might end up in
the same situation (all grants hoarded by a single disk). We should
make sure that there's always a buffer of unused grants so other disks
or network interfaces can continue to work as expected.
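
On that last point, a purely illustrative sketch (the cap, the reserve
and estimated_free_grefs() are invented placeholders, not existing
blkfront or grant-table symbols): the frontend could refuse to grow its
persistent pool once a per-device cap is reached, or once the
domain-wide pool of free grant references drops below a reserve kept
for other frontends.

#include <stdbool.h>

#define PERSISTENT_GNT_MAX_PER_DEV  1056u  /* hypothetical per-device cap      */
#define FREE_GREFS_RESERVE           512u  /* grants always left free for other
                                              disks and network interfaces     */

/* Stand-in for however the guest tracks unallocated grant references. */
extern unsigned int estimated_free_grefs(void);

static bool may_persist_new_grant(unsigned int nr_persistent_on_dev)
{
        if (nr_persistent_on_dev >= PERSISTENT_GNT_MAX_PER_DEV)
                return false;   /* this device has reached its cap             */
        if (estimated_free_grefs() <= FREE_GREFS_RESERVE)
                return false;   /* keep a shared buffer of free grants         */
        return true;            /* fine to map one more grant persistently     */
}

Grants that fail this check would simply go through the normal
map/unmap path (or be reclaimed early by the LRU scheme), so a single
busy disk can no longer hoard every grant reference in the domain.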

