
Re: [Xen-devel] [Hackathon minutes] PV block improvements



On Thu, 2013-06-27 at 14:58 +0100, George Dunlap wrote:
> On 26/06/13 12:37, Ian Campbell wrote:
> > On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
> >> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> On Tue, 25 Jun 2013, Ian Campbell wrote:
> >>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> >>>>> On 21/06/13 20:07, Matt Wilson wrote:
> >>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> While working on further block improvements I've found an issue with
> >>>>>>> persistent grants in blkfront.
> >>>>>>>
> >>>>>>> With persistent grants, grants are allocated once and then never
> >>>>>>> released, so both blkfront and blkback keep reusing the same memory
> >>>>>>> pages for all transactions.
> >>>>>>>
> >>>>>>> This is not a problem in blkback, because we can dynamically choose
> >>>>>>> how many grants we want to map. On the other hand, blkfront cannot
> >>>>>>> revoke access to those grants at any point, because blkfront doesn't
> >>>>>>> know whether blkback has these grants mapped persistently or not.
> >>>>>>>
> >>>>>>> So if, for example, we start expanding the number of segments in
> >>>>>>> indirect requests to a value like 512 segments per request, blkfront
> >>>>>>> will probably try to persistently map 512*32+512 = 16896 grants per
> >>>>>>> device, which is far more grants than the current default of
> >>>>>>> 32*256 = 8192 (if using grant tables v2). This can cause serious
> >>>>>>> problems for other interfaces inside the DomU, since blkfront
> >>>>>>> basically starts hoarding all possible grants, leaving other
> >>>>>>> interfaces completely locked out.
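
For reference, here is that arithmetic wrapped in a throwaway C snippet; the
constants are simply the figures quoted above, not values read from the blkif
or grant-table headers, and the variable names are mine:

    #include <stdio.h>

    int main(void)
    {
        const unsigned ring_requests   = 32;   /* the 32 in "512*32" above               */
        const unsigned segs_per_req    = 512;  /* proposed indirect segments per request */
        const unsigned extra_grants    = 512;  /* the "+512" term in the figure above    */
        const unsigned gnttab_frames   = 32;   /* default number of grant-table frames   */
        const unsigned grefs_per_frame = 256;  /* entries per frame, grant tables v2     */

        printf("grants blkfront would hold per device: %u\n",
               ring_requests * segs_per_req + extra_grants);     /* 16896 */
        printf("grant references available in the domU: %u\n",
               gnttab_frames * grefs_per_frame);                 /* 8192  */
        return 0;
    }
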
> >>>>>> Yikes.
> >>>>>>
> >>>>>>> I've been thinking about different ways to solve this, but so far I
> >>>>>>> haven't been able to find a nice solution:
> >>>>>>>
> >>>>>>> 1. Limit the number of persistent grants a blkfront instance can use:
> >>>>>>> let's say that only the first X grants used will be persistently
> >>>>>>> mapped by both blkfront and blkback, and if more grants are needed
> >>>>>>> the previous map/unmap scheme will be used for them.
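
A minimal sketch of what option 1 would mean on the frontend side; the names,
the fixed cap and the flat array are all invented for illustration and are not
taken from blkfront:

    #include <stdbool.h>
    #include <stddef.h>

    #define PERSISTENT_CAP 256        /* assumed per-device limit ("X" above)  */

    struct pers_gnt {
        unsigned ref;                 /* grant reference kept mapped           */
        bool in_use;                  /* currently attached to a request?      */
    };

    static struct pers_gnt persistent[PERSISTENT_CAP];
    static size_t persistent_cnt;

    /* Return a persistent grant to reuse, or NULL if the request has to fall
     * back to a one-shot grant that is mapped and unmapped per request. */
    struct pers_gnt *get_grant(unsigned fresh_ref)
    {
        size_t i;

        for (i = 0; i < persistent_cnt; i++)       /* reuse an idle one */
            if (!persistent[i].in_use) {
                persistent[i].in_use = true;
                return &persistent[i];
            }

        if (persistent_cnt < PERSISTENT_CAP) {     /* grow up to the cap */
            persistent[persistent_cnt].ref = fresh_ref;
            persistent[persistent_cnt].in_use = true;
            return &persistent[persistent_cnt++];
        }

        return NULL;                               /* caller uses map/unmap */
    }
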
> >>>>>> I'm not thrilled with this option. It would likely introduce some
> >>>>>> significant performance variability, wouldn't it?
> >>>>> Probably, and it will also be hard to distribute the number of available
> >>>>> grants across the different interfaces in a performance-sensible way,
> >>>>> especially given the fact that once a grant is assigned to an interface
> >>>>> it cannot be returned to the pool of grants.
> >>>>>
> >>>>> So if we had two interfaces with very different usage (one very busy and
> >>>>> the other almost idle) and distributed the grants equally amongst them,
> >>>>> one would have a lot of unused grants while the other would suffer from
> >>>>> starvation.
> >>>> I do think we need to implement some sort of reclaim scheme, which
> >>>> probably does mean a specific request (per your #4). We simply can't
> >>>> have a device which once upon a time had high throughput but is now
> >>>> mostly idle continue to tie up all those grants.
> >>>>
> >>>> If you make grant reuse follow an MRU scheme and reclaim the currently
> >>>> unused tail fairly infrequently and in large batches, then the perf
> >>>> overhead should be minimal, I think.
> >>>>
> >>>> I also don't think I would discount the idea of using ephemeral grants
> >>>> to cover bursts so easily either; in fact it might fall out quite
> >>>> naturally from an MRU scheme. In that scheme bursting up is pretty cheap,
> >>>> since a grant map is relatively inexpensive, and recovering from the
> >>>> burst shouldn't be too expensive if you batch it. If it turns out to be
> >>>> not a burst but a sustained level of I/O, then the MRU scheme would mean
> >>>> you wouldn't be reclaiming them.
> >>>>
> >>>> I also think there probably needs to be some tunable per-device limit on
> >>>> the maximum number of persistent grants; perhaps minimum and maximum pool
> >>>> sizes tie in with an MRU scheme? If nothing else it gives the admin the
> >>>> ability to prioritise devices.
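
A rough sketch of the per-device bookkeeping such a scheme implies; everything
here, including the min/max pool bounds, is made up for illustration rather
than taken from any existing patch:

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    #define POOL_MIN 64               /* assumed: never reclaim below this      */
    #define POOL_MAX 1056             /* assumed: hard cap on persistent grants */

    struct pgrant {
        unsigned ref;                 /* grant reference kept mapped            */
        time_t   last_used;           /* updated every time it is reused        */
        bool     mapped;              /* still persistently granted to blkback? */
    };

    struct grant_pool {
        struct pgrant grants[POOL_MAX];
        size_t        count;
    };

    /* Called whenever a request reuses a persistent grant, so that a later
     * reclaim pass can tell the busy grants from the idle tail. */
    void grant_touch(struct grant_pool *p, size_t idx)
    {
        if (idx < p->count && p->grants[idx].mapped)
            p->grants[idx].last_used = time(NULL);
    }
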
> >>> If we introduce a reclaim call we have to be careful not to fall back
> >>> to a map/unmap scheme like we had before.
> >>>
> >>> The way I see it, either these additional grants are useful or they are
> >>> not. In the first case we could just limit the maximum number of
> >>> persistent grants and be done with it.
> >>> If they are not useful (they have been allocated for one very large
> >>> request and not used much after that), could we find a way to identify
> >>> unusually large requests and avoid using persistent grants for those?
> >> Isn't it possible that these grants are useful for some periods of
> >> time, but not for others?  You wouldn't say, "Caching the disk data in
> >> main memory is either useful or not; if it is not useful (if it was
> >> allocated for one very large request and not used much after that), we
> >> should find a way to identify unusually large requests and avoid
> >> caching it."  If you're playing a movie, sure; but in most cases, the
> >> cache was useful for a time, then stopped being useful.  Treating the
> >> persistent grants the same way makes sense to me.
> > Right, this is what I was trying to suggest with the MRU scheme. If you
> > are using lots of grants and you keep on reusing them then they remain
> > persistent and don't get reclaimed. If you are not reusing them for a
> > while then they get reclaimed. If you make "for a while" big enough then
> > you should find you aren't unintentionally falling back to a map/unmap
> > scheme.
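
A sketch of what that reclaim pass could look like, run infrequently and
freeing the idle tail in batches; the threshold, the batch size and the
comment about ending foreign access are assumptions, not blkfront code:

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    #define IDLE_SECONDS  30          /* "for a while": assumed, workload dependent */
    #define RECLAIM_BATCH 64          /* free in large batches to amortise the cost */

    struct pgrant {
        unsigned ref;
        time_t   last_used;
        bool     mapped;
    };

    /* Walk the pool and drop grants that have not been reused recently.
     * A device under sustained I/O keeps touching its grants, so its
     * working set is never reclaimed and it never falls back to map/unmap. */
    size_t reclaim_idle(struct pgrant *pool, size_t count, time_t now)
    {
        size_t i, freed = 0;

        for (i = 0; i < count && freed < RECLAIM_BATCH; i++) {
            if (pool[i].mapped && now - pool[i].last_used > IDLE_SECONDS) {
                /* real code would gnttab_end_foreign_access() the ref here */
                pool[i].mapped = false;
                freed++;
            }
        }
        return freed;
    }
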
> 
> And I was trying to say that I agreed with you. :-)

Excellent ;-)

> BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep 
> the most recently used"; is there a practical difference between that 
> and "LRU" ("Discard the Least Recently Used")?

I started off with LRU and then got myself confused and changed it
everywhere. Yes, I mean keep the Most Recently Used == discard the Least
Recently Used.

> Presumably we could implement the clock algorithm pretty reasonably...

That's the sort of approach I was imagining...
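
A second-chance ("clock") sweep over the persistent grant pool might look
roughly like this; the structure and names are illustrative only, not taken
from any posted patch:

    #include <stdbool.h>
    #include <stddef.h>

    #define NGRANTS 1024              /* assumed pool size                     */

    struct cgrant {
        unsigned ref;
        bool     referenced;          /* set by blkfront on every reuse        */
        bool     mapped;              /* still persistently granted?           */
    };

    static struct cgrant pool[NGRANTS];
    static size_t hand;               /* the clock hand                        */

    /* One step of the sweep: a recently used grant gets its bit cleared and
     * a second chance; the first grant not used since the previous sweep is
     * reclaimed.  Returns 1 if a grant was reclaimed, 0 otherwise. */
    int clock_reclaim_one(void)
    {
        size_t scanned;

        for (scanned = 0; scanned < NGRANTS; scanned++) {
            struct cgrant *g = &pool[hand];

            hand = (hand + 1) % NGRANTS;
            if (!g->mapped)
                continue;
            if (g->referenced) {
                g->referenced = false;        /* second chance */
                continue;
            }
            g->mapped = false;                /* would end foreign access here */
            return 1;
        }
        return 0;                             /* everything was recently used */
    }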

Ian.


