
Re: [Xen-devel] [Hackathon minutes] PV block improvements



On Thu, 2013-06-27 at 14:58 +0100, George Dunlap wrote:
> On 26/06/13 12:37, Ian Campbell wrote:
> > On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:
> >> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini
> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> On Tue, 25 Jun 2013, Ian Campbell wrote:
> >>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> >>>>> On 21/06/13 20:07, Matt Wilson wrote:
> >>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> While working on further block improvements I've found an issue with
> >>>>>>> persistent grants in blkfront.
> >>>>>>>
> >>>>>>> With persistent grants, grants are allocated once and then never
> >>>>>>> released, so both blkfront and blkback keep reusing the same memory
> >>>>>>> pages for all transactions.
> >>>>>>>
> >>>>>>> This is not a problem in blkback, because we can dynamically choose
> >>>>>>> how many grants we want to map. On the other hand, blkfront cannot
> >>>>>>> revoke access to those grants at any point, because blkfront doesn't
> >>>>>>> know whether blkback has these grants mapped persistently or not.
> >>>>>>>
> >>>>>>> So if, for example, we start expanding the number of segments in
> >>>>>>> indirect requests to a value like 512 segments per request, blkfront
> >>>>>>> will probably try to persistently map 512*32+512 = 16896 grants per
> >>>>>>> device, which is far more grants than the current default of
> >>>>>>> 32*256 = 8192 (if using grant tables v2). This can cause serious
> >>>>>>> problems for other interfaces inside the DomU, since blkfront
> >>>>>>> basically starts hoarding all possible grants, leaving other
> >>>>>>> interfaces completely locked out.
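
For reference, here is that arithmetic wrapped in a throwaway C snippet; the
constants are simply the figures quoted above, not values read from the blkif
or grant-table headers, and the variable names are mine:

    #include <stdio.h>

    int main(void)
    {
        const unsigned ring_requests   = 32;   /* the 32 in "512*32" above               */
        const unsigned segs_per_req    = 512;  /* proposed indirect segments per request */
        const unsigned extra_grants    = 512;  /* the "+512" term in the figure above    */
        const unsigned gnttab_frames   = 32;   /* default number of grant-table frames   */
        const unsigned grefs_per_frame = 256;  /* entries per frame, grant tables v2     */

        printf("grants blkfront would hold per device: %u\n",
               ring_requests * segs_per_req + extra_grants);     /* 16896 */
        printf("grant references available in the domU: %u\n",
               gnttab_frames * grefs_per_frame);                 /* 8192  */
        return 0;
    }
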
> >>>>>> Yikes.
> >>>>>>
> >>>>>>> I've been thinking about different ways to solve this, but so far I
> >>>>>>> haven't been able to find a nice solution:
> >>>>>>>
> >>>>>>> 1. Limit the number of persistent grants a blkfront instance can use:
> >>>>>>> let's say that only the first X grants used will be persistently
> >>>>>>> mapped by both blkfront and blkback, and if more grants are needed
> >>>>>>> the previous map/unmap scheme will be used for them.
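
A minimal sketch of what option 1 would mean on the frontend side; the names,
the fixed cap and the flat array are all invented for illustration and are not
taken from blkfront:

    #include <stdbool.h>
    #include <stddef.h>

    #define PERSISTENT_CAP 256        /* assumed per-device limit ("X" above)  */

    struct pers_gnt {
        unsigned ref;                 /* grant reference kept mapped           */
        bool in_use;                  /* currently attached to a request?      */
    };

    static struct pers_gnt persistent[PERSISTENT_CAP];
    static size_t persistent_cnt;

    /* Return a persistent grant to reuse, or NULL if the request has to fall
     * back to a one-shot grant that is mapped and unmapped per request. */
    struct pers_gnt *get_grant(unsigned fresh_ref)
    {
        size_t i;

        for (i = 0; i < persistent_cnt; i++)       /* reuse an idle one */
            if (!persistent[i].in_use) {
                persistent[i].in_use = true;
                return &persistent[i];
            }

        if (persistent_cnt < PERSISTENT_CAP) {     /* grow up to the cap */
            persistent[persistent_cnt].ref = fresh_ref;
            persistent[persistent_cnt].in_use = true;
            return &persistent[persistent_cnt++];
        }

        return NULL;                               /* caller uses map/unmap */
    }
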
> >>>>>> I'm not thrilled with this option. It would likely introduce some
> >>>>>> significant performance variability, wouldn't it?
> >>>>> Probably, and it will also be hard to distribute the number of available
> >>>>> grants across the different interfaces in a performance-sensible way,
> >>>>> especially given the fact that once a grant is assigned to an interface
> >>>>> it cannot be returned to the pool of grants.
> >>>>>
> >>>>> So if we had two interfaces with very different usage (one very busy and
> >>>>> the other almost idle) and distributed the grants equally amongst them,
> >>>>> one would have a lot of unused grants while the other would suffer from
> >>>>> starvation.
> >>>> I do think we need to implement some sort of reclaim scheme, which
> >>>> probably does mean a specific request (per your #4). We simply can't
> >>>> have a device which once upon a time had high throughput but is now
> >>>> mostly idle continue to tie up all those grants.
> >>>>
> >>>> If you make grant reuse follow an MRU scheme and reclaim the currently
> >>>> unused tail fairly infrequently and in large batches, then the perf
> >>>> overhead should be minimal, I think.
> >>>>
> >>>> I also don't think I would discount the idea of using ephemeral grants
> >>>> to cover bursts so easily either; in fact it might fall out quite
> >>>> naturally from an MRU scheme. In that scheme bursting up is pretty cheap,
> >>>> since a grant map is relatively inexpensive, and recovering from the
> >>>> burst shouldn't be too expensive if you batch it. If it turns out to be
> >>>> not a burst but a sustained level of I/O, then the MRU scheme would mean
> >>>> you wouldn't be reclaiming them.
> >>>>
> >>>> I also think there probably needs to be some tunable per-device limit on
> >>>> the maximum number of persistent grants; perhaps minimum and maximum pool
> >>>> sizes tie in with an MRU scheme? If nothing else it gives the admin the
> >>>> ability to prioritise devices.
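
A rough sketch of the per-device bookkeeping such a scheme implies; everything
here, including the min/max pool bounds, is made up for illustration rather
than taken from any existing patch:

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    #define POOL_MIN 64               /* assumed: never reclaim below this      */
    #define POOL_MAX 1056             /* assumed: hard cap on persistent grants */

    struct pgrant {
        unsigned ref;                 /* grant reference kept mapped            */
        time_t   last_used;           /* updated every time it is reused        */
        bool     mapped;              /* still persistently granted to blkback? */
    };

    struct grant_pool {
        struct pgrant grants[POOL_MAX];
        size_t        count;
    };

    /* Called whenever a request reuses a persistent grant, so that a later
     * reclaim pass can tell the busy grants from the idle tail. */
    void grant_touch(struct grant_pool *p, size_t idx)
    {
        if (idx < p->count && p->grants[idx].mapped)
            p->grants[idx].last_used = time(NULL);
    }
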
> >>> If we introduce a reclaim call we have to be careful not to fall back
> >>> to a map/unmap scheme like we had before.
> >>>
> >>> The way I see it, either these additional grants are useful or they are
> >>> not. In the first case we could just limit the maximum number of
> >>> persistent grants and be done with it.
> >>> If they are not useful (they have been allocated for one very large
> >>> request and not used much after that), could we find a way to identify
> >>> unusually large requests and avoid using persistent grants for those?
> >> Isn't it possible that these grants are useful for some periods of
> >> time, but not for others?  You wouldn't say, "Caching the disk data in
> >> main memory is either useful or not; if it is not useful (if it was
> >> allocated for one very large request and not used much after that), we
> >> should find a way to identify unusually large requests and avoid
> >> caching it."  If you're playing a movie, sure; but in most cases, the
> >> cache was useful for a time, then stopped being useful.  Treating the
> >> persistent grants the same way makes sense to me.
> > Right, this is what I was trying to suggest with the MRU scheme. If you
> > are using lots of grants and you keep on reusing them then they remain
> > persistent and don't get reclaimed. If you are not reusing them for a
> > while then they get reclaimed. If you make "for a while" big enough then
> > you should find you aren't unintentionally falling back to a map/unmap
> > scheme.
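
A sketch of what that reclaim pass could look like, run infrequently and
freeing the idle tail in batches; the threshold, the batch size and the
comment about ending foreign access are assumptions, not blkfront code:

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    #define IDLE_SECONDS  30          /* "for a while": assumed, workload dependent */
    #define RECLAIM_BATCH 64          /* free in large batches to amortise the cost */

    struct pgrant {
        unsigned ref;
        time_t   last_used;
        bool     mapped;
    };

    /* Walk the pool and drop grants that have not been reused recently.
     * A device under sustained I/O keeps touching its grants, so its
     * working set is never reclaimed and it never falls back to map/unmap. */
    size_t reclaim_idle(struct pgrant *pool, size_t count, time_t now)
    {
        size_t i, freed = 0;

        for (i = 0; i < count && freed < RECLAIM_BATCH; i++) {
            if (pool[i].mapped && now - pool[i].last_used > IDLE_SECONDS) {
                /* real code would gnttab_end_foreign_access() the ref here */
                pool[i].mapped = false;
                freed++;
            }
        }
        return freed;
    }
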
> 
> And I was trying to say that I agreed with you. :-)

Excellent ;-)

> BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep 
> the most recently used"; is there a practical difference between that 
> and "LRU" ("Discard the Least Recently Used")?

I started off with LRU and then got myself confused and changed it
everywhere. Yes, I mean keep the Most Recently Used == discard the Least
Recently Used.

> Presumably we could implement the clock algorithm pretty reasonably...

That's the sort of approach I was imagining...
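
A second-chance ("clock") sweep over the persistent grant pool might look
roughly like this; the structure and names are illustrative only, not taken
from any posted patch:

    #include <stdbool.h>
    #include <stddef.h>

    #define NGRANTS 1024              /* assumed pool size                     */

    struct cgrant {
        unsigned ref;
        bool     referenced;          /* set by blkfront on every reuse        */
        bool     mapped;              /* still persistently granted?           */
    };

    static struct cgrant pool[NGRANTS];
    static size_t hand;               /* the clock hand                        */

    /* One step of the sweep: a recently used grant gets its bit cleared and
     * a second chance; the first grant not used since the previous sweep is
     * reclaimed.  Returns 1 if a grant was reclaimed, 0 otherwise. */
    int clock_reclaim_one(void)
    {
        size_t scanned;

        for (scanned = 0; scanned < NGRANTS; scanned++) {
            struct cgrant *g = &pool[hand];

            hand = (hand + 1) % NGRANTS;
            if (!g->mapped)
                continue;
            if (g->referenced) {
                g->referenced = false;        /* second chance */
                continue;
            }
            g->mapped = false;                /* would end foreign access here */
            return 1;
        }
        return 0;                             /* everything was recently used */
    }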

Ian.


