Re: [Xen-devel] blkback global resources
On Tue, 2012-03-27 at 08:27 +0100, Jan Beulich wrote:
> >>> On 26.03.12 at 18:53, Daniel Stodden <daniel.stodden@xxxxxxxxxxxxxx>
> >>> wrote:
> > On Mon, 2012-03-26 at 17:06 +0100, Keir Fraser wrote:
> >> Cc'ing Daniel for you on this one, Jan.
> >>
> >> K.
> >>
> >> On 26/03/2012 16:56, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> >>
> >> > All the resources allocated based on xen_blkif_reqs are global in
> >> > blkback. While (without having measured anything) I think that this
> >> > is bad from a QoS perspective (not the least implied from a warning
> >> > issued by Citrix's multi-page-ring patches:
> >> >
> >> >     if (blkif_reqs < BLK_RING_SIZE(order))
> >> >         printk(KERN_WARNING "WARNING: "
> >> >                "I/O request space (%d reqs) < ring order %ld, "
> >> >                "consider increasing %s.reqs to >= %ld.",
> >> >                blkif_reqs, order, KBUILD_MODNAME,
> >> >                roundup_pow_of_two(BLK_RING_SIZE(order)));
> >> >
> >> > indicating that this _is_ a bottleneck), I'm otoh hesitant to
> >> > convert this to per-instance allocations, as the amount of memory
> >> > taken away from Dom0 for this may not be insignificant when there
> >> > are many devices.
> >> >
> >> > Does anyone have an opinion here, in particular regarding the
> >> > original authors' decision to make this global vs. the observation
> >> > apparently made by Daniel Stodden, the author of said patch (whose
> >> > current email address I don't have, so I can't ask him directly),
> >> > but also in the context of multi-page rings, the purpose of which
> >> > is to allow for larger amounts of in-flight I/O?
> >> >
> >> > Thanks, Jan
> >
> > Re-CC'ing Andrei Lifchits, I think there's been some work going on at
> > Citrix regarding that matter.
> >
> > Yes, just allocating a pfn pool per backend instance is way too much
> > memory ballooned out. Otherwise this stuff would have never looked the
> > way it does now.
>
> This of course could be accounted for by having an initially non-empty
> (large enough) balloon (not sure how easy it is these days to do this
> for pv-ops, but it has always been trivial with the legacy code). That
> wouldn't help a 32-bit kernel much (where generally the initial balloon
> is all in highmem, yet the vacated pages need to be in lowmem), but for
> 64-bit kernels it should be fine.
>
> > Regarding the right balance, note that on the other extreme end, if
> > PFN space were infinite, there's not much expected performance gain
> > from rendering virtual backends fully independent. Beyond controller
> > queue depth, these requests are all just going to pile up, waiting.
>
> Is there a way to look through the queue stack to find out how many
> distinct ones there are that the backend is running on top of, as well
> as - for a particular I/O path - the one with the smallest depth? Or can
> one assume that the topmost one (generally loop's or blktap2's) won't
> advertise a queue deeper than what is going to be accepted downstream
> (probably not, I'd guess)?
>
> And - what you say would similarly apply to the usefulness of
> multi-page rings, afaict.

The balance is tricky. What I observe so far is that having multi-page
rings doesn't necessarily improve performance (but it is still nice to
have for future use). There are other contentions which limit the
throughput of a single VIF.

> > XenServer has some support for decoupling in blktap.ko [1] which
> > worked relatively well: use frame 'pool' kobjects. A bunch of pages,
> > mapped to a sysfs object. The name was arbitrary; the size
> > configurable, even at runtime.
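(Purely for illustration - the following is not the blktap code, and every
name in it is invented - a page reserve whose minimum size is exposed
through a sysfs attribute and can be resized at runtime might look roughly
like this:)

/*
 * Illustrative sketch only: a reserve of order-0 pages whose minimum
 * size can be read and resized at runtime via sysfs, loosely in the
 * spirit of the frame pools described above.  Not the blktap code.
 */
#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/mempool.h>
#include <linux/module.h>
#include <linux/sysfs.h>

static mempool_t *frame_pool;             /* reserve of order-0 pages */
static unsigned int frame_pool_size = 64; /* current minimum pool size */
static struct kobject *pool_kobj;         /* /sys/kernel/frame_pool_demo */

static ssize_t size_show(struct kobject *kobj, struct kobj_attribute *attr,
                         char *buf)
{
        return sprintf(buf, "%u\n", frame_pool_size);
}

static ssize_t size_store(struct kobject *kobj, struct kobj_attribute *attr,
                          const char *buf, size_t count)
{
        unsigned int new_size;
        int err;

        err = kstrtouint(buf, 0, &new_size);
        if (err)
                return err;

        /* Grow or shrink the reserve while it is live (note: older
         * kernels take an extra gfp_t argument here). */
        err = mempool_resize(frame_pool, new_size);
        if (err)
                return err;

        frame_pool_size = new_size;
        return count;
}

static struct kobj_attribute size_attr =
        __ATTR(size, 0644, size_show, size_store);

static int __init frame_pool_demo_init(void)
{
        int err;

        frame_pool = mempool_create_page_pool(frame_pool_size, 0);
        if (!frame_pool)
                return -ENOMEM;

        pool_kobj = kobject_create_and_add("frame_pool_demo", kernel_kobj);
        if (!pool_kobj) {
                mempool_destroy(frame_pool);
                return -ENOMEM;
        }

        err = sysfs_create_file(pool_kobj, &size_attr.attr);
        if (err) {
                kobject_put(pool_kobj);
                mempool_destroy(frame_pool);
        }
        return err;
}

static void __exit frame_pool_demo_exit(void)
{
        kobject_put(pool_kobj);
        mempool_destroy(frame_pool);
}

module_init(frame_pool_demo_init);
module_exit(frame_pool_demo_exit);
MODULE_LICENSE("GPL");

(Writing to /sys/kernel/frame_pool_demo/size would then resize the reserve
without reloading anything, which is roughly the kind of runtime control
being described.)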
> > Sysfs meant stuff was easily set up by shell or python code, or
> > manually. To become operational, every backend must be bound to a pool
> > (initially, the global 'default' one, for tool compat). Backends can
> > be relinked arbitrarily before entering the Connected state.
> >
> > Then let the userland toolstack set things up according to the
> > physical I/O topology and properties probed. Basically, every physical
> > backend (say, a volume group, or an HBA) would start out by allocating
> > and dimensioning a dedicated pool (named after the backend), and every
> > backend instance fired up gets bound to the pool it belongs to.
>
> Having userland do all that seems like a fallback solution only to me -
> I would hope that sufficient information is available directly to the
> drivers.

I'm tempted to make all the information available to the drivers, but I
haven't reached a conclusion yet. Maybe we should also allow users to
experiment with various configurations for their specific needs?

Wei.

> Thanks in any case for responding so quickly,
> Jan
>
> > There are a lot of additional optimizations one could consider, e.g.
> > autogrowing the pool (log(nbackends) or so?) and the like. To improve
> > locality, having backends which look ahead in their request queue and
> > allocate whole batches is probably a good idea too, etc, etc.
> >
> > HTH,
> > Daniel
> >
> > [1] http://xenbits.xen.org/gitweb/?p=people/dstodden/linux.git
> > mostly in drivers/block/blktap/sysfs.c (show/store_pool) and
> > request.c. Note that these are based on mempools, not the frame pools
> > blkback would take.
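(Again for illustration only, with invented names: a mempool in this sense
is just a guaranteed reserve of objects on top of an ordinary slab cache,
e.g. for per-request structures, rather than a pool of frames:)

/*
 * Hypothetical example of the mempool flavour mentioned above: a
 * guaranteed reserve of request structs backed by a slab cache.
 * All names are invented; this is not the blktap request code.
 */
#include <linux/mempool.h>
#include <linux/slab.h>

struct demo_request {
        unsigned long id;
        /* grant handles, bio pointers, etc. would live here */
};

static struct kmem_cache *demo_req_cache;
static mempool_t *demo_req_pool;

static int demo_req_pool_setup(unsigned int min_reqs)
{
        demo_req_cache = KMEM_CACHE(demo_request, 0);
        if (!demo_req_cache)
                return -ENOMEM;

        /* Keep at least min_reqs objects in reserve so that allocation
         * can always make forward progress under memory pressure. */
        demo_req_pool = mempool_create_slab_pool(min_reqs, demo_req_cache);
        if (!demo_req_pool) {
                kmem_cache_destroy(demo_req_cache);
                return -ENOMEM;
        }
        return 0;
}

static struct demo_request *demo_req_get(void)
{
        /* May sleep and fall back to the reserve when the slab is
         * exhausted; with GFP_NOIO it will not return NULL. */
        return mempool_alloc(demo_req_pool, GFP_NOIO);
}

static void demo_req_put(struct demo_request *req)
{
        mempool_free(req, demo_req_pool);
}

(The frame pools blkback would take differ in that the reserved objects
would presumably be whole pages for grant mappings rather than small
request structures, which is why the two aren't interchangeable.)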
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel