Xen project Mailing List

Re: [Xen-devel] blkback global resources

To: "Daniel Stodden" <daniel.stodden@xxxxxxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 27 Mar 2012 08:27:05 +0100

Cc: xen-devel <xen-devel@xxxxxxxxxxxxx>, Andrei Lifchits <andrei.lifchits@xxxxxxxxxx>

Delivery-date: Tue, 27 Mar 2012 07:27:00 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 26.03.12 at 18:53, Daniel Stodden <daniel.stodden@xxxxxxxxxxxxxx> wrote:
> On Mon, 2012-03-26 at 17:06 +0100, Keir Fraser wrote:
>> Cc'ing Daniel for you on this one, Jan.
>>
>> K.
>>
>> On 26/03/2012 16:56, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>>
>> > All the resources allocated based on xen_blkif_reqs are global in
>> > blkback. While (without having measured anything) I think that this
>> > is bad from a QoS perspective (not the least implied from a warning
>> > issued by Citrix'es multi-page-ring patches:
>> >
>> > if (blkif_reqs < BLK_RING_SIZE(order))
>> >         printk(KERN_WARNING "WARNING: "
>> >                "I/O request space (%d reqs) < ring order %ld, "
>> >                "consider increasing %s.reqs to >= %ld.",
>> >                blkif_reqs, order, KBUILD_MODNAME,
>> >                roundup_pow_of_two(BLK_RING_SIZE(order)));
>> >
>> > indicating that this _is_ a bottleneck), I'm otoh hesitant to convert
>> > this to per-instance allocations, as the amount of memory taken
>> > away from Dom0 for this may be not insignificant when there are
>> > many devices.
>> >
>> > Does anyone have an opinion here, in particular regarding the
>> > original authors' decision to make this global vs. the apparently
>> > made observation (by Daniel Stodden, the author of said patch,
>> > who I don't have any current email of to ask directly), but also
>> > in the context of multi-page rings, the purpose of which is to
>> > allow for larger amounts of in-flight I/O?
>> >
>> > Thanks, Jan
>
> Re-CC'ing Andrei Lifchits, I think there's been some work going on at
> Citrix regarding that matter.
>
> Yes, just allocating a pfn pool per backend instance is way too much
> memory ballooned out. Otherwise this stuff would have never looked the
> way it does now.

This of course could be accounted for by having an initially non-empty
(large enough) balloon (not sure how easy it is these days to do this
for pv-ops, but it has always been trivial with the legacy code).
That wouldn't help a 32-bit kernel much (where generally the initial
balloon is all in highmem, yet the vacated pages need to be in lowmem),
but for 64-bit kernels it should be fine.

> Regarding the right balance, note that on the other extreme end, if PFN
> space were infinite, there's not much expected performance gain from
> rendering virtual backends fully independent. Beyond controller queue
> depth, these requests are all just going to pile up, waiting.

Is there a way to look through the queue stack to find out how many
distinct ones there are that the backend is running on top of as well
as - for a particular I/O path - the one with the smallest depth? Or
can one assume that the top most one (generally loop's or blktap2's)
won't advertise a queue deeper than what is going to be accepted
downstream (probably not, I'd guess)?

And - what you say would similarly apply to the usefulness of
multi-page rings afaict.

> XenServer has some support for decoupling in blktap.ko [1] which worked
> relatively well: Use frame 'pool' kobjects. A bunch of pages, mapped to
> sysfs object. Name was arbitrary. Size configurable, even at runtime.
>
> Sysfs meant stuff was easily set up by shell or python code, or
> manually. To become operational, every backend must be bound to a pool
> (initially, the global 'default' one, for tool compat). Backends can be
> relinked arbitrarily before entering Connected state.
>
> Then let the userland toolstack set things up according to physical I/O
> topology and properties probed. Basically every physical backend (say, a
> volume group, or a HBA) would start out by allocating and dimensioning a
> dedicated pool (named after the backend), and every backend instance
> fired up gets bound to the pool it belongs to.

Having userland do all that seems like a fallback solution only to me -
I would hope that sufficient information is available directly to the
drivers.

Thanks in any case for responding so quickly,
Jan

> There's a lot of additional optimizations one could consider, e.g.
> autogrowing the pool (log(nbackends) or so?) and the like. To improve
> locality, having backends which look ahead in their request queue and
> allocate whole batches is probably a good idea too, etc, etc.
>
> HTH,
> Daniel
>
> [1]
> http://xenbits.xen.org/gitweb/?p=people/dstodden/linux.git
> mostly in drivers/block/blktap/sysfs.c (show/store_pool) and request.c.
> Note that these are based on mempools, not the frame pools blkback
> would take.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
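
The frame-pool scheme Daniel describes in the thread (named pools of
configurable size, every backend bound to exactly one pool, relinking
allowed only before the Connected state, whole batches taken at once)
can be sketched as a small user-space C model. This is only an
illustration under stated assumptions, not the blktap code: every name
below (struct frame_pool, backend_bind, backend_get_pages, and so on) is
hypothetical, a pool is modelled as a page counter rather than real
frames, and the sysfs interface is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_NAME_MAX 32

/* Hypothetical model: a named pool is just a budget of pages. */
struct frame_pool {
    char   name[POOL_NAME_MAX];
    size_t size;    /* pages owned by the pool (resizable at runtime) */
    size_t in_use;  /* pages currently handed out to backends */
};

enum backend_state { BACKEND_INIT, BACKEND_CONNECTED };

/* Hypothetical model of one backend instance. */
struct backend {
    struct frame_pool *pool;   /* pool this instance draws from */
    enum backend_state state;
    size_t held;               /* pages this instance currently holds */
};

void pool_init(struct frame_pool *p, const char *name, size_t size)
{
    strncpy(p->name, name, POOL_NAME_MAX - 1);
    p->name[POOL_NAME_MAX - 1] = '\0';
    p->size = size;
    p->in_use = 0;
}

/* Resize a pool; refuse to shrink below what is already handed out. */
int pool_resize(struct frame_pool *p, size_t new_size)
{
    if (new_size < p->in_use)
        return -1;
    p->size = new_size;
    return 0;
}

/* Every backend starts out on the global default pool. */
void backend_init(struct backend *b, struct frame_pool *default_pool)
{
    b->pool = default_pool;
    b->state = BACKEND_INIT;
    b->held = 0;
}

/* Relinking is only permitted before the backend is connected. */
int backend_bind(struct backend *b, struct frame_pool *p)
{
    if (b->state == BACKEND_CONNECTED)
        return -1;
    b->pool = p;
    return 0;
}

void backend_connect(struct backend *b)
{
    b->state = BACKEND_CONNECTED;
}

/* Take a whole batch of n pages, all or nothing. On -1 the caller
 * would have to wait until other backends on the pool release pages. */
int backend_get_pages(struct backend *b, size_t n)
{
    struct frame_pool *p = b->pool;

    if (n > p->size - p->in_use)
        return -1;
    p->in_use += n;
    b->held += n;
    return 0;
}

void backend_put_pages(struct backend *b, size_t n)
{
    assert(n <= b->held);
    b->held -= n;
    b->pool->in_use -= n;
}
```

In this model two backends bound to the same pool compete only with each
other, so a pool sized to one physical device's queue depth bounds the
memory taken from Dom0 per device rather than per backend instance.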
