
Re: [Xen-devel] [PATCH RFC] Persistent grant maps for xen blk drivers



> 
> On 19/10/12 03:34, James Harper wrote:
> >>
> >> This patch implements persistent grants for the xen-blk{front,back}
> >> mechanism. The effect of this change is to reduce the number of unmap
> >> operations performed, since they cause a (costly) TLB shootdown. This
> >> allows the I/O performance to scale better when a large number of VMs
> >> are performing I/O.
> >>
> >> Previously, the blkfront driver was supplied a bvec[] from the
> >> request queue. This was granted to dom0; dom0 performed the I/O and
> >> wrote directly into the grant-mapped memory and unmapped it; blkfront
> >> then removed foreign access for that grant. The cost of unmapping
> >> scales badly with the number of CPUs in Dom0. An experiment showed
> >> that when Dom0 has 24 VCPUs, and guests are performing parallel I/O
> >> to a ramdisk, the IPIs from performing unmaps are a bottleneck at
> >> 5 guests (at which point 650,000 IOPS are being performed in total).
> >> If more than 5 guests are used, the performance declines. By 10
> >> guests, only 400,000 IOPS are being performed.
> >>
> >> This patch improves performance by only unmapping when the
> >> connection between blkfront and back is broken.
> >
> > I assume network drivers would suffer from the same affliction... Would a
> > more general persistent map solution be worth considering (or be possible)?
> > So a common interface to this persistent mapping, allowing the persistent
> > pool to be shared between all drivers in the DomU?
> 
> Yes, there are plans to implement the same for network drivers. I would
> generally avoid having a shared pool of grants for all the devices of a DomU,
> as mentioned in the description of the patch:
>
> Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black
> tree. As the grefs are not known a priori, and provide no guarantees on their
> ordering, we have to perform a search through this tree to find the page for
> every gref we receive. This operation takes O(log n) time in the worst case.
>
> Having a shared pool with all grants would mean that n would become much
> higher, and so the search time for a grant would increase.
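
For reference, my mental model of the gref=>page lookup described above is
roughly the following, using the Linux rbtree API. Structure and function
names here are purely illustrative, not taken from the actual patch:

#include <linux/rbtree.h>
#include <linux/mm.h>
#include <xen/grant_table.h>

/* one node per persistently mapped grant, keyed by gref */
struct persistent_gnt {
        grant_ref_t gnt;        /* gref used as the search key */
        struct page *page;      /* page this gref is mapped to */
        struct rb_node node;    /* linkage in the per-device tree */
};

/* O(log n) search for the page backing a given gref */
static struct persistent_gnt *find_persistent_gnt(struct rb_root *root,
                                                  grant_ref_t gref)
{
        struct rb_node *n = root->rb_node;

        while (n) {
                struct persistent_gnt *p =
                        rb_entry(n, struct persistent_gnt, node);

                if (gref < p->gnt)
                        n = n->rb_left;
                else if (gref > p->gnt)
                        n = n->rb_right;
                else
                        return p;
        }
        return NULL;
}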

I'm asking because I vaguely started a similar project a while back, but didn't 
get much further than investigating data structures. I had something like the 
following:

. Redefined gref so that the high bit indicates a persistent mapping (on the 
basis that no DomU is ever going to have >2^31 grants); a gref with the high 
bit set is handled differently from a normal grant.

. New hypercall mem-ops to allocate/deallocate a persistent grant, returning a 
handle from Dom0 (with the high bit set). Dom0 maintains a table of mapped 
grants, with the handle being the index. Ref counting tracks usage so that an 
unmap won't be allowed while ref>0. I was taking the approach that a chunk of 
persistent grants would be allocated at boot time, so the actual map/unmap is 
not done often and the requirement of a hypercall wasn't a big deal. I hadn't 
figured out how to manage the size of this table yet.

. Mapping a gref with the high bit set in Dom0 becomes a lookup into the 
persistent table and a ref++ rather than an actual mapping operation. Unmapping 
becomes a ref-- (a rough sketch of this scheme follows below).
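
The rough Dom0-side sketch I have in mind looks something like this (all
names, sizes and layout are made up for illustration; none of this is real
code, and the table sizing question is still open):

#include <linux/atomic.h>
#include <linux/types.h>
#include <linux/mm.h>

#define GREF_PERSISTENT        (1U << 31)  /* high bit marks a persistent gref */
#define PERSISTENT_TABLE_SIZE  1024        /* sizing policy not worked out yet */

struct persistent_entry {
        struct page *page;   /* page mapped when the entry was allocated */
        atomic_t ref;        /* outstanding users; dealloc refused while > 0 */
        bool in_use;
};

static struct persistent_entry persistent_table[PERSISTENT_TABLE_SIZE];

/* "Mapping" a persistent gref is just a table lookup plus ref++. */
static struct page *persistent_map(uint32_t gref)
{
        uint32_t handle = gref & ~GREF_PERSISTENT;
        struct persistent_entry *e;

        if (!(gref & GREF_PERSISTENT) || handle >= PERSISTENT_TABLE_SIZE)
                return NULL;

        e = &persistent_table[handle];
        if (!e->in_use)
                return NULL;

        atomic_inc(&e->ref);
        return e->page;
}

/* "Unmapping" is just ref--; the real grant unmap only happens at dealloc. */
static void persistent_unmap(uint32_t gref)
{
        atomic_dec(&persistent_table[gref & ~GREF_PERSISTENT].ref);
}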

> Also, if the pool is
> shared, some kind of concurrency control should be added, which will make it
> even slower.
> 

Yes, but I think I only needed to worry about that for the actual alloc/dealloc 
of the persistent map entry, which would be an infrequent event. As I said, I 
never got much further than the above concept, so I hadn't fully explored that; 
at the time I was chasing an imaginary problem with grant tables which turned 
out to be freelist contention in DomU.
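
Continuing the sketch above (again purely illustrative), the lock would only
cover the rare alloc/dealloc path, while the per-request map/unmap fast path
touches nothing but the atomic refcount:

#include <linux/spinlock.h>
#include <linux/errno.h>

static DEFINE_SPINLOCK(persistent_lock);   /* serialises alloc/dealloc only */

static int persistent_dealloc(uint32_t handle)
{
        struct persistent_entry *e = &persistent_table[handle];
        int ret = 0;

        spin_lock(&persistent_lock);
        if (atomic_read(&e->ref) > 0) {
                ret = -EBUSY;            /* still in use by a request, refuse */
        } else {
                e->in_use = false;       /* the real grant unmap would go here */
                e->page = NULL;
        }
        spin_unlock(&persistent_lock);
        return ret;
}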

James


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

