RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]

> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ack@xxxxxxxxxxxxx]
> Sent: Wednesday, December 06, 2006 2:42 AM
> To: Santos, Jose Renato G
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Turner, Yoshio; Jose
> Renato Santos; G John Janakiraman
> Subject: Re: [Xen-devel] [PATCH] Reduce overhead in
> find_domain_by_id() [0/2]
>
> I also spotted find_domain_by_id() showing up rather high in
> network intensive workloads. The CPU overhead of our network
> I/O path is pretty large so it's worth trying to address and
> if I remember, that one was oddly rather high on the list of
> low hanging fruits.
>
> find_domain_by_id() is called from __gnttab_map_grant_ref()
> which is typically called N times on an array of grant ops
> from gnttab_map_grant_ref(). Perhaps we could find a way to
> optimize the common case here and only look up and hold the
> domain once per op array instead of once per op in the multi op?
>
> We could also clean up some code while there:
>
>     if ( unlikely((rd = find_domain_by_id(op->dom)) == NULL) )
>     {
> vvvvvvvvvvvvvvvvvvvvvvvv
>         if ( rd != NULL )
>             put_domain(rd);
> ^^^^^^^^^^^^^^^^^^^^^^^^ WTF???
>         DPRINTK("Could not find domain %d\n", op->dom);
>         op->status = GNTST_bad_domain;
>         return;
>     }
>
> It's a bit puzzling to me that grabbing the lock adds such an
> overhead. Is this purely a lock operation overhead or is
> there contention on the lock cache line (could find this out
> by profiling for data cache line misses)?
>

Yes, this is due to cache contention on the lock.
There is also cache contention on the domain refcnt used by
get_domain(). I just implemented a per-CPU version of the
reference count that avoids this cache contention, and the cost
of find_domain_by_id() is reduced even further.

Currently, find_domain_by_id() consumes approximately 3.05% of
the total CPU cycles for a TCP TX micro benchmark.
With the RCU scheme this is reduced to 1.16%.
With the per-CPU reference count mechanism added on top, it is
reduced to 0.31%.

I can submit a patch for the per-CPU reference count after I
clean up the code a bit.

Regards

Renato

> Cheers,
> Emmanuel.
>
> On Tue, Dec 05, 2006 at 07:35:37PM -0600, Santos, Jose Renato G wrote:
> >
> > This is a set of patches to improve performance of
> > find_domain_by_id().
> > find_domain_by_id() shows up high in profiles for network I/O
> > intensive workloads.
> > Most of the cost for this function comes from 3 main functions (of
> > approximately equal cost): 1) read_lock(), 2) read_unlock() and
> > 3) get_domain().
> > These patches replace the lock used for accessing domain_list and
> > domain_hash with a lock-free RCU scheme. Experiments confirm that
> > the cost of find_domain_by_id() is in fact reduced by 2/3.
> > The patches apply cleanly to changeset 12732.
> >
> > Renato
> >
> > Patches:
> > 1/2 - Import linux RCU code into Xen
> > 2/2 - replace domlist_lock operations by RCU operations
> >
> > Signed-off-by: Jose Renato Santos <jsantos@xxxxxxxxxx>
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
>
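
For readers following the thread, here is a rough sketch of the general idea behind a per-CPU reference count as mentioned in the reply above. This is not the patch referred to in the message (that code is not shown in this thread). Every name here (`percpu_ref`, `percpu_ref_get`, `percpu_ref_put`, `percpu_ref_total`, `NR_CPUS`, `CACHE_LINE`) is hypothetical, the CPU number is passed in explicitly, and the 64-byte cache line is an assumption. The point is only that a get/put touches a counter private to the local CPU, so the cache line is not bounced between CPUs the way a single shared refcnt is.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS     4    /* hypothetical CPU count for this sketch */
#define CACHE_LINE  64   /* assumed cache line size in bytes */

/* One counter per CPU, each aligned to its own cache line so that a
 * get/put on one CPU never dirties a line another CPU is using. */
struct percpu_slot {
    _Alignas(CACHE_LINE) long count;
};

struct percpu_ref {
    struct percpu_slot slot[NR_CPUS];
};

static void percpu_ref_init(struct percpu_ref *ref)
{
    for (size_t i = 0; i < NR_CPUS; i++)
        ref->slot[i].count = 0;
}

/* Take a reference: touches only the calling CPU's slot. */
static void percpu_ref_get(struct percpu_ref *ref, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    ref->slot[cpu].count++;
}

/* Drop a reference. The put may run on a different CPU than the
 * matching get, so an individual slot can go negative; only the sum
 * over all slots is meaningful. */
static void percpu_ref_put(struct percpu_ref *ref, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    ref->slot[cpu].count--;
}

/* Slow path: sum every slot to obtain the number of live references. */
static long percpu_ref_total(const struct percpu_ref *ref)
{
    long sum = 0;
    for (size_t i = 0; i < NR_CPUS; i++)
        sum += ref->slot[i].count;
    return sum;
}
```

The trade-off in a scheme like this is that the fast path (get/put) becomes cheap while reading the total becomes expensive, which fits a workload where lookups are frequent and domain teardown is rare. The sketch ignores concurrency entirely: a real implementation would need to keep a get/put from being preempted or migrated mid-update, and would need some synchronization before trusting a zero total at teardown.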