
RE: [Xen-devel] [PATCH] Reduce overhead in find_domain_by_id() [0/2]



 

> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ack@xxxxxxxxxxxxx] 
> Sent: Wednesday, December 06, 2006 2:42 AM
> To: Santos, Jose Renato G
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Turner, Yoshio; Jose 
> Renato Santos; G John Janakiraman
> Subject: Re: [Xen-devel] [PATCH] Reduce overhead in 
> find_domain_by_id() [0/2]
> 
> I also spotted find_domain_by_id() showing up rather high in 
> network-intensive workloads. The CPU overhead of our network 
> I/O path is pretty large, so it's worth trying to address, and 
> if I remember correctly, that one was oddly high on the list of 
> low-hanging fruit.
> 
> find_domain_by_id() is called from __gnttab_map_grant_ref(), 
> which is typically called N times on an array of grant ops 
> from gnttab_map_grant_ref(). Perhaps we could find a way to 
> optimize the common case here and only look up and hold the 
> domain once per op array instead of once per op in the multi-op?
> 
> We could also clean up some code while there:
> 
>     if ( unlikely((rd = find_domain_by_id(op->dom)) == NULL) )
>     {
>      vvvvvvvvvvvvvvvvvvvvvvvv
>         if ( rd != NULL )
>             put_domain(rd);
>      ^^^^^^^^^^^^^^^^^^^^^^^^ WTF???
>         DPRINTK("Could not find domain %d\n", op->dom);
>         op->status = GNTST_bad_domain;
>         return;
>     }
> 
> It's a bit puzzling to me that grabbing the lock adds such an 
> overhead. Is this purely a lock operation overhead or is 
> there contention on the lock cache line (could find this out 
> by profiling for data cache line misses)?
> 

  Yes, this is due to cache contention on the lock. 
  There is also cache contention on the domain refcnt used by 
  get_domain().
  I just implemented a per-CPU version of the reference count 
  that avoids cache contention, and the cost of 
  find_domain_by_id() is reduced even further.

  Currently, find_domain_by_id() consumes approximately 3.05% of 
  the total CPU cycles for a TCP TX microbenchmark. With the RCU
  scheme this is reduced to 1.16%, and with a per-CPU reference 
  count mechanism it is reduced to 0.31%. 
  I can submit a patch for the per-CPU reference count after
  I clean up the code a bit.
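
For illustration only (this is not the patch mentioned above): a minimal sketch of the per-CPU reference count idea, assuming a fixed CPU count and a 64-byte cache line. The names demo_domain, demo_get(), demo_put() and demo_total() are hypothetical and are not Xen APIs. Each CPU updates only its own padded counter, so the fast path never bounces a shared cache line between CPUs; the true count is the sum over all CPUs and is only computed on a slow path.

```c
#include <assert.h>

#define NR_CPUS    4    /* hypothetical CPU count for this sketch */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One counter per CPU, padded so two CPUs never share a cache line. */
struct percpu_count {
    long count;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Hypothetical stand-in for a domain; not Xen's struct domain. */
struct demo_domain {
    int id;
    struct percpu_count refs[NR_CPUS];
};

/* Take a reference: touches only the calling CPU's counter. */
static void demo_get(struct demo_domain *d, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    d->refs[cpu].count++;
}

/*
 * Drop a reference. The put may run on a different CPU than the
 * matching get, so an individual counter is allowed to go negative.
 */
static void demo_put(struct demo_domain *d, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    d->refs[cpu].count--;
}

/*
 * Slow path: sum all per-CPU counters. The result is only meaningful
 * when no CPU is updating concurrently (for example during teardown).
 */
static long demo_total(const struct demo_domain *d)
{
    long sum = 0;
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += d->refs[cpu].count;
    return sum;
}
```

The sketch passes the CPU number explicitly and is single-threaded; a real implementation would have to make sure each counter is modified only by its own CPU, and would need a safe way to detect that the total has reached zero before freeing the domain.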

  Regards

  Renato
 
> Cheers,
> Emmanuel.
> 
> On Tue, Dec 05, 2006 at 07:35:37PM -0600, Santos, Jose Renato G wrote:
> > 
> > This is a set of patches to improve performance of
> > find_domain_by_id().
> > find_domain_by_id() shows up high in profiles for network I/O
> > intensive workloads.
> > Most of the cost for this function comes from 3 main functions (of
> > approximately equal cost): 1) read_lock(), 2) read_unlock() and
> > 3) get_domain().
> > These patches replace the lock used for accessing domain_list and
> > domain_hash with a lock-free RCU scheme. Experiments confirm that
> > the cost of find_domain_by_id() is in fact reduced by 2/3.
> > The patches apply cleanly to changeset 12732.
> > 
> > Renato
> > 
> > Patches:
> >   1/2 - Import linux RCU code into Xen
> >   2/2 - replace domlist_lock operations by RCU operations
> > 
> > Signed-off-by: Jose Renato Santos <jsantos@xxxxxxxxxx>
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
> 
