
RE: [Xen-devel] Odd blkdev throughput results



On Sun, 2008-03-09 at 20:07 +0000, Ian Pratt wrote:
> > The fun (for me, fun is probably a personal thing) part is that
> > throughput is higher than with TCP. May be due to the block layer being
> > much thinner than TCP/IP networking, or the fact that transfers utilize
> > the whole 4KB page size for sequential reads. Possibly some of both, I
> > didn't try.

> The big thing is that on network RX it is currently dom0 that does the
> copy. In the CMP case this leaves the data in the shared cache ready to
> be accessed by the guest. In the SMP case it doesn't help at all. In
> netchannel2 we're moving the copy to the guest CPU, and trying to
> eliminate it with smart hardware.

> Block IO doesn't require a copy at all. 

Well, not in blkback by itself, but certainly from the in-memory disk
image. Unless I misunderstood Keir's recent post, page flipping is
basically dead code, so I thought the numbers should at least point in
roughly the same direction.

> > This is not my question. What strikes me is that for the blkdev
> > interface, the CMP setup is 13% *slower* than SMP, at 661.99 MB/s.
> > 
> > Now, any ideas? I'm mildly familiar with both netback and blkback, and
> > I'd never expected something like that. Any hint appreciated.
> 
> How stable are your results with hdparm? I've never really trusted it as a 
> benchmarking tool.

So far, all the experiments I've done look fairly reasonable. Standard
deviation is low, and since I've been tracing blkback reads I'm fairly
confident that the volume hadn't simply been left in domU memory somewhere.

I'm not so much interested in bio or physical disk performance as in the
relative throughput that can be squeezed through the buffer ring before
and after applying some changes. It's hardly a physical disk benchmark,
but it's simple, and for that purpose it seems okay.

> The ramdisk isn't going to be able to DMA data into the domU's buffer
> on a read, so it will have to copy it.

Right...

> The hdparm running in domU probably doesn't actually look at any of
> the data it requests, so it stays local to the dom0 CPU's cache
> (unlike a real app).

hdparm performs sequential 2 MB read()s over a 3 s period. It's not
calling the block layer directly or anything like that. That will
certainly hit domU caches?

> Doing all that copying in dom0 is going to beat up the domU in the
> shared cache in the CMP case, but won't affect it as much in the SMP
> case.

Well, I could live with blaming L2 footprint; I just wanted to hear
whether anyone has a different explanation. I would expect similar
results on net RX then, but I may be mistaken.

Furthermore, I have to apologize: I failed to use netperf correctly and
reported the TX path in my original post :P. The real numbers are
885.43 (SMP) vs. 1295.46 (CMP), but the difference relative to blk
reads stays the same.

regards,
daniel

-- 
Daniel Stodden
LRR     -      Lehrstuhl für Rechnertechnik und Rechnerorganisation
Institut für Informatik der TU München             D-85748 Garching
http://www.lrr.in.tum.de/~stodden         mailto:stodden@xxxxxxxxxx
PGP Fingerprint: F5A4 1575 4C56 E26A 0B33  3D80 457E 82AE B0D8 735B



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

