RE: [Xen-devel] Odd blkdev throughput results
On Sun, 2008-03-09 at 20:07 +0000, Ian Pratt wrote:

> > The fun (for me, fun is probably a personal thing) part is that
> > throughput is higher than with TCP. May be due to the block layer being
> > much thinner than TCP/IP networking, or the fact that transfers utilize
> > the whole 4KB page size for sequential reads. Possibly some of both, I
> > didn't try.
>
> The big thing is that on network RX it is currently dom0 that does the
> copy. In the CMP case this leaves the data in the shared cache ready to
> be accessed by the guest. In the SMP case it doesn't help at all. In
> netchannel2 we're moving the copy to the guest CPU, and trying to
> eliminate it with smart hardware.
>
> Block IO doesn't require a copy at all.

Well, not in blkback itself, but certainly from the in-memory disk image.
Unless I misunderstood Keir's recent post, page flipping is basically dead
code, so I thought the numbers should at least point in roughly the same
direction.

> > This is not my question. What strikes me is that for the blkdev
> > interface, the CMP setup is 13% *slower* than SMP, at 661.99 MB/s.
> >
> > Now, any ideas? I'm mildly familiar with both netback and blkback, and
> > I'd never expected something like that. Any hint appreciated.
>
> How stable are your results with hdparm? I've never really trusted it as
> a benchmarking tool.

So far, all the experiments I've done look fairly reasonable. Standard
deviation is low, and since I've been tracing netback reads I'm fairly
confident that the volume wasn't left in domU memory somewhere.

I'm not so much interested in bio or physical disk performance as in the
relative performance, i.e. how much can be squeezed through the buffer
ring before and after applying some changes. It's hardly a physical disk
benchmark, but it's simple, and for the purpose given it seems okay.

> The ramdisk isn't going to be able to DMA data into the domU's buffer on
> a read, so it will have to copy it.

Right...

> The hdparm running in domU probably doesn't actually look at any of the
> data it requests, so it stays local to the dom0 CPU's cache (unlike a
> real app).

hdparm performs sequential 2MB read()s over a 3-second period. It's not
calling the block layer directly or anything like that. That will
certainly hit domU caches?

> Doing all that copying in dom0 is going to beat up the domU in the
> shared cache in the CMP case, but won't affect it as much in the SMP
> case.

Well, I could live with blaming L2 footprint. I just wanted to hear
whether someone has a different explanation. I would then expect similar
results on net RX, but I may be mistaken.

Furthermore, I need to apologize: I failed to use netperf correctly and
reported the TX path in my original post :P. The real numbers are rather
885.43 (SMP) vs. 1295.46 (CMP), but the difference compared to blkdev
reads stays the same.

regards,
daniel

--
Daniel Stodden
LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
Institut für Informatik der TU München
D-85748 Garching
http://www.lrr.in.tum.de/~stodden
mailto:stodden@xxxxxxxxxx
PGP Fingerprint: F5A4 1575 4C56 E26A 0B33 3D80 457E 82AE B0D8 735B

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
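For readers unfamiliar with the workload discussed above: from domU's point
of view, the hdparm -t measurement is essentially a timed loop of large
sequential read()s against the block device, with the data never being
looked at. The following is a minimal C sketch of that access pattern (it
is not hdparm itself; the 2 MB chunk size and 3-second window are taken
from the description in the mail, and the device path in the usage example
is only a placeholder for the ramdisk-backed vbd in domU):

    /*
     * Sketch of an hdparm -t style measurement: sequential 2 MB read()s
     * against a block device, timed over a roughly 3-second window,
     * reporting aggregate throughput.
     *
     * Build:  gcc -O2 -o seqread seqread.c   (add -lrt on older glibc)
     * Run:    ./seqread /dev/xvdb            (device path is an example)
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK   (2 * 1024 * 1024)   /* 2 MB per read() */
    #define SECONDS 3.0                 /* measurement window */

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <blockdev>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char *buf = malloc(CHUNK);
        if (!buf) {
            perror("malloc");
            return 1;
        }

        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);

        long long total = 0;
        double elapsed = 0.0;

        do {
            ssize_t n = read(fd, buf, CHUNK);
            if (n <= 0)         /* EOF or error: stop early */
                break;
            total += n;

            /* Like hdparm, we never touch buf's contents, so the copied
             * data need not be pulled into this CPU's cache. */

            clock_gettime(CLOCK_MONOTONIC, &now);
            elapsed = (now.tv_sec - start.tv_sec) +
                      (now.tv_nsec - start.tv_nsec) / 1e9;
        } while (elapsed < SECONDS);

        printf("%.2f MB in %.2f s = %.2f MB/s\n",
               total / 1e6, elapsed, total / 1e6 / elapsed);

        free(buf);
        close(fd);
        return 0;
    }

Because the buffer is never read by the benchmark, the data only has to be
copied into domU's memory by dom0 (or blkback), which is exactly the cache
behaviour being debated in the thread above.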