Re: [MirageOS-devel] Profiling Mirage on Xen
Hi,

On 21 Aug 2014, at 12:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:

> On 11 August 2014 14:04, David Scott <scott.dj@xxxxxxxxx> wrote:
>>
>> On Mon, Aug 11, 2014 at 12:32 PM, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>>
>>> Has anyone made any tools for profiling on Xen?
>>
>> The closest thing I can think of is xentrace/xenalyze:
>>
>> http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
>>
>> This will tell you what Xen can see: e.g. how the vCPUs map to pCPUs
>> and when they block on events etc.
>
> Thanks. For now, I've pinned dom0 to CPU 0 and my unikernel to CPU 1.
>
>>> I want to see why my network service on Xen/ARM only gets about 5 MB/s
>>> (while a Linux guest gets about 45 MB/s).
>>>
>>> I tried compiling with profiling on ("true: profile" in the _tags) and
>>> wrote a __gnu_mcount_nc function to dump all the results to a buffer
>>> [1], plus a script to turn the addresses back into symbols and guess
>>> the nesting (a bit unreliable, as it doesn't tell you when the
>>> function finishes).
>>>
>>> Here's an example CSV of the output (the unikernel waits for a TCP
>>> connection and then streams data to it as fast as it can):
>>>
>>> http://test.roscidus.com/static/sample-output.csv.bz2
>>>
>>> I haven't checked it carefully to see if it's correct - this is just
>>> an example of the kind of output. It shows the call graph and the
>>> (cumulative) time spent in each function. Since it doesn't know the
>>> end times, it assumes a function runs until one of its parents calls
>>> something else.
>>
>> Cool -- it would be great to polish up a tool like this.
>
> Yes, I think there's a lot we could do here. I've written up the
> profiling I've done so far here:
>
> http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/
>
> The graphs are quite interesting - if people familiar with the code
> could explain what's going on (especially with the block device) that
> would be great!

The graphs are interesting! IIRC the grant unmap operation is very
expensive since it involves a TLB shootdown. This adds a large amount
(relatively, compared to modern flash-based disks) to the
request/response latency. I think this is why the performance is so
terrible with one request at a time.

I suspect that the batching you’re seeing with two requests is an
artefact of the backend, which is probably trying to unmap both grant
references at once for efficiency. When I wrote a user-space block
backend, batching the unmaps made a massive difference.

There is a block protocol extension called “persistent grants” that we
haven’t implemented (yet). This does the obvious thing and pre-shares a
set of pages. We might suffer a bit because of extra copies (i.e. we
might have to copy into the pre-shared pages), but we would save the
unmap overhead, so it might be worth it.

> Overall, the changes increased the queuing service's download rate
> from 2.46 MB/s to 7.24 MB/s, which is nice but still a bit
> disappointing.

Certainly a good start!
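To make the unmap cost concrete: GNTTABOP_unmap_grant_ref takes an
array of operations, so a backend can retire a whole batch of grants
with a single hypercall (and, with luck, a single TLB shootdown)
instead of paying that price per request. Here's a rough sketch
against the standard Xen public grant-table interface; unmap_batch is
just an illustrative name, not an existing Mirage/Mini-OS function:

    /* Unmap n grants with one hypercall instead of n hypercalls. */
    #include <xen/grant_table.h>

    static int unmap_batch(struct gnttab_unmap_grant_ref *ops,
                           unsigned int n)
    {
        int rc = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
                                           ops, n);
        if (rc != 0)
            return rc;                 /* the hypercall itself failed */

        for (unsigned int i = 0; i < n; i++)
            if (ops[i].status != GNTST_okay)
                return ops[i].status;  /* per-operation status from Xen */
        return 0;
    }

This is essentially what the backend is doing when it unmaps both of
your grant references at once.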
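And a sketch of the persistent-grant idea, assuming Mini-OS-style
helpers (gnttab_grant_access, alloc_page, virt_to_mfn); the pool
structure and its functions are hypothetical:

    /* Grant a fixed pool of pages to the backend once at setup, then
     * reuse them for every request: an extra memcpy per request in
     * exchange for never having to unmap. */
    #include <mini-os/gnttab.h>
    #include <mini-os/mm.h>
    #include <string.h>

    #define POOL_PAGES 32

    static struct pbuf {
        void       *va;     /* page permanently shared with the backend */
        grant_ref_t gref;   /* its long-lived ("persistent") grant ref */
        int         in_use;
    } pool[POOL_PAGES];

    /* One-time setup: grant every pool page read-write to the backend. */
    static void pool_init(domid_t backend)
    {
        for (int i = 0; i < POOL_PAGES; i++) {
            pool[i].va     = (void *)alloc_page();
            pool[i].gref   = gnttab_grant_access(backend,
                                 virt_to_mfn(pool[i].va), 0);
            pool[i].in_use = 0;
        }
    }

    /* Per request: copy into an already-shared page and return its slot
     * index, or -1 if the pool is empty (a real implementation would
     * block). *gref is the pre-shared ref to put in the ring request. */
    static int pool_write(const void *data, size_t len, grant_ref_t *gref)
    {
        for (int i = 0; i < POOL_PAGES; i++) {
            if (!pool[i].in_use) {
                pool[i].in_use = 1;
                memcpy(pool[i].va, data, len);
                *gref = pool[i].gref;
                return i;
            }
        }
        return -1;
    }

Whether the extra copy beats the unmap is exactly the trade-off I
mentioned above.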
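Finally, on the profiling side, for anyone who wants to reproduce
Thomas's mcount trick: with -pg, gcc emits "push {lr}; bl
__gnu_mcount_nc" in every function prologue, so on entry lr points
into the callee and the word on top of the stack is the call site. The
hook usually has the following shape (illustrative, not Thomas's
actual code; record_call and the buffer are made up, and this file
must itself be compiled without profiling):

    #include <stdint.h>

    #define LOG_MAX 65536

    static struct { uint32_t site, callee; } log_buf[LOG_MAX];
    static unsigned int log_len;

    /* Log one call event; a real profiler would also record a
     * timestamp, e.g. from the ARM cycle counter. */
    void record_call(uint32_t site, uint32_t callee)
    {
        if (log_len < LOG_MAX) {
            log_buf[log_len].site   = site;    /* return address in the caller */
            log_buf[log_len].callee = callee;  /* address inside the called fn */
            log_len++;
        }
    }

    /* The usual thunk (cf. glibc's ARM mcount): preserve r0-r3, return
     * into the callee via ip, and restore the original lr so the
     * callee's epilogue still works. */
    __asm__(
        ".global __gnu_mcount_nc\n"
        "__gnu_mcount_nc:\n"
        "  push {r0, r1, r2, r3, lr}\n"
        "  ldr  r0, [sp, #20]\n"          /* call site: the lr the callee pushed */
        "  mov  r1, lr\n"                 /* address just inside the callee */
        "  bl   record_call\n"
        "  pop  {r0, r1, r2, r3, ip, lr}\n"
        "  bx   ip\n"
    );

The logged addresses can then be matched back to symbols offline, much
as Thomas's script does.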
Cheers,
Dave

> --
> Dr Thomas Leonard        http://0install.net/
> GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
> GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA