
Re: [MirageOS-devel] Profiling Mirage on Xen


On 21 Aug 2014, at 12:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:

> On 11 August 2014 14:04, David Scott <scott.dj@xxxxxxxxx> wrote:
>> On Mon, Aug 11, 2014 at 12:32 PM, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>>> Has anyone made any tools for profiling on Xen?
>> The closest thing I can think of is xentrace/xenalyse:
>> http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
>> This will tell you what Xen can see: e.g. how the vCPUs map to pCPUs, when
>> they block on events, etc.
> Thanks. For now, I've pinned dom0 to CPU 0 and my unikernel to CPU 1.
>>> I want to see why my network service on Xen/ARM only gets about 5 MB/s
>>> (while a Linux guest gets about 45 MB/s).
>>> I tried compiling with profiling on ("true: profile" in the _tags) and
>>> wrote a __gnu_mcount_nc function to dump all the results to a buffer
>>> [1], plus a script to turn the addresses back into symbols and guess
>>> the nesting (a bit unreliable, as it doesn't tell you when the
>>> function finishes).
>>> Here's an example CSV of the output (the unikernel waits for a TCP
>>> connection and then streams data to it as fast as it can):
>>>  http://test.roscidus.com/static/sample-output.csv.bz2
>>> I haven't checked it carefully to see if it's correct - this is just
>>> an example of the kind of output. It shows the call graph and the
>>> (cumulative) time spent in each function. Since it doesn't know the
>>> end times, it assumes a function runs until one of its parents calls
>>> something else.
>> Cool -- it would be great to polish up a tool like this.
> Yes, I think there's a lot we could do here. I've written up the
> profiling I've done so far here:
> http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/
> The graphs are quite interesting - if people familiar with the code
> could explain what's going on (especially with the block device) that
> would be great!

The graphs are interesting!

IIRC the grant unmap operation is very expensive since it involves a TLB 
shootdown. This adds a lot of latency to each request/response (a lot relative 
to modern flash-based disks). I think this is why the performance is so 
terrible with one request at a time. I suspect that the batching you’re seeing 
with two requests is an artefact of the backend, which is probably trying to 
unmap both grant references at once for efficiency. When I wrote a user-space 
block backend, batching the unmaps made a massive difference.
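To make the arithmetic concrete, here’s a toy cost model of why batching the 
unmaps helps. This is not Mirage or Xen code, and the numbers are made-up 
placeholders rather than measured costs; only the structure of the sum matters:

```python
# Toy cost model: one TLB shootdown per unmap vs. one shootdown for the
# whole batch. Both costs below are invented placeholders, not measured
# Xen numbers.
TLB_SHOOTDOWN_US = 20.0   # hypothetical cost of one TLB shootdown
UNMAP_US = 1.0            # hypothetical per-grant bookkeeping cost

def unmap_cost_us(n_grants, batched):
    """Total cost of unmapping n_grants outstanding grant references.

    Unbatched: the backend pays one shootdown per unmap.
    Batched: the backend unmaps all grants under a single shootdown.
    """
    shootdowns = 1 if batched else n_grants
    return n_grants * UNMAP_US + shootdowns * TLB_SHOOTDOWN_US

print(unmap_cost_us(2, batched=False))  # 42.0
print(unmap_cost_us(2, batched=True))   # 22.0
```

Under this model the fixed shootdown cost is amortised over the batch, so the 
saving grows with the number of outstanding requests.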

There is a block protocol extension called “persistent grants” that we haven’t 
implemented (yet). This does the obvious thing and pre-shares a set of pages. 
We might suffer a bit because of extra copies (i.e. we might have to copy into 
the pre-shared pages) but we would save the unmap overhead, so it might be 
worth it.
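The trade-off can be sketched with another toy model: persistent grants swap 
the per-request map/unmap (and its TLB shootdown) for an extra copy into the 
pre-shared pages, so they win whenever the copy is cheaper than the shootdown. 
Again, all costs are invented placeholders, not measurements:

```python
# Toy model of the persistent-grants trade-off. Both costs are made-up
# placeholders, not measurements.
TLB_SHOOTDOWN_US = 20.0   # hypothetical map/unmap + shootdown cost per request
COPY_US_PER_KB = 0.1      # hypothetical memcpy cost per KiB copied

def per_request_overhead_us(size_kb, persistent):
    """Overhead added to one block request under each scheme."""
    if persistent:
        # Persistent grants: pages stay mapped; we pay to copy the data
        # into the pre-shared pages instead.
        return size_kb * COPY_US_PER_KB
    # Classic grants: map/unmap on every request, paying the shootdown.
    return TLB_SHOOTDOWN_US

# With these (invented) numbers, persistent grants win for any request
# smaller than 200 KiB.
print(per_request_overhead_us(4, persistent=True))
print(per_request_overhead_us(4, persistent=False))
```

Whether it is actually worth it would come down to measuring the real copy 
bandwidth against the real shootdown cost on the hardware in question.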

> Overall, the changes increased the queuing service's download rate
> from 2.46 MB/s to 7.24 MB/s, which is nice but still a bit
> disappointing.

Certainly a good start!


> -- 
> Dr Thomas Leonard        http://0install.net/
> GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
> GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA
> _______________________________________________
> MirageOS-devel mailing list
> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
