Re: [MirageOS-devel] Profiling Mirage on Xen

On 2 September 2014 15:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> On 21 August 2014 18:30, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
>> Hi,
>> On 21 Aug 2014, at 12:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> [...]
>>> I've written up the profiling I've done so far here:
>>> http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/
>>> The graphs are quite interesting - if people familiar with the code
>>> could explain what's going on (especially with the block device) that
>>> would be great!
>> The graphs are interesting!
>> IIRC the grant unmap operation is very expensive since it involves a TLB 
>> shootdown. This adds a large amount (relatively, compared to modern 
>> flash-based disks) to the request/response latency. I think this is why the 
>> performance is so terrible with one request at a time. I suspect that the 
>> batching you're seeing with two requests is an artefact of the backend, 
>> which is probably trying to unmap both grant references at once for 
>> efficiency. When I wrote a user-space block backend batching the unmaps made 
>> a massive difference.
> I wonder if this applies to ARM. You should be able to invalidate
> individual TLB entries there, I think.
>> There is a block protocol extension called "persistent grants" that we 
>> haven't implemented (yet). This does the obvious thing and pre-shares a set 
>> of pages. We might suffer a bit because of extra copies (i.e. we might have 
>> to copy into the pre-shared pages) but we would save the unmap overhead, so 
>> it might be worth it.
> Had a go at this, but it didn't make much difference. However, I've
> discovered that dd in dom0 isn't too fast either. I originally tested
> with hdparm, which reports 20 MB/s as expected:
> $ hdparm -t /dev/mmcblk0
>  Timing buffered disk reads:  62 MB in  3.07 seconds =  20.21 MB/sec
> dd's speed seems to depend a lot on the block size. Using
> 4096*11=45056 bytes (which I assume is what dom0 would do in response
> to a guest request), I get 16.9 MB/s:
> $ dd iflag=direct if=/dev/vg0/bench  of=/dev/null bs=45056 count=1000
> 1000+0 records in
> 1000+0 records out
> 45056000 bytes (45 MB) copied, 2.65911 s, 16.9 MB/s
> bs=65536 gives 18.8 MB/s and bs=131072 gives 20.8 MB/s. Linux domU
> reports 20.36 MB/sec from hdparm but only 18.6 MB/s from dd
> (bs=131072). So perhaps Mirage is doing pretty well already.
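
As a sanity check on the dd figure quoted above, the throughput can be recomputed from the byte count and elapsed time that dd printed. This is only a sketch: the helper name is made up for illustration, and it assumes dd reports decimal megabytes (10^6 bytes).

```python
def throughput_mb_s(total_bytes, seconds):
    """Throughput in decimal MB/s (1 MB = 10**6 bytes), as assumed for dd."""
    return total_bytes / seconds / 1e6

# Values copied from the dd run quoted above (bs=45056, count=1000).
dd_bytes = 45056 * 1000
dd_seconds = 2.65911

print(f"{throughput_mb_s(dd_bytes, dd_seconds):.1f} MB/s")
```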

I had a look at how hdparm gets the full speed. It's using 256 pages
per request, which requires support for indirect pages in blkfront
(with direct requests, the maximum is 11 pages per request).
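
To put the two request sizes side by side, here is a rough sketch of the arithmetic. It assumes 4096-byte pages (consistent with the 4096*11=45056 figure earlier in the thread); the function names are hypothetical and not part of blkfront.

```python
PAGE_SIZE = 4096          # bytes per page (assumed, matches 4096*11=45056)
DIRECT_MAX_PAGES = 11     # maximum pages in a direct request
INDIRECT_PAGES = 256      # pages per request used by hdparm

def request_bytes(pages, page_size=PAGE_SIZE):
    """Payload size of one request carrying `pages` pages."""
    return pages * page_size

def requests_needed(total_bytes, pages):
    """Number of requests needed to move total_bytes (ceiling division)."""
    per_request = request_bytes(pages)
    return -(-total_bytes // per_request)

direct = request_bytes(DIRECT_MAX_PAGES)
indirect = request_bytes(INDIRECT_PAGES)
print(direct, indirect)                           # bytes per request
print(requests_needed(indirect, DIRECT_MAX_PAGES))  # direct requests per 256-page read
```

So one 256-page indirect request covers what would otherwise take a couple of dozen direct requests, each paying its own request/response latency.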

I added support here:


With this, I got 21.32 MB/s read and 9.17 MB/s write. The results vary
a fair bit with block size (those were the best), but that seems like
an improvement (the previous best was 18.27 MB/s read and 7.07 MB/s write).
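
For a sense of scale, the relative change between the previous best and the new figures can be worked out as below. This is just a sketch over the four numbers given above; the helper name is invented for illustration.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100

# MB/s figures from this message: previous best vs. with indirect pages.
read_gain = percent_gain(18.27, 21.32)
write_gain = percent_gain(7.07, 9.17)

print(f"read:  +{read_gain:.1f}%")
print(f"write: +{write_gain:.1f}%")
```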

Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA
