On 21 August 2014 18:30, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
> Hi,
> On 21 Aug 2014, at 12:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:
>> I've written up the profiling I've done so far here:
>> http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/
>> The graphs are quite interesting - if people familiar with the code
>> could explain what's going on (especially with the block device) that
>> would be great!
> The graphs are interesting!
> IIRC the grant unmap operation is very expensive since it involves a TLB 
> shootdown. This adds a large amount (relatively, compared to modern 
> flash-based disks) to the request/response latency. I think this is why the 
> performance is so terrible with one request at a time. I suspect that the 
> batching you're seeing with two requests is an artefact of the backend, which
> is probably trying to unmap both grant references at once for efficiency. 
> When I wrote a user-space block backend batching the unmaps made a massive 
> difference.

I wonder if this applies to ARM. You should be able to invalidate
individual TLB entries there, I think.

> There is a block protocol extension called "persistent grants" that we
> haven't implemented (yet). This does the obvious thing and pre-shares a set
> of pages. We might suffer a bit because of extra copies (i.e. we might have 
> to copy into the pre-shared pages) but we would save the unmap overhead, so 
> it might be worth it.

Had a go at this, but it didn't make much difference. However, I've
discovered that dd in dom0 isn't too fast either. I originally tested
with hdparm, which reports 20 MB/s as expected:

$ hdparm -t /dev/mmcblk0
 Timing buffered disk reads:  62 MB in  3.07 seconds =  20.21 MB/sec

dd's speed seems to depend a lot on the block size. Using
4096*11=45056 bytes (which I assume is what dom0 would do in response
to a guest request), I get 16.9 MB/s:

$ dd iflag=direct if=/dev/vg0/bench  of=/dev/null bs=45056 count=1000
1000+0 records in
1000+0 records out
45056000 bytes (45 MB) copied, 2.65911 s, 16.9 MB/s

bs=65536 gives 18.8 MB/s and bs=131072 gives 20.8 MB/s. Linux domU
reports 20.36 MB/sec from hdparm but only 18.6 MB/s from dd
(bs=131072). So perhaps Mirage is doing pretty well already.
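
As a rough cross-check of the dd figures above, here is a minimal Python
sketch. The helper names are made up for illustration, and it assumes dd's
"MB" means 10**6 bytes. It recomputes the 45056-byte rate from the byte
count and elapsed time, then converts each reported rate into time spent
per block:

```python
# Figures are the ones quoted in this message; nothing here is measured.
PAGE_SIZE = 4096
PAGES_PER_REQUEST = 11  # the request size assumed in the message


def throughput_mb_s(total_bytes, seconds):
    """Throughput in MB/s, taking 1 MB = 10**6 bytes (assumed dd convention)."""
    return total_bytes / seconds / 1e6


def ms_per_block(block_size, mb_per_s):
    """Average milliseconds spent on one block at the given rate."""
    return block_size / (mb_per_s * 1e6) * 1e3


block_size = PAGE_SIZE * PAGES_PER_REQUEST
rate = throughput_mb_s(block_size * 1000, 2.65911)  # count=1000 run
print(f"bs={block_size}: {rate:.1f} MB/s")

# Rates reported by dd for each block size in the runs above.
reported = {45056: 16.9, 65536: 18.8, 131072: 20.8}
for bs, mb_s in reported.items():
    print(f"bs={bs:6d}  {mb_s:4.1f} MB/s  {ms_per_block(bs, mb_s):.2f} ms/block")
```

Comparing the per-block times gives a feel for how much of each request is
fixed overhead rather than transfer time, though three data points are too
few to draw firm conclusions.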

>> Overall, the changes increased the queuing service's download rate
>> from 2.46 MB/s to 7.24 MB/s, which is nice but still a bit
>> disappointing.
> Certainly a good start!

Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

