
Re: [MirageOS-devel] Profiling Mirage on Xen

On Wed, Sep 3, 2014 at 5:03 PM, Thomas Leonard <talex5@xxxxxxxxx> wrote:
On 2 September 2014 15:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> On 21 August 2014 18:30, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
>> Hi,
>>
>> On 21 Aug 2014, at 12:10, Thomas Leonard <talex5@xxxxxxxxx> wrote:
> [...]
>>> I've written up the profiling I've done so far here:
>>>
>>> http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/
>>>
>>> The graphs are quite interesting - if people familiar with the code
>>> could explain what's going on (especially with the block device) that
>>> would be great!
>>
>> The graphs are interesting!
>>
>> IIRC the grant unmap operation is very expensive since it involves a TLB shootdown. This adds a large amount (relatively, compared to modern flash-based disks) to the request/response latency. I think this is why the performance is so terrible with one request at a time. I suspect that the batching you're seeing with two requests is an artefact of the backend, which is probably trying to unmap both grant references at once for efficiency. When I wrote a user-space block backend, batching the unmaps made a massive difference.
>
> I wonder if this applies to ARM. You should be able to invalidate
> individual TLB entries there, I think.
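A rough OCaml sketch of the batching idea (the names here are
hypothetical stand-ins, not the real xen-gnt or mirage-block-xen API;
the actual GNTTABOP_unmap_grant_ref hypercall does take an array of
entries, which is what makes batching pay off):

  type grant_handle = int

  (* Stand-in for one unmap hypercall: however many grants it covers,
     it costs one TLB shootdown. *)
  let unmap_grant_refs (handles : grant_handle array) =
    Printf.printf "one hypercall (one shootdown) for %d grant(s)\n"
      (Array.length handles)

  (* Naive backend: unmap as each response is processed. *)
  let complete_one handle = unmap_grant_refs [| handle |]

  (* Batched backend: collect the handles from a whole pass over the
     ring, then unmap them in one go. *)
  let complete_batch = function
    | [] -> ()
    | handles -> unmap_grant_refs (Array.of_list handles)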
>
>> There is a block protocol extension called "persistent grants" that we haven't implemented (yet). This does the obvious thing and pre-shares a set of pages. We might suffer a bit because of extra copies (i.e. we might have to copy into the pre-shared pages) but we would save the unmap overhead, so it might be worth it.
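Roughly, the persistent-grants idea in OCaml (a sketch only: pages are
modelled as Bytes, and grant_page is a hypothetical stand-in for
sharing a page with the backend):

  let page_size = 4096

  type pool = { pages : Bytes.t array; mutable next : int }

  (* Stand-in for granting a page to the other domain; with persistent
     grants this happens once per page, at connection time. *)
  let grant_page (_page : Bytes.t) = ()

  let create n =
    let pages = Array.init n (fun _ -> Bytes.create page_size) in
    Array.iter grant_page pages;   (* shared once, never unmapped *)
    { pages; next = 0 }

  (* Per request: copy the payload into an already-shared page. The
     cost is a memcpy instead of a map/unmap and its TLB shootdown. *)
  let fill_next pool payload =
    let page = pool.pages.(pool.next) in
    pool.next <- (pool.next + 1) mod Array.length pool.pages;
    Bytes.blit payload 0 page 0 (min (Bytes.length payload) page_size);
    page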
>
> Had a go at this, but it didn't make much difference. However, I've
> discovered that dd in dom0 isn't too fast either. I originally tested
> with hdparm, which reports 20 MB/s as expected:
>
> $ hdparm -t /dev/mmcblk0
> Timing buffered disk reads: 62 MB in 3.07 seconds = 20.21 MB/sec
>
> dd's speed seems to depend a lot on the block size. Using
> 4096*11=45056 bytes (which I assume is what dom0 would do in response
> to a guest request), I get 16.9 MB/s:
>
> $ dd iflag=direct if=/dev/vg0/bench of=/dev/null bs=45056 count=1000
> 1000+0 records in
> 1000+0 records out
> 45056000 bytes (45 MB) copied, 2.65911 s, 16.9 MB/s
>
> bs=65536 gives 18.8 MB/s and bs=131072 gives 20.8 MB/s. Linux domU
> reports 20.36 MB/s from hdparm but only 18.6 MB/s from dd
> (bs=131072). So perhaps Mirage is doing pretty well already.

I had a look at how hdparm gets the full speed. It's using 256 pages
per request, which requires support for indirect pages in blkfront
(with direct requests, the maximum is 11 pages per request).
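The arithmetic behind that, as a small OCaml sketch (the 11-segment
direct limit and the 8-byte indirect segment entries come from the
blkif protocol headers):

  let page_size = 4096
  let direct_max_segments = 11                     (* direct request limit *)
  let segments_per_indirect_page = page_size / 8   (* 512 entries per page *)

  let indirect_pages_needed segments =
    (segments + segments_per_indirect_page - 1) / segments_per_indirect_page

  let () =
    (* 11 * 4096 = 45056 bytes, the dd block size used above *)
    Printf.printf "direct:   max %d bytes per request\n"
      (direct_max_segments * page_size);
    let segments = 256 in                          (* what hdparm uses *)
    Printf.printf "indirect: %d segments = %d bytes via %d extra page(s)\n"
      segments (segments * page_size) (indirect_pages_needed segments)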

I added support here:

 https://github.com/talex5/mirage-block-xen/commits/master

With this, I got 21.32 MB/s read and 9.17 MB/s write. The results vary
a fair bit with block size (those were the best), but that seems like
an improvement (the previous best was 18.27 MB/s read and 7.07 MB/s write).

Nice! When you're happy with it, please make a pull request...

Cheers,
Dave Scott
_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
