
Re: [MirageOS-devel] rough thoughts on mirage block servers





On Monday, January 27, 2014, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
The BLOCK interface is for IO requests, so reordering in a library makes sense (analogous to elevator algorithms in a conventional stack).

On the Xen ring interface, do we do enough to expose the parallelism? Requests that are written can be acked out of order in the protocol, and it's not clear to me what the block backend actually does. When I was writing it, I seem to remember that responses always came back in the order they were written, which doesn't seem very parallel at all.

Good question.

The userspace grant table interface exposes a "map batch" call and a corresponding "unmap batch", to amortize the cost of the syscall and potentially the TLB flush (although I bet the user-to-kernel syscall dwarfs the TLB flush for userspace). At the moment the blkback uses this to the max by mapping the whole ring at once. Unfortunately the requests can't be acknowledged until after the unmap, because acknowledging earlier confuses the frontend. As a result we pull the requests in as a batch, can parallelise within the batch (that code is currently missing), and they all complete at once. For trivial backends (like a /dev/null) this gives high throughput. It should improve with indirect descriptors, since we'll get a larger batch size.
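
Roughly the shape of the missing per-batch parallelism, as a sketch only (the request, response, service and push_responses names are made up for illustration, not the real blkback types):

  (* Sketch: service every request in the mapped batch in parallel,
     then unmap and acknowledge the whole batch in one go. *)
  type request  = { id : int64; sector : int64; length : int }
  type response = { resp_id : int64; ok : bool }

  let handle_batch
      ~(service : request -> response Lwt.t)          (* per-request I/O *)
      ~(push_responses : response list -> unit Lwt.t) (* unmap + ack *)
      (batch : request list) : unit Lwt.t =
    (* Lwt_list.map_p starts every request without waiting for the
       previous one to finish; the batch completes when they all do. *)
    Lwt.bind (Lwt_list.map_p service batch) push_responses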


The Ring/Lwt code should just work at the moment if responses come back out of order (since there is an id->wakener mapping), and bind will not act as a barrier in this case if you use Lwt_list.map_p.

Yeah.
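
For the archive, the id->wakener trick is roughly the following; this is a simplified illustration, not the actual Ring/Lwt code:

  (* Each outstanding request records a wakener keyed by its id, so
     responses can arrive in any order and still wake the right caller. *)
  let wakeners : (int64, string Lwt.u) Hashtbl.t = Hashtbl.create 32

  (* Issuing a request: stash a wakener under a fresh id and hand back
     the thread that will complete when the matching response arrives. *)
  let issue id =
    let t, u = Lwt.wait () in
    Hashtbl.replace wakeners id u;
    t

  (* Handling a response, in whatever order it turns up. *)
  let on_response id payload =
    match Hashtbl.find_opt wakeners id with
    | Some u -> Hashtbl.remove wakeners id; Lwt.wakeup_later u payload
    | None -> ()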

Btw do you happen to know how POSIX aio works? Does it open the file 'n' times to be able to 'seek' and 'write' from parallel threads? Perhaps mirage-block-unix should implement a pool of open file descriptors per device.
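
Something along these lines is what I'm imagining for the pool (just a sketch; the pool size is a guess and Lwt_pool may or may not be the right tool):

  (* Keep a small pool of descriptors for the same file so independent
     requests can seek and write from parallel threads without racing
     on a single shared offset. *)
  let make_fd_pool path =
    Lwt_pool.create 8
      ~dispose:Lwt_unix.close
      (fun () -> Lwt_unix.openfile path [ Unix.O_RDWR ] 0)

  (* Each write borrows its own descriptor, so one request's seek+write
     pair can't interleave with another's. *)
  let write_sector pool ~sector_size ~sector buf =
    Lwt_pool.use pool (fun fd ->
        let open Lwt.Infix in
        Lwt_unix.lseek fd (sector * sector_size) Unix.SEEK_SET >>= fun _ ->
        Lwt_unix.write fd buf 0 (Bytes.length buf) >>= fun _ ->
        Lwt.return_unit)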

Dave

-anil

On 27 Jan 2014, at 09:36, David Scott <scott.dj@xxxxxxxxx> wrote:

> Hi,
>
> I did a bit of work tidying up the "xen-disk" userspace app which allows you to attach a synthetic block device to a Xen guest. In particular I functorised it over the V1_LWT.BLOCK interface, so any other mirage block device can be used as the backing store for the synthetic device. I tested this by implementing V1_LWT.BLOCK in the ocaml-vhd library, which allows the synthetic device to be backed by a vhd-format file on the Xen host. So far so good!
>
> While doing a bit of performance optimisation, I hit a bit of a snag. Currently a mirage app using a block device is encouraged to issue BLOCK requests in parallel. For example, a filesystem would probably write all the data blocks in parallel, Lwt.bind, and then reset a metadata pointer to make the new data live in a final update; in effect the Lwt.bind acts like a 'barrier', forbidding I/O re-ordering across it. In the "xen-disk" app I receive queues of requests from the VM and then issue them serially -- unsurprisingly the performance is poor. I think I need to create a library which can operate on these queues of requests, identify conflicts (reads following writes), and parallelise them as much as possible.
>
> The mirage-block-unix implementation is also pretty terrible, since it serialises everything again. We should probably open the file from parallel threads, or switch to a library like aio. I don't know whether the request-paralleliser should know about some kind of maximum queue depth from the server side, or whether it should just take all the I/O it can get.
>
> Ideas or suggestions welcome! Sorry the above was a bit more of a stream-of-consciousness than a coherent picture :-)
>
> Cheers,
> Dave



--
Dave Scott
_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

 

