On 27 Jan 2014, at 10:38, David Scott <scott.dj@xxxxxxxxx> wrote:
On Monday, January 27, 2014, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
The BLOCK interface is for IO requests, so reordering in a library makes sense (analogous to elevator algorithms in a conventional stack).
On the Xen ring interface, do we do enough to expose the parallelism? Requests that are written can be acked out of order in the protocol, and it's not clear to me what the block backend actually does. When I was writing it, I seem to remember that responses always came back in the order they were written, which doesn't seem very parallel at all.
Good question.
The userspace grant table interface exposes a "map batch" interface and a corresponding "unmap batch", to amortize the costs of the syscall and potentially the TLB flush (although I bet the user-to-kernel syscall dwarfs the TLB flush for userspace). At the moment the blkback uses this to the max by mapping the whole ring at once. Unfortunately the requests can't be acknowledged until after the unmap, because this confuses the frontend. As a result we pull the requests as a batch, can parallelise within the batch (code currently missing), and they all complete at once. For trivial backends (like a /dev/null) it gets high throughput. It should improve with indirect descriptors since we'll get a larger batch size.
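To make the shape concrete, here is a minimal Lwt sketch of that batch pipeline (pop_requests, handle_request and push_responses are hypothetical placeholders for the real ring and grant-table operations, not the actual blkback API): pull the whole batch, handle it in parallel, then push the responses and unmap in one go.

  (* Hypothetical batch pipeline; the three functions passed in stand in
     for the real ring/grant-table operations. *)
  let process_batch ~pop_requests ~handle_request ~push_responses ring =
    let open Lwt.Infix in
    (* Map the grants and pull every outstanding request at once. *)
    let requests = pop_requests ring in
    (* Handle the batch in parallel; completion order within it doesn't matter. *)
    Lwt_list.map_p handle_request requests >>= fun responses ->
    (* Only now unmap and acknowledge, so the whole batch completes together. *)
    push_responses ring responses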
This all makes sense.
The Ring/Lwt should just work at the moment if requests come in out of order (since there is an id->wakener mapping), and bind will not be a barrier in this case if you use Lwt_list.map_p.
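To illustrate, with a hypothetical BLOCK-style read : request -> response Lwt.t: chaining binds makes each request wait for the previous response, whereas Lwt_list.map_p keeps them all in flight and lets the id->wakener table wake each thread whenever its own response comes off the ring, in any order.

  (* One request at a time: each bind waits for the previous response. *)
  let sequential read reqs = Lwt_list.map_s read reqs

  (* All requests outstanding at once: completion order doesn't matter. *)
  let parallel read reqs = Lwt_list.map_p read reqs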
Yeah.
Btw do you happen to know how POSIX aio works? Does it open the file 'n' times to be able to 'seek' and 'write' from parallel threads? Perhaps mirage-block-unix should implement a pool of open file descriptors per device.
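A minimal sketch of what such a pool could look like, using Lwt_pool (the pool size and the seek-then-read pattern are illustrative only, not the actual mirage-block-unix code); each descriptor carries its own file offset, so parallel threads don't race on a single shared seek position:

  open Lwt.Infix

  (* Pool of [n] descriptors open on the same device/file. *)
  let make_fd_pool path n =
    Lwt_pool.create n (fun () ->
      Lwt_unix.openfile path [ Unix.O_RDWR ] 0)

  (* Borrow a descriptor, seek, read, and return the bytes read. *)
  let read_at pool ~offset ~len =
    Lwt_pool.use pool (fun fd ->
      let buf = Bytes.create len in
      Lwt_unix.lseek fd offset Unix.SEEK_SET >>= fun _ ->
      Lwt_unix.read fd buf 0 len >>= fun n ->
      Lwt.return (Bytes.sub buf 0 n))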
Note that POSIX AIO and the Linux libaio are different. POSIX AIO basically uses userspace threads to simulate blocking I/O, and is probably not that different from what Lwt_unix does (via Lwt_preemptive). I'm not sure there would be a lot of benefit in going down this route.
The kernel aio (via io_submit) is the one that is scheduled directly within the kernel and (last I checked) requires O_DIRECT.
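For comparison, roughly what the userspace-threads approach amounts to in Lwt terms (a sketch of the idea, not the actual Lwt_unix/Lwt_preemptive internals): run the blocking syscalls on a worker thread and wake the Lwt thread when they return.

  let blocking_read path ~offset ~len =
    Lwt_preemptive.detach
      (fun () ->
        (* A private descriptor per call, so the seek doesn't race with
           other in-flight operations on the same file. *)
        let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
        let buf = Bytes.create len in
        ignore (Unix.lseek fd offset Unix.SEEK_SET);
        let n = Unix.read fd buf 0 len in
        Unix.close fd;
        Bytes.sub buf 0 n)
      ()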
-anil