
Re: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.



On Tue, 2010-11-16 at 07:17 -0500, Stefano Stabellini wrote:
> On Tue, 16 Nov 2010, Daniel Stodden wrote:
> > Let's say we create an extension to tapdisk which speaks blkback's
> > datapath in userland. We'd basically put one of those tapdisks on every
> > storage node, independent of the image type, such as a bare LUN or a
> > VHD. We'd add a couple of additional IPC calls to make it directly
> > connect/disconnect to/from (ring-ref, event-channel) pairs.
> > 
> > That means it doesn't even need to talk xenstore; the control plane could
> > all be left to some single daemon, which knows how to instruct the right
> > tapdev (via libblktapctl) by looking at the physical-device node. I
> > guess getting the control stuff out of the kernel is always a good idea.
> > 
> > There are some important parts which would go missing. Such as
> > ratelimiting gntdev accesses -- 200 thundering tapdisks each trying to
> > gntmap 352 pages simultaneously isn't so good, so there still needs to
> > be some bridge arbitrating them. I'd rather keep that in kernel space --
> > is it okay to cram stuff like that into gntdev? It'd be much more
> > straightforward than IPC.
> > 
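
(To illustrate the kind of arbiter I mean: a global page budget inside gntdev
that blocks mappers once too many foreign pages are mapped at once. Sketch
only, not a patch -- the names and the GNTDEV_PAGE_BUDGET value are made up.)

/*
 * Sketch: block grant mappers until their pages fit under a global budget.
 */
#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

#define GNTDEV_PAGE_BUDGET 4096         /* arbitrary example limit */

static DEFINE_SPINLOCK(budget_lock);
static DECLARE_WAIT_QUEUE_HEAD(budget_wq);
static unsigned int pages_in_flight;

static bool budget_try_reserve(unsigned int nr)
{
        bool ok = false;

        spin_lock(&budget_lock);
        if (pages_in_flight + nr <= GNTDEV_PAGE_BUDGET) {
                pages_in_flight += nr;
                ok = true;
        }
        spin_unlock(&budget_lock);
        return ok;
}

/* Called before mapping nr grant refs; sleeps until they fit. */
static int budget_reserve(unsigned int nr)
{
        return wait_event_interruptible(budget_wq, budget_try_reserve(nr));
}

/* Called after the corresponding unmap. */
static void budget_release(unsigned int nr)
{
        spin_lock(&budget_lock);
        pages_in_flight -= nr;
        spin_unlock(&budget_lock);
        wake_up(&budget_wq);
}
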
> > Also, I was absolutely certain I once saw VM_FOREIGN support in gntdev...
> > I can't find it now -- what happened? Without it, there's presently still
> > no zero-copy.
> > 
> > Once the issues were solved, it'd be kinda nice. Simplifies stuff like
> > memshr for blktap, which depends on getting hold of original grefs.
> > 
> > We'd presumably still need the tapdev nodes, for qemu, etc. But those
> > can stay non-xen aware then.
> > 
> 
> Considering that there is a blkback implementation in qemu already, why
> don't we use it? I certainly don't feel the need for yet another blkback
> implementation.
> A lot of people are working on qemu nowadays and this would let us
> exploit some of that work and contribute to it ourselves.
> We would only need to write a vhd block driver in qemu (even though a
> "vdi" driver is already present, I assume it is not actually compatible?)
> and everything else is already there.
> We could reuse their qcow and qcow2 drivers that honestly are better
> maintained than ours (we receive a bug report per week about qcow/qcow2
> not working properly).
> Finally qemu needs to be able to do I/O anyway because of the IDE
> emulation, so it has to be in the picture in one way or another. One day
> not far from now, when we make virtio work on Xen, even the fast PV
> data path might go through qemu, so we might as well optimize it.
> After talking to the xapi guys to better understand their requirements,
> I am pretty sure that the new upstream qemu with QMP support would be
> able to satisfy them without issues.
> Of all the possible solutions, this is certainly the one that requires
> the fewest lines of code and would allow us to reuse resources that
> would otherwise just remain untapped.
> 
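
(For reference, the vhd driver you mention would start out as roughly the
skeleton below. Header names and the BlockDriver callback signatures depend
on the qemu tree being targeted, so they are only indicated in comments, and
BDRVVHDState is a made-up placeholder.)

#include "block_int.h"   /* "block/block_int.h" in newer trees */
#include "module.h"      /* "qemu/module.h" in newer trees */

typedef struct BDRVVHDState {
    /* footer/BAT/parent-locator state for the open image */
    int dummy;
} BDRVVHDState;

static BlockDriver bdrv_vhd = {
    .format_name   = "vhd",
    .instance_size = sizeof(BDRVVHDState),
    /*
     * .bdrv_probe, .bdrv_open, .bdrv_close, .bdrv_getlength and the
     * (aio_)read/write callbacks go here; their exact signatures follow
     * the block layer of the qemu version being targeted.
     */
};

static void bdrv_vhd_init(void)
{
    bdrv_register(&bdrv_vhd);
}

block_init(bdrv_vhd_init);
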
> I backported the upstream xen_disk implementation to qemu-xen
> and ran a test on the upstream 2.6.37-rc1 kernel as dom0: VMs boot fine
> and the performance seems interesting. For the moment I am thinking
> about enabling the qemu blkback implementation as a fallback in case
> blktap2 is not present in the system (i.e. 2.6.37 kernels).

I'm not against reducing code and effort. But in order to switch to a
different base we would need a drop-in match for VHD and at least a good
match for all the control machinery on which xen-sm presently depends.
There's also a lot of investment in filter drivers etc.

Then there is SM control, stuff like pause/unpause to get guests off the
storage nodes for snapshot/coalesce, more recently calls for statistics
and monitoring, tweaking some physical I/O details, etc. It used to be a
bitch; nowadays it's somewhat simpler, but that's all stuff we
completely depend on.

Moving blkback out of kernel space, into tapdisk, is predictable in size
and complexity. Replacing tapdisks altogether would be quite a different
story.
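
To put a rough size on "predictable": per VBD, the datapath attach is little
more than mapping the ring page and binding the event channel. A sketch
against libxenctrl's gnttab/evtchn interfaces follows -- handle types and the
open() signatures differ between Xen releases, and the struct and function
names here are made up for illustration.

#include <sys/mman.h>
#include <xenctrl.h>
#include <xen/io/ring.h>
#include <xen/io/blkif.h>

struct ublkback {
    xc_gnttab        *xcg;
    xc_evtchn        *xce;
    blkif_sring_t    *sring;   /* frontend's shared ring page, foreign-mapped */
    blkif_back_ring_t ring;    /* our consumer/producer view of it */
    int               port;    /* local event-channel port */
};

static int connect_vbd(struct ublkback *b, domid_t domid,
                       grant_ref_t ring_ref, evtchn_port_t remote_port)
{
    b->xcg = xc_gnttab_open(NULL, 0);
    b->xce = xc_evtchn_open(NULL, 0);
    if (!b->xcg || !b->xce)
        return -1;

    /* Map the single page the frontend granted for the ring. */
    b->sring = xc_gnttab_map_grant_ref(b->xcg, domid, ring_ref,
                                       PROT_READ | PROT_WRITE);
    if (!b->sring)
        return -1;

    BACK_RING_INIT(&b->ring, b->sring, XC_PAGE_SIZE);

    /* Bind our end of the frontend's event channel. */
    b->port = xc_evtchn_bind_interdomain(b->xce, domid, remote_port);
    return b->port < 0 ? -1 : 0;
}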

The remainder below isn't fully qualified, just random bits coming to
mind, assuming you're not talking about sharing code/libs and
frameworks, but about actual processes.

First, what's the rationale for fully PV'd guests on Xen? (That argument
might not count if we just take qemu as the container process and strip
the emulation for those.)

Related, there's the question of memory footprint. Kernel blkback is
extremely lightweight. Moving the datapath into userland can create
headaches, especially on 32-bit dom0s with a lot of guests and disks on
backends which used to be bare LUNs under blkback. That's a problem
tapdisk has to face too; I'm just wondering about the size of the issue
in qemu.
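
Back-of-envelope, assuming the standard blkif ring geometry (32 requests per
one-page ring, up to 11 segments each -- which is where the 352 pages above
come from); without zero-copy that data goes through the backend's own
buffers:

#include <stdio.h>

#define PAGE_SIZE     4096u
#define RING_REQUESTS 32u        /* __RING_SIZE(blkif_sring, 4096) */
#define SEGS_PER_REQ  11u        /* BLKIF_MAX_SEGMENTS_PER_REQUEST */

int main(void)
{
    unsigned pages_per_ring = RING_REQUESTS * SEGS_PER_REQ;   /* 352 */
    unsigned rings = 200;

    printf("per ring, fully loaded: %u pages = %u KiB\n",
           pages_per_ring, pages_per_ring * PAGE_SIZE / 1024);
    printf("%u busy rings: %u pages = ~%u MiB\n",
           rings, rings * pages_per_ring,
           rings * pages_per_ring * (PAGE_SIZE / 1024) / 1024);
    return 0;
}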

Related, Xapi depends a lot on dom0 plugs, where the datapath can be
somewhat hairy when it comes to blocking I/O and resource allocation.

Then there is sharing. Storage activation normally doesn't operate in a
specific VM context. It presently doesn't even relate to a particular
VBD, much less a VM. For qemu alone, putting storage virtualization into
the same address space is an obvious choice. For Xen, enforcing that
sounds like a step backward.

From the shared-framework perspective, and the amount of code involved:
the ring path alone is too small to consider, and the more difficult
parts on top of that, like the state machines for write ordering and
syncing, are hard to share because they depend on the queue
implementation and the image driver interface.
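
For scale, the bare ring path is essentially the loop below, continuing the
connect sketch above; submit_request() is a made-up hand-off into tapdisk's
existing queue, and xen_rmb() is the read barrier the ring macros expect (the
tools normally get it from xenctrl.h).

#include <string.h>

static void submit_request(struct ublkback *b, blkif_request_t *req);

static void consume_ring(struct ublkback *b)
{
    blkif_request_t req;
    RING_IDX rc, rp;
    int more;

    do {
        rc = b->ring.req_cons;
        rp = b->ring.sring->req_prod;
        xen_rmb();      /* ensure we see queued requests up to rp */

        while (rc != rp) {
            memcpy(&req, RING_GET_REQUEST(&b->ring, rc), sizeof(req));
            b->ring.req_cons = ++rc;
            submit_request(b, &req);
        }
        /* Re-arm the event and pick up requests that raced in. */
        RING_FINAL_CHECK_FOR_REQUESTS(&b->ring, more);
    } while (more);
}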

Control might be a different story. As far as frontend/backend IPC via
xenstore goes, right now I still feel like those backends could be
managed by a single daemon, similar to what blktapctrl did (let's just
make it stateless/restartable this time). I guess qemu processes already
run their xenstore trees fine, but each one internally?
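
Roughly the shape I have in mind for such a daemon, sketched against
libxenstore -- error handling and the actual libblktapctl calls are left out,
the header is <xs.h> on older releases and <xenstore.h> later, and a real
version would parse the VBD out of the event path rather than just log it:

#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

#define BACKEND_DIR "/local/domain/0/backend/vbd"

int main(void)
{
    struct xs_handle *xs = xs_daemon_open();
    char **ev;
    unsigned int num, len;

    if (!xs || !xs_watch(xs, BACKEND_DIR, "vbd"))
        return 1;

    /* No state is kept across events: everything needed is re-read from
     * xenstore, so the daemon can be killed and restarted at any time. */
    while ((ev = xs_read_watch(xs, &num))) {
        char *val = xs_read(xs, XBT_NULL, ev[XS_WATCH_PATH], &len);

        printf("changed: %s = %s\n", ev[XS_WATCH_PATH],
               val ? val : "(removed)");
        free(val);
        free(ev);
    }
    return 0;
}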

Daniel




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

