
RE: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.



On Tue, 16 Nov 2010, Dave Scott wrote:
> Hi,
> 
> Re: XCP's use of blktap2:
> 
> > On Mon, 2010-11-15 at 13:27 -0500, Jeremy Fitzhardinge wrote:
> > > On 11/12/2010 07:55 PM, Daniel Stodden wrote:
> > > > The second issue I see is the XCP side of things. XenServer got a
> > > > lot of benefit out of blktap2, and particularly because of the
> > > > tapdevs. It promotes a fairly rigorous split between a blkback VBD,
> > > > controlled by the agent, and tapdevs, controlled by XS's storage
> > > > manager.
> > > >
> > > > That doesn't prevent blkback from going into userspace, but it had
> > > > better not share a process with some libblktap, which in turn had
> > > > better not be controlled under the same xenstore path.
> > >
> > >
> > > Could you elaborate on this?  What was the benefit?
> > 
> > It's been mainly a matter of who controls what. Blktap1 was basically a
> > VBD, controlled by the agent. Blktap2 is a VDI represented as a block
> > device. Leaving management of that to XCP's storage manager, which just
> > hands that device node over to Xapi, simplified many things. Before, the
> > agent had to understand a lot about the type of storage, then talk to
> > the right backend accordingly. Worse, in order to have storage
> > management control a couple of datapath features, you'd basically have
> > to talk to Xapi, which would talk through xenstore to blktap, which was
> > a bit tedious. :)
> 
> As Daniel says, XCP currently separates domain management (setting up, 
> rebooting VMs) from storage management (attaching disks, snapshot, coalesce). 
> In the current design the storage layer handles the storage control-path 
> (instigating snapshots, clones, coalesce, dedup in future) through a storage 
> API ("SMAPI") and provides a uniform interface to qemu, blkback for the 
> data-path (currently in the form of a dom0 block device). In a VM start, xapi 
> will first ask the storage control-path to make a disk available, and then 
> pass this information to blkback/qemu.
> 
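For what it's worth, that sequence maps onto a couple of small steps. Here is
a rough sketch in Python (the language of XCP's storage manager); the SMAPI
call, the device path, the xenstore layout and the IDs below are illustrative
assumptions, not the exact XCP interfaces:

    import os
    import subprocess

    def attach_vdi(sr_uuid, vdi_uuid):
        # Stand-in for the SMAPI VDI.attach/VDI.activate step; the real
        # call lives in the storage manager and hands back a dom0 block
        # device node, e.g.:
        return "/dev/xen/blktap-2/tapdev0"

    def connect_blkback(domid, devid, blkdev):
        # Hand the device node over to blkback. blkback picks the disk up
        # from the 'physical-device' key (hex major:minor), normally
        # written by the hotplug scripts.
        st = os.stat(blkdev)
        major, minor = os.major(st.st_rdev), os.minor(st.st_rdev)
        backend = "/local/domain/0/backend/vbd/%d/%d" % (domid, devid)
        subprocess.check_call(["xenstore-write",
                               "%s/physical-device" % backend,
                               "%x:%x" % (major, minor)])

    blkdev = attach_vdi("<sr-uuid>", "<vdi-uuid>")
    connect_blkback(domid=1, devid=768, blkdev=blkdev)
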
> One of the trickiest things XCP handles is vhd "coalesce": merging a vhd file 
> into its "parent". This comes up because vhds are arranged in a tree 
> structure where the leaves are separate independent VM disks and the nodes 
> represent shared common blocks, the result of (eg) cloning a single VM lots 
> of times. When guest disks are deleted and the vhd leaves are removed, it 
> sometimes becomes possible to save space by merging nodes together. The 
> tricky bit is doing this while I/O is still being performed in parallel 
> against logically separate (but related by parentage/history) disks on 
> different hosts. It's necessary for the thing doing the coalescing to know 
> where all the I/O is going on (eg to be able to find the host and pid where 
> the related tapdisks (or qemus) live) and it's necessary for it to be able to 
> signal to these processes when they need to re-read the vhd tree metadata.
> 
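To make the data movement concrete: ignoring the vhd format entirely, a
coalesce is just "fold every block the child has allocated into its parent".
A toy sketch over an in-memory block map (purely illustrative, nothing here
reads real vhd metadata):

    # A "disk" is its parent plus a sparse map of allocated blocks.
    class Image(object):
        def __init__(self, parent=None):
            self.parent = parent
            self.blocks = {}            # block number -> data

    def coalesce(child):
        # Fold the child's allocated blocks into its parent. The child's
        # copy always wins, since it is newer than the parent's.
        parent = child.parent
        assert parent is not None, "cannot coalesce a base image"
        for blk, data in child.blocks.items():
            parent.blocks[blk] = data
        return parent

    base = Image()
    snap = Image(parent=base)
    snap.blocks[0] = "guest data"
    coalesce(snap)
    assert base.blocks[0] == "guest data"

The hard part is exactly what's described above: doing that copy while
tapdisks on other hosts keep writing to related leaves, and then getting
every one of them to re-read the tree afterwards.
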
> In the bad old blktap1 days, the storage control-path didn't know enough 
> about the data-path to reliably signal the active tapdisks: IIRC the tapdisks 
> were spawned by blktapctrl as a side-effect of the domain manager writing to 
> xenstore. In the much better blktap2 days :) the storage control-path sets up 
> (registers?) the data-path (currently via tap-ctl and a dom0 block device) 
> and so it knows who to talk to in order to co-ordinate a coalesce.
> 
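For reference, the registration step today is little more than a couple of
tap-ctl invocations, roughly as sketched below (the exact options differ
between blktap2 versions, so treat the flags as approximate, and the helper
names are mine). The pause/unpause pair is also the hook a coalescer can use
to make a running tapdisk re-read its vhd chain:

    import subprocess

    def tap_create(vhd_path):
        # 'tap-ctl create' allocates a minor, spawns a tapdisk and opens
        # the image; it prints the resulting device node on stdout.
        out = subprocess.check_output(
            ["tap-ctl", "create", "-a", "vhd:%s" % vhd_path])
        return out.strip()              # e.g. /dev/xen/blktap-2/tapdev0

    def tap_refresh(pid, minor):
        # Quiesce the tapdisk, then let it reopen its image chain.
        for verb in ("pause", "unpause"):
            subprocess.check_call(
                ["tap-ctl", verb, "-p", str(pid), "-m", str(minor)])
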
> So I think the critical thing is for the storage control-path to be able to 
> "register" a data-path, so that it can later find and signal any processes 
> using that data-path. There are a bunch of different possibilities the 
> storage control-path could use instead of using tap-ctl to create a block 
> device, including:
> 

Qemu could be spawned directly (even before the VM) and QMP could be
used to communicate with it.
The qemu pid and/or the socket used to issue QMP commands could serve
as identifiers.

> 
> I'm sure there are lots of possibilities :-)
 
Indeed.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

