[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-changelog] [xen-unstable] blktap2: README updates

# HG changeset patch
# User Keir Fraser <keir.fraser@xxxxxxxxxx>
# Date 1244190683 -3600
# Node ID 9f4c5734e4aa014e062ddfdf7ed3763790b8e619
# Parent  21d1fcb0be4179a7122cbab1fd3aa4adff1c338a
blktap2: README updates

As promised, this brings the long out-of-sync documentation up to
date, and adds some getting started information about tapdisk driver
development - I get the occasional email on this latter subject.

Signed-off-by: Dutch Meyer <dmeyer@xxxxxxxxx>
 tools/blktap2/README |  325 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 262 insertions(+), 63 deletions(-)

diff -r 21d1fcb0be41 -r 9f4c5734e4aa tools/blktap2/README
--- a/tools/blktap2/README      Fri Jun 05 09:30:36 2009 +0100
+++ b/tools/blktap2/README      Fri Jun 05 09:31:23 2009 +0100
@@ -1,19 +1,21 @@ Blktap Userspace Tools + Library
-Blktap Userspace Tools + Library
+Blktap2 Userspace Tools + Library
+Dutch Meyer
+4th June 2009
 Andrew Warfield and Julian Chesterfield
 16th June 2006
-The blktap userspace toolkit provides a user-level disk I/O
-interface. The blktap mechanism involves a kernel driver that acts
+The blktap2 userspace toolkit provides a user-level disk I/O
+interface. The blktap2 mechanism involves a kernel driver that acts
 similarly to the existing Xen/Linux blkback driver, and a set of
-associated user-level libraries.  Using these tools, blktap allows
+associated user-level libraries.  Using these tools, blktap2 allows
 virtual block devices presented to VMs to be implemented in userspace
 and to be backed by raw partitions, files, network, etc.
-The key benefit of blktap is that it makes it easy and fast to write
+The key benefit of blktap2 is that it makes it easy and fast to write
 arbitrary block backends, and that these user-level backends actually
 perform very well.  Specifically:
@@ -38,7 +40,7 @@ perform very well.  Specifically:
 How it works (in one paragraph):
-Working in conjunction with the kernel blktap driver, all disk I/O
+Working in conjunction with the kernel blktap2 driver, all disk I/O
 requests from VMs are passed to the userspace deamon (using a shared
 memory interface) through a character device. Each active disk is
 mapped to an individual device node, allowing per-disk processes to
@@ -49,74 +51,271 @@ code.  We provide a simple, asynchronous
 code.  We provide a simple, asynchronous virtual disk interface that
 makes it quite easy to add new disk implementations.
-As of June 2006 the current supported disk formats are:
+As of June 2009 the current supported disk formats are:
  - Raw Images (both on partitions and in image files)
- - File-backed Qcow disks
- - Standalone sparse Qcow disks
- - Fast shareable RAM disk between VMs (requires some form of cluster-based 
-   filesystem support e.g. OCFS2 in the guest kernel)
- - Some VMDK images - your mileage may vary
-Raw and QCow images have asynchronous backends and so should perform
-fairly well.  VMDK is based directly on the qemu vmdk driver, which is
-synchronous (a.k.a. slow).
+ - Fast sharable RAM disk between VMs (requires some form of 
+   cluster-based filesystem support e.g. OCFS2 in the guest kernel)
+ - VHD, including snapshots and sparse images
+ - Qcow, including snapshots and sparse images
 Build and Installation Instructions
-Make to configure the blktap backend driver in your dom0 kernel.  It
-will cooperate fine with the existing backend driver, so you can
-experiment with tap disks without breaking existing VM configs.
-To build the tools separately, "make && make install" in 
+Make to configure the blktap2 backend driver in your dom0 kernel.  It
+will inter-operate with the existing backend and frontend drivers.  It
+will also cohabitate with the original blktap driver.  However, some
+formats (currently aio and qcow) will default to their blktap2
+versions when specified in a vm configuration file.
+To build the tools separately, "make && make install" in
 Using the Tools
-Prepare the image for booting. For qcow files use the qcow utilities
-installed earlier. e.g. qcow-create generates a blank standalone image
-or a file-backed CoW image. img2qcow takes an existing image or
-partition and creates a sparse, standalone qcow-based file.
+Preparing an image for boot:
 The userspace disk agent is configured to start automatically via xend
-(alternatively you can start it manually => 'blktapctrl')
-Customise the VM config file to use the 'tap' handler, followed by the
-driver type. e.g. for a raw image such as a file or partition:
-disk = ['tap:aio:<FILENAME>,sda1,w']
-e.g. for a qcow image:
-disk = ['tap:qcow:<FILENAME>,sda1,w']
-Mounting images in Dom0 using the blktap driver
+Customize the VM config file to use the 'tap:tapdisk' handler,
+followed by the driver type. e.g. for a raw image such as a file or
+disk = ['tap:tapdisk:aio:<FILENAME>,sda1,w']
+Alternatively, the vhd-util tool (installed with make install, or in
+/blktap2/vhd) can be used to build sparse copy-on-write vhd images.
+For example, to build a sparse image -
+  vhd-util create -n MyVHDFile -s 1024
+This creates a sparse 1GB file named "MyVHDFile" that can be mounted
+and populated with data.
+One can also base the image on a raw file -
+  vhd-util snapshot -n MyVHDFile -p SomeRawFile -m
+This creates a sparse VHD file named "MyVHDFile" using "SomeRawFile"
+as a parent image.  Copy-on-write semantics ensure that writes will be
+stored in "MyVHDFile" while reads will be directed to the most
+recently written version of the data, either in "MyVHDFile" or
+"SomeRawFile" as is appropriate.  Other options exist as well, consult
+the vhd-util application for the complete set of VHD tools.
+VHD files can be mounted automatically in a guest similarly to the
+above AIO example simply by specifying the vhd driver.
+disk = ['tap:tapdisk:vhd:<VHD FILENAME>,sda1,w']
+Pausing a guest will also plug the corresponding IO queue for blktap2
+devices and stop blktap2 drivers.  This can be used to implement a
+safe live snapshot of qcow and vhd disks.  An example script "xmsnap"
+is shown in the tools/blktap2/drivers directory.  This script will
+perform a live snapshot of a qcow disk.  VHD files can use the
+"vhd-util snapshot" tool discussed above.  If this snapshot command is
+applied to a raw file mounted with tap:tapdisk:AIO, include the -m
+flag and the driver will be reloaded as VHD.  If applied to an already
+mounted VHD file, omit the -m flag.
+Mounting images in Dom0 using the blktap2 driver
 Tap (and blkback) disks are also mountable in Dom0 without requiring an
-active VM to attach. You will need to build a xenlinux Dom0 kernel that
-includes the blkfront driver (e.g. the default 'make world' or 
-'make kernels' build. Simply use the xm command-line tool to activate
-the backend disks, and blkfront will generate a virtual block device that
-can be accessed in the same way as a loop device or partition:
-e.g. for a raw image file <FILENAME> that would normally be mounted using
-the loopback driver (such as 'mount -o loop <FILENAME> /mnt/disk'), do the
-xm block-attach 0 tap:aio:<FILENAME> /dev/xvda1 w 0
-mount /dev/xvda1 /mnt/disk        <--- don't use loop driver
-In this way, you can use any of the userspace device-type drivers built
-with the blktap userspace toolkit to open and mount disks such as qcow
-or vmdk images:
-xm block-attach 0 tap:qcow:<FILENAME> /dev/xvda1 w 0
-mount /dev/xvda1 /mnt/disk
+active VM to attach. 
+The syntax is -
+  tapdisk2 -n <type>:<full path to file>
+For example -
+  tapdisk2  -n aio:/home/images/rawFile.img
+When successful the location of the new device will be provided by
+tapdisk2 to stdout and tapdisk2 will terminate.  From that point
+forward control of the device is provided through sysfs in the
+  /sys/class/blktap2/blktap#/
+Where # is a blktap2 device number present in the path that tapdisk2
+printed before terminating.  The sysfs interface is largely intuitive,
+for example, to remove tap device 0 one would-
+  echo 1 > /sys/class/blktap2/blktap0/remove
+Similarly, a pause control is available, which is can be used to plug
+the request queue of a live running guest.
+Previous versions of blktap mounted devices in dom0 by using blkfront
+in dom0 and the xm block-attach command.  This approach is still
+available, though slightly more cumbersome.
+Tapdisk Development
+People regularly ask how to develop their own tapdisk drivers, and
+while it has not yet been well documented, the process is relatively
+easy.  Here I will provide a brief overview.  The best reference, of
+course, comes from the existing drivers.  Specifically,
+blktap2/drivers/block-ram.c and blktap2/drivers/block-aio.c provide
+the clearest examples of simple drivers.
+First you need to register your new driver with blktap. This is done
+in disktypes.h.  There are five things that you must do.  To
+demonstrate, I will create a disk called "mynewdisk", you can name
+yours freely.
+1) Forward declare an instance of struct tap_disk.
+e.g. -  
+  extern struct tap_disk tapdisk_mynewdisk;
+2) Claim one of the unused disk type numbers, take care to observe the
+MAX_DISK_TYPES macro, increasing the number if necessary.
+e.g. -
+  #define DISK_TYPE_MYNEWDISK         10
+3) Create an instance of disk_info_t.  The bulk of this file contains examples 
of these.
+e.g. -
+  static disk_info_t mynewdisk_disk = {
+          "My New Disk (mynewdisk)",
+          "mynewdisk",
+          0,
+  #ifdef TAPDISK
+          &tapdisk_mynewdisk,
+  #endif
+  };
+A few words about what these mean.  The first field must be the disk
+type number you claimed in step (2).  The second field is a string
+describing your disk, and may contain any relevant info.  The third
+field is the name of your disk as will be used by the tapdisk2 utility
+and xend (for example tapdisk2 -n mynewdisk:/path/to/disk.image, or in
+your xm create config file).  The forth is binary and determines
+whether you will have one instance of your driver, or many.  Here, a 1
+means that your driver is a singleton and will coordinate access to
+any number of tap devices.  0 is more common, meaning that you will
+have one driver for each device that is created.  The final field
+should contain a reference to the struct tap_disk you created in step
+4) Add a reference to your disk info structure (from step (3)) to the
+dtypes array.  Take care here - you need to place it in the position
+corresponding to the device type number you claimed in step (2).  So
+we would place &mynewdisk_disk in dtypes[10].  Look at the other
+devices in this array and pad with "&null_disk," as necessary.
+5) Modify the xend python scripts.  You need to add your disk name to
+the list of disks that xend recognizes.
+  tools/python/xen/xend/server/BlktapController.py
+And add your disk to the "blktap_disk_types" array near the top of
+your file.  Use the same name you specified in the third field of step
+(3).  The order of this list is not important.
+Now your driver is ready to be written.  Create a block-mynewdisk.c in
+tools/blktap2/drivers and add it to the Makefile.
+Copying block-aio.c and block-ram.c would be a good place to start.
+Read those files as you go through this, I will be assisting by
+commenting on a few useful functions and structures.
+struct tap_disk:
+Remember the forward declaration in step (1) of the setup phase above?
+Now is the time to make that structure a reality.  This structure
+contains a list of function pointers for all the routines that will be
+asked of your driver.  Currently the required functions are open,
+close, read, write, get_parent_id, validate_parent, and debug.
+e.g. -
+  struct tap_disk tapdisk_mynewdisk = {
+          .disk_type          = "tapdisk_mynewdisk",
+          .flags              = 0,
+          .private_data_size  = sizeof(struct tdmynewdisk_state),
+          .td_open            = tdmynewdisk_open,
+                 ....
+The private_data_size field is used to provide a structure to store
+the state of your device.  It is very likely that you will want
+something here, but you are free to design whatever structure you
+want.  Blktap will allocate this space for you, you just need to tell
+it how much space you want.
+This is the open routine.  The first argument is a structure
+representing your driver.  Two fields in this array are
+driver->data will contain a block of memory of the size your requested
+in in the .private_data_size field of your struct tap_disk (above).
+driver->info contains a structure that details information about your
+disk.  You need to fill this out.  By convention this is done with a
+_get_image_info() function.  Assign a size (the total number of
+sectors), sector_size (the size of each sector in bytes, and set
+driver->info->info to 0.
+The second parameter contains the name that was specified in the
+creation of your device, either through xend, or on the command line
+with tapdisk2.  Usually this specifies a file that you will open in
+this routine.  The final parameter, flags, contains one of a number of
+flags specified in tapdisk.h that may change the way you treat the
+These are your read and write operations.  What you do here will
+depend on your disk, but you should do exactly one of- 
+1) call td_complete_request with either error or success code.
+2) Call td_forward_request, which will forward the request to the next
+driver in the stack.
+3) Queue the request for asynchronous processing with
+td_prep_read/write.  In doing so, you will also register a callback
+for request completion.  When the request completes you must do one of
+options (1) or (2) above.  Finally, call td_queue_tiocb to submit the
+request to a wait queue.
+The above functions are defined in tapdisk-interface.c.  If you don't
+use them as specified you will run into problems as your driver will
+fail to inform blktap of the state of requests that have been
+submitted.  Blktap keeps track of all requests and does not like losing track.
+_close, _get_parent_id, _validate_parent:
+These last few tend to be very routine.  _close is called when the
+device is closed, and also when it is paused (in this case, open will
+also be called later).  The other functions are used in stacking
+drivers.  Most often drivers will return TD_NO_PARENT and -EINVAL,

Xen-changelog mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.