[MirageOS-devel] multiple block devices under xen



In last week's call I promised to email some notes on creating Mirage apps with multiple block devices under xen.

Background
----------

When a disk is added to a VM (under Xen, KVM, any hypervisor) it's like plugging a PCI device into a physical box: a disk will appear in some kind of numbered "slot" on a "bus". In Xen the "slot" number is the name of the directory in xenstore (/local/domain/<domid>/device/vbd/<slot>) and the in-guest blkfront driver manages the "bus". From the point of view of Mirage, or Windows with the Citrix PV drivers, the slot number can be any unique integer: the devices will be enumerated and associated with the slot numbers. Windows will then look for partition labels and volumes inside the disks, which means that everything still works even if you change the slot numbers. Mirage should probably do this too, but I've not written this code yet.

From the Linux perspective, the *convention* is that the "slot" number is the Linux device number (major-number << 8 | minor-number): blkfront will use this to create the kernel device. Next, udev (or something similar) will trigger and actually create a named device node. This is an old convention which dates back to the early days of Xen, when it was common to create device nodes which pretended to be the individual partitions of SCSI disks. These days the modern twist is to stick to device numbers which correspond to the special /dev/xvd[a-z] devices: 51712 (xvda), 51728 (xvdb) etc. This is what you see inside a regular Linux guest.
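
To make the encoding concrete, here is a minimal standalone sketch in OCaml. It assumes the usual Linux numbering for xvd devices (major 202, 16 minors per disk) and only covers the first 16 disks; the function names are made up for illustration and are not part of mirage-block-xen.

```ocaml
(* Major number Linux assigns to Xen virtual block devices (xvd*). *)
let xvd_major = 202

(* Each xvd disk reserves 16 minors: 0 is the whole disk, 1-15 are partitions. *)
let minors_per_disk = 16

(* Slot number for whole disk [n] (0 = xvda, 1 = xvdb, ...).
   Only the classic encoding for the first 16 disks is handled here. *)
let xvd_slot n =
  if n < 0 || n > 15 then invalid_arg "xvd_slot: disk index must be in 0..15";
  (xvd_major lsl 8) lor (n * minors_per_disk)

(* Split a slot number back into (major, minor). *)
let major_minor slot = (slot lsr 8, slot land 0xff)

(* Linux name of whole disk [n]. *)
let xvd_name n = Printf.sprintf "xvd%c" (Char.chr (Char.code 'a' + n))

let () =
  List.iter
    (fun n -> Printf.printf "%s -> %d\n" (xvd_name n) (xvd_slot n))
    [ 0; 1; 2 ]
```

This prints 51712 for xvda, 51728 for xvdb and 51744 for xvdc.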

Note that it doesn't really matter what the disk is made of: it could be a file in /tmp, an iSCSI LUN or a Ceph block device; in all cases it will just appear as a disk in a slot on a bus. In general the VM won't be able to find out where the physical disk came from.

Xen disk support
----------------

Xen supports different backing disk formats (e.g. raw, qcow2, vhd) and "backend types" (e.g. via kernel blkback, via userspace tapdisk, via userspace qemu). I recommend keeping it simple and sticking to raw format. The simplest "backend type" (and the one that will give you the least trouble) is kernel blkback (also known as "phy"). The only snag is that blkback can only be backed by block devices, i.e. LVM volumes, real disks or loopback devices, not by plain files.

You can create a raw (sparse) disk file like so:

dd if=/dev/zero of=disk.raw bs=1M seek=16 count=0

and then create a loop device like so:

sudo losetup /dev/loop0 disk.raw

(where /dev/loop0 is any free loop device. Note that the total number of loop devices is fixed at module load time; unfortunately it's not dynamic, or at least it wasn't the last time I checked.)

Whatever you do, don't mix I/O to the /dev/loop device with I/O to the file -- it won't be coherent. For example if you modify disk.raw you'll still see stale cached data if you read /dev/loop0.
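
If you'd rather create the sparse file from OCaml than from dd, something like the following should do the same job using only the standard library. This is a sketch, not anything Mirage provides: create_sparse and file_size are names I've invented here, and whether the file really ends up sparse depends on the filesystem supporting holes.

```ocaml
(* Create [path] as a file of [size] bytes by seeking to the last byte and
   writing a single zero. Any existing file at [path] is truncated first.
   On filesystems that support holes the skipped range is not allocated. *)
let create_sparse path size =
  if size <= 0 then invalid_arg "create_sparse: size must be positive";
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () ->
      seek_out oc (size - 1);
      output_char oc '\000')

(* Length of a file in bytes, read back through the standard library. *)
let file_size path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> in_channel_length ic)

let () =
  (* Demo on a temporary file so nothing in the current directory is touched. *)
  let path = Filename.temp_file "disk" ".raw" in
  create_sparse path (16 * 1024 * 1024);
  Printf.printf "%s: %d bytes\n" path (file_size path);
  Sys.remove path
```

The same warning applies: do this before attaching the loop device, not while the loop device is in use.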

To attach /dev/loop0 to your VM with slot 51712 (== /dev/xvda) and /dev/loop1 to your VM with slot 51728 (== /dev/xvdb) you would write the following in the VM's config file:

>>>>
# The disk configuration is defined here:
# An example would look like:
disk = [ '/dev/loop0,,xvda', '/dev/loop1,,xvdb' ]
<<<<
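
If you have more than a couple of disks it may be easier to generate that line than to type it. A minimal sketch in OCaml, which simply reproduces the 'target,,vdev' form used above (the record type and function names are hypothetical, not part of the mirage tool):

```ocaml
(* One disk entry: the backing block device in dom0 and the virtual device
   name the guest should see. *)
type disk = { target : string; vdev : string }

(* Render one entry as 'target,,vdev'. The empty middle field is the disk
   format, left unset exactly as in the example above. *)
let disk_spec d = Printf.sprintf "'%s,,%s'" d.target d.vdev

(* Render the whole "disk = [ ... ]" line of the config file. *)
let disk_line disks =
  Printf.sprintf "disk = [ %s ]" (String.concat ", " (List.map disk_spec disks))

let () =
  print_endline
    (disk_line
       [ { target = "/dev/loop0"; vdev = "xvda" };
         { target = "/dev/loop1"; vdev = "xvdb" } ])
```

Run as-is, this prints the same disk line as the example config.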

I'm not sure what strings you can write other than 'xvda', 'xvdb' etc. Using the Linux device names is awkward if you're not actually running Linux. For reference, the Device_number module in mirage-block-xen converts between the Linux names and the slot numbers like this:

>>>>
utop # #require "mirage-block-xen";;
utop # Device_number.(to_linux_device (of_disk_number false 0));;
- : string = "xvda"
utop # Device_number.(to_xenstore_key (of_disk_number false 0));;
- : int = 51712
<<<<

Mirage
------

When we write the config.ml for a mirage app we define our block devices like this:

>>>>
let block = {
  Block.name = "myfile";
  filename   = "./disk.raw";
  read_only  = false;
}

let () = Job.register [
    "Block_test.Main", [Driver.console; Driver.Block block]
  ]
<<<<

The generated code will call 'Block.connect "./disk.raw"' and expect the block device backed by "./disk.raw" to be opened. This is fine in userspace where you can simply open the file directly, but with a hypervisor you have to present the file as a block device, and somehow link this block device to the original filename.

The simplest thing to do is to modify the filename to

  filename = "51712"

[Block.connect "51712"] will then interpret the string as the slot number and open that device.

This is obviously not ideal. Perhaps in future we can:
* make "mirage" generate an .xl config file which references the file (rather than a block device) and rely on the hotplug scripts to manage the loop device. This should work, but since there are more moving parts, may be flaky on your particular distro.
* make "Block.connect" take some kind of volume label rather than a device id, so mirage-block-{unix,xen} can cope with disks being presented in any order on the bus
 
HTH!

--
Dave Scott