Proposal: use disk sequence numbers to avoid races in blkback
Proposal: Check disk sequence numbers in blkback
================================================

Currently, adding block devices to a domain is racy.  libxl writes the
major and minor number of the device to XenStore, but it does not keep
the block device open until blkback has opened it.  This creates a race
condition, as it is possible for the device to be destroyed and another
device allocated with the same major and minor numbers.  Loop devices
are the most obvious example, since /dev/loop0 can be reused again and
again, but the same problem can also happen with device-mapper devices.
If the major and minor numbers are reused before blkback has attached to
the device, blkback will pass the wrong device to the domain, with
obvious security consequences.

Other programs on Linux have the same problem, and a solution was
committed upstream in the form of disk sequence numbers.  A disk
sequence number, or diskseq, is a 64-bit unsigned monotonically
increasing counter.  The combination of a major and minor number and a
disk sequence number uniquely identifies a block device for the entire
uptime of the system.

I propose that blkback check for an unsigned 64-bit hexadecimal XenStore
entry named “diskseq”.  If the entry exists, blkback checks that the
number stored there matches the disk sequence number of the device.  If
it does not exist, the check is skipped.  If reading the entry fails for
any other reason, if the entry is malformed, or if the sequence number
is wrong, blkback refuses to export the device.

The toolstack changes are more involved for two reasons:

1. To ensure that loop devices are not leaked if the toolstack crashes,
   they must be created with the delete-on-close flag set.  This
   requires that the toolstack hold the device open until blkback has
   acquired a handle to it.

2. For block devices that are opened by path, the toolstack needs to
   ensure that the device it has opened is actually the device it
   intended to open.  This requires device-specific verification of the
   open file descriptor.  This is not needed for regular files, as the
   LOOP_CONFIGURE ioctl is called on an existing loop device and sets
   its backing file.

The first is fairly easy in C.  It can be accomplished by means of a
XenStore watch on the “status” entry.  Once that watch fires, blkback
has opened the device, so the toolstack can safely close its file
descriptor.

The second is significantly more difficult.  It requires the block
script to be aware of at least device-mapper devices and LVM2 logical
volumes.  The general technique is common to all block devices: obtain
the sequence number of the open device (via the BLKGETDISKSEQ ioctl) and
its major and minor numbers (via fstat()).  Then open
/sys/dev/block/MAJOR:MINOR to get a directory file descriptor, and use
openat(2) and read(2) to get various sysfs attributes.  Finally, read
the diskseq sysfs attribute and check that it matches the sequence
number from BLKGETDISKSEQ.  Alternatively, one can use device-specific
methods, such as device-mapper ioctls.

Device-mapper devices can be detected via the ‘dm/name’ sysfs attribute,
which must match the name under ‘/dev/mapper/’.  If the name is of the
form ‘/dev/X/Y’, and the ‘dm/uuid’ attribute starts with the literal
string “LVM-”, then the expected ‘dm/name’ attribute should be found by
doubling all ‘-’ characters in X and Y, and then joining X and Y with
another ‘-’.  This accounts for LVM2 logical volumes.  Alternatively,
one can use device-mapper ioctls to both check if a device is a
device-mapper device, and to obtain its name and UUID.  I plan on going
with the latter route.

There are *many* other rules that might need to be followed, but these
are the most important ones.  In particular, this is sufficient for
device-mapper devices, which are by far the most important case for now.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab