
Proposal: use disk sequence numbers to avoid races in blkback



Proposal: Check disk sequence numbers in blkback
================================================

Currently, adding block devices to a domain is racy.  libxl writes the
major and minor number of the device to XenStore, but it does not keep
the block device open until blkback has opened it.  This creates a race
condition, as it is possible for the device to be destroyed and another
device allocated with the same major and minor numbers.  Loop devices
are the most obvious example, since /dev/loop0 can be reused again and
again, but the same problem can also happen with device-mapper devices.
If the major and minor numbers are reused before blkback has attached to
the device, blkback will pass the wrong device to the domain, with
obvious security consequences.

Other programs on Linux have the same problem, and a solution was
committed upstream in the form of disk sequence numbers.  A disk
sequence number, or diskseq, is a 64-bit unsigned monotonically
increasing counter.  The combination of a major and minor number and a
disk sequence number uniquely identifies a block device for the entire
uptime of the system.

I propose that blkback check for an unsigned 64-bit hexadecimal XenStore
entry named “diskseq”.  If the entry exists, blkback checks that the
number stored there matches the disk sequence number of the device.  If
it does not exist, the check is skipped.  If reading the entry fails for
any other reason, if the entry is malformed, or if the sequence number
does not match, blkback refuses to export the device.
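
The decision logic above can be sketched in userspace C as follows.
This is only an illustration, not the blkback implementation: the
helper names (parse_diskseq_hex, check_diskseq) are hypothetical, and
the strict format (1 to 16 hex digits, no "0x" prefix, no whitespace)
is an assumption, since the proposal only says "unsigned 64-bit
hexadecimal".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse a "diskseq" value as strict hexadecimal.
 * Assumed format: 1 to 16 hex digits, nothing else. */
static bool parse_diskseq_hex(const char *s, uint64_t *out)
{
    uint64_t v = 0;
    size_t n;

    if (s == NULL || s[0] == '\0')
        return false;
    for (n = 0; s[n] != '\0'; n++) {
        char c = s[n];
        unsigned int d;

        if (n >= 16)            /* more than 64 bits of digits */
            return false;
        if (c >= '0' && c <= '9')
            d = (unsigned int)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (unsigned int)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            d = (unsigned int)(c - 'A') + 10;
        else
            return false;       /* sign, prefix, whitespace, junk */
        v = (v << 4) | d;
    }
    *out = v;
    return true;
}

enum diskseq_verdict { DISKSEQ_SKIP, DISKSEQ_OK, DISKSEQ_REFUSE };

/* entry: value read from XenStore, or NULL if it could not be read.
 * read_failed: true if the read failed for a reason other than the
 *              entry not existing.
 * actual: the sequence number of the device that was opened. */
static enum diskseq_verdict check_diskseq(const char *entry,
                                          bool read_failed,
                                          uint64_t actual)
{
    uint64_t expected;

    if (read_failed)
        return DISKSEQ_REFUSE;  /* unexpected read error */
    if (entry == NULL)
        return DISKSEQ_SKIP;    /* entry absent: check is skipped */
    if (!parse_diskseq_hex(entry, &expected))
        return DISKSEQ_REFUSE;  /* malformed entry */
    return expected == actual ? DISKSEQ_OK : DISKSEQ_REFUSE;
}
```

Only the "entry does not exist" case skips the check; every other
failure refuses the device, so the check fails closed.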

The toolstack changes are more involved for two reasons:

1. To ensure that loop devices are not leaked if the toolstack crashes,
   they must be created with the delete-on-close flag set.  This
   requires that the toolstack hold the device open until blkback has
   acquired a handle to it.

2. For block devices that are opened by path, the toolstack needs to
   ensure that the device it has opened is actually the device it
   intended to open.  This requires device-specific verification of the
   open file descriptor.  This is not needed for regular files, as the
   LOOP_CONFIGURE ioctl is called on an existing loop device and sets
   its backing file.

The first is fairly easy in C.  It can be accomplished by means of a
XenStore watch on the “status” entry.  Once that watch fires, blkback
has opened the device, so the toolstack can safely close its file
descriptor.

The second is significantly more difficult.  It requires the block
script to be aware of at least device-mapper devices and LVM2 logical
volumes.  The general technique is common to all block devices: obtain
the device's sequence number (via the BLKGETDISKSEQ ioctl) and its major
and minor numbers (via fstat()).  Then open /sys/dev/block/MAJOR:MINOR
to get a directory file descriptor, and use openat(2) and read(2) to get
various sysfs attributes.  Finally, read the diskseq sysfs attribute and
check that it matches the sequence number from BLKGETDISKSEQ.
Alternatively, one can use device-specific methods, such as
device-mapper ioctls.
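
A minimal sketch of the sysfs cross-check described above, assuming a
whole-disk device (such as a loop or device-mapper device) whose sysfs
directory has a diskseq attribute.  The function name verify_diskseq
is hypothetical, and the fallback definition of BLKGETDISKSEQ is only
there for builds against older kernel headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/types.h>

#ifndef BLKGETDISKSEQ           /* older kernel headers lack it */
#define BLKGETDISKSEQ _IOR(0x12, 128, __u64)
#endif

/* Hypothetical helper.  Returns 0 and stores the sequence number if
 * the ioctl and sysfs agree; returns -1 on any error or mismatch. */
static int verify_diskseq(int fd, uint64_t *diskseq_out)
{
    struct stat st;
    __u64 seq = 0;
    unsigned long long sysfs_seq;
    char path[64], buf[32], *end;
    int dirfd, attrfd;
    ssize_t n;

    if (fstat(fd, &st) != 0 || !S_ISBLK(st.st_mode))
        return -1;              /* not an open block device */
    if (ioctl(fd, BLKGETDISKSEQ, &seq) != 0)
        return -1;

    snprintf(path, sizeof(path), "/sys/dev/block/%u:%u",
             major(st.st_rdev), minor(st.st_rdev));
    dirfd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd < 0)
        return -1;
    /* Other attributes (for example dm/name) could be read through
     * the same directory descriptor before it is closed. */
    attrfd = openat(dirfd, "diskseq", O_RDONLY | O_CLOEXEC);
    close(dirfd);
    if (attrfd < 0)
        return -1;
    n = read(attrfd, buf, sizeof(buf) - 1);
    close(attrfd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';

    /* The sysfs attribute is a decimal number followed by a newline. */
    errno = 0;
    sysfs_seq = strtoull(buf, &end, 10);
    if (errno != 0 || end == buf || (*end != '\n' && *end != '\0'))
        return -1;
    if (sysfs_seq != (unsigned long long)seq)
        return -1;              /* device changed underneath us */

    *diskseq_out = (uint64_t)seq;
    return 0;
}
```

If the major and minor numbers were reused between the ioctl and the
sysfs read, the two sequence numbers differ and the check fails.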

Device-mapper devices can be detected via the ‘dm/name’ sysfs attribute,
which must match the name under ‘/dev/mapper/’.  If the path is instead
of the form ‘/dev/X/Y’ and the ‘dm/uuid’ attribute starts with the
literal string “LVM-”, then the expected ‘dm/name’ value is obtained by
doubling all ‘-’ characters in X and Y, and then joining X and Y with a
single ‘-’.  This accounts for LVM2 logical volumes.  Alternatively,
one can use device-mapper ioctls both to check whether a device is a
device-mapper device and to obtain its name and UUID.  I plan on going
with the latter route.
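
For illustration, the name-mangling rule for LVM2 logical volumes can
be sketched as below.  The function names are hypothetical; the caller
would compare the result against the ‘dm/name’ attribute (or the name
returned by the device-mapper ioctls).

```c
#include <stddef.h>

/* Append src to dst, doubling every '-'.  len is the current length
 * of dst and cap its capacity; room is always kept for the NUL. */
static int append_escaped(char *dst, size_t cap, size_t *len,
                          const char *src)
{
    for (; *src != '\0'; src++) {
        size_t need = (*src == '-') ? 2 : 1;

        if (*len + need >= cap)
            return -1;          /* would overflow */
        if (*src == '-')
            dst[(*len)++] = '-';
        dst[(*len)++] = *src;
    }
    return 0;
}

/* Compute the expected dm name for /dev/VG/LV.
 * Returns 0 on success, -1 if out is too small. */
static int lvm_expected_dm_name(const char *vg, const char *lv,
                                char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return -1;
    if (append_escaped(out, cap, &len, vg) != 0)
        return -1;
    if (len + 1 >= cap)
        return -1;
    out[len++] = '-';           /* single '-' joins VG and LV */
    if (append_escaped(out, cap, &len, lv) != 0)
        return -1;
    out[len] = '\0';
    return 0;
}
```

For example, a volume group “qubes-dom0” and a logical volume
“vm-work” give the expected name “qubes--dom0-vm--work”.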

There are *many* other rules that might need to be followed, but these
are the most important ones.  In particular, this is sufficient for
device-mapper devices, which are by far the most important case for now.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
