[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposal: use disk sequence numbers to avoid races in blkback



On Wed, May 11, 2022 at 09:37:54AM +0200, Roger Pau Monné wrote:
> On Tue, May 10, 2022 at 12:22:51PM -0400, Demi Marie Obenour wrote:
> > On Tue, May 10, 2022 at 12:57:48PM +0200, Roger Pau Monné wrote:
> > > On Thu, May 05, 2022 at 08:30:17PM -0400, Demi Marie Obenour wrote:
> > > > Proposal: Check disk sequence numbers in blkback
> > > > ================================================
> > > > 
> > > > Currently, adding block devices to a domain is racy.  libxl writes the
> > > > major and minor number of the device to XenStore, but it does not keep
> > > > the block device open until blkback has opened it.  This creates a race
> > > > condition, as it is possible for the device to be destroyed and another
> > > > device allocated with the same major and minor numbers.  Loop devices
> > > > are the most obvious example, since /dev/loop0 can be reused again and
> > > > again, but the same problem can also happen with device-mapper devices.
> > > > If the major and minor numbers are reused before blkback has attached to
> > > > the device, blkback will pass the wrong device to the domain, with
> > > > obvious security consequences.
> > > > 
> > > > Other programs on Linux have the same problem, and a solution was
> > > > committed upstream in the form of disk sequence numbers.  A disk
> > > > sequence number, or diskseq, is a 64-bit unsigned monotonically
> > > > increasing counter.  The combination of a major and minor number and a
> > > > disk sequence number uniquely identifies a block device for the entire
> > > > uptime of the system.
> > > 
> > > Seems fine to me, this is just an extra check to make sure the block
> > > device opened by blkback is the one that user space intended.  I would
> > > see diskseq as a kind of checksum.
> > 
> > Ideally, diskseq would be the primary means of identifying a device, but
> > that isn’t an option without more substantial changes, sadly.
> > 
> > > > I propose that blkback check for an unsigned 64-bit hexadecimal XenStore
> > > > entry named “diskseq”.  If the entry exists, blkback checks that the
> > > > number stored there matches the disk sequence number of the device.  If
> > > > it does not exist, the check is skipped.  If reading the entry fails for
> > > > any other reason, the entry is malformed, or if the sequence number is
> > > > wrong, blkback refuses to export the device.
> > > > 
> > > > The toolstack changes are more involved for two reasons:
> > > > 
> > > > 1. To ensure that loop devices are not leaked if the toolstack crashes,
> > > >    they must be created with the delete-on-close flag set.  This
> > > >    requires that the toolstack hold the device open until blkback has
> > > >    acquired a handle to it.
> > > 
> > > Does this work with loop devices?  I would expect that you need to
> > > issue a losetup call to detach the device.
> > 
> > That is what the autoclear flag is for.  It will cause the device to be
> > destroyed by the kernel as soon as the last handle to it has been
> > closed.  This is why the toolstack needs to hold a file descriptor to
> > the device.
> 
> What would happen if the backend closes the device (because the
> connection is torn down) and then try to open it again (because the
> guest has triggered a reconnection)?

The reconnect attempt will fail for loop devices, and may fail for
device-mapper devices.  Unless somebody holds the device open until it
has been removed from the guest, reconnect is inherently racy.  That
somebody could be the toolstack, a daemon such as libvirtd, or the
kernel.

> > > Even more, the loop device is created by the block script, but there's
> > > also a window between the block script execution and the toolstack
> > > knowing about the device, which could also allow for a leak?
> > 
> > For this to work, either the toolstack or block script will need to open
> > the file and perform loop(4) ioctls to assign the file descriptor to a
> > loop device.  This cannot be done by a shell script, so I plan on using
> > a C program to perform these tasks.  In Qubes OS, I expect this program
> > to replace the block script entirely, as performance is critical and
> > flexibility less so.  For upstream, I recommend having the block script
> > be a script that calls this C program.
> 
> block scripts can be plain binary executables, so I think it would be
> fine for libxl to just call the executable directly.

Marek had suggested that keeping the block script a script would be
useful for admins, but I am fine with just using a binary.

> > > > 2. For block devices that are opened by path, the toolstack needs to
> > > >    ensure that the device it has opened is actually the device it
> > > >    intended to open.  This requires device-specific verification of the
> > > >    open file descriptor.  This is not needed for regular files, as the
> > > >    LOOP_CONFIGURE ioctl is called on an existing loop device and sets
> > > >    its backing file.
> > > > 
> > > > The first is fairly easy in C.  It can be accomplished by means of a
> > > > XenStore watch on the “status” entry.  Once that watch fires, blkback
> > > > has opened the device, so the toolstack can safely close its file
> > > > descriptor.
> > > 
> > > Does the toolstack really need to close the device?  What harm does it
> > > do to keep the handle open until the domain is destroyed?
> > 
> > This would cause no harm, but it also would not help either, so I do not
> > see any advantages to doing it.
> 
> Well, seems more complex because you need more synchronization between
> blkback and the toolstack in order to detect when blkback has opened
> the device.  If this is not strictly required I would rather avoid it:
> more complexity just leads to more errors.

All of this synchronization can be handled by the block script.

> > > What about disk hotplug?  Which entity will keep the device opened in
> > > that case?  Is xl block-attach going to block until the device
> > > switches to the connected state?
> > 
> > Whichever program opens the file will need to do this.  
> 
> This is not trivial to implement with xl, as `xl block-attach` is a
> short-lived command that just populates the xenstore entries for the
> to be attached device, runs the hotplug script and exits after that.
> I'm not sure we would want to change `xl block-attach` behavior to
> wait until the backend has opened the device.

Anything else is racy, unless there is some program that could keep the
FD open in the background.  Welcome to the ugly world of Linux block
device semantics.  The problem with not using autoclear is that there is
no race-free way I am aware of to clean up the loop device.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.