diff -r 50cf07f42fdd tools/blktap2/README --- a/tools/blktap2/README Thu Jun 04 10:57:39 2009 +0100 +++ b/tools/blktap2/README Thu Jun 04 20:38:34 2009 -0700 @@ -1,19 +1,21 @@ -Blktap Userspace Tools + Library +Blktap2 Userspace Tools + Library ================================ +Dutch Meyer +4th June 2009 + Andrew Warfield and Julian Chesterfield 16th June 2006 -{firstname.lastname}@cl.cam.ac.uk -The blktap userspace toolkit provides a user-level disk I/O -interface. The blktap mechanism involves a kernel driver that acts +The blktap2 userspace toolkit provides a user-level disk I/O +interface. The blktap2 mechanism involves a kernel driver that acts similarly to the existing Xen/Linux blkback driver, and a set of -associated user-level libraries. Using these tools, blktap allows +associated user-level libraries. Using these tools, blktap2 allows virtual block devices presented to VMs to be implemented in userspace and to be backed by raw partitions, files, network, etc. -The key benefit of blktap is that it makes it easy and fast to write +The key benefit of blktap2 is that it makes it easy and fast to write arbitrary block backends, and that these user-level backends actually perform very well. Specifically: @@ -38,7 +40,7 @@ How it works (in one paragraph): -Working in conjunction with the kernel blktap driver, all disk I/O +Working in conjunction with the kernel blktap2 driver, all disk I/O requests from VMs are passed to the userspace deamon (using a shared memory interface) through a character device. Each active disk is mapped to an individual device node, allowing per-disk processes to @@ -49,74 +51,271 @@ code. We provide a simple, asynchronous virtual disk interface that makes it quite easy to add new disk implementations. -As of June 2006 the current supported disk formats are: +As of June 2009 the current supported disk formats are: - Raw Images (both on partitions and in image files) - - File-backed Qcow disks - - Standalone sparse Qcow disks - - Fast shareable RAM disk between VMs (requires some form of cluster-based - filesystem support e.g. OCFS2 in the guest kernel) - - Some VMDK images - your mileage may vary + - Fast sharable RAM disk between VMs (requires some form of + cluster-based filesystem support e.g. OCFS2 in the guest kernel) + - VHD, including snapshots and sparse images + - Qcow, including snapshots and sparse images -Raw and QCow images have asynchronous backends and so should perform -fairly well. VMDK is based directly on the qemu vmdk driver, which is -synchronous (a.k.a. slow). Build and Installation Instructions =================================== -Make to configure the blktap backend driver in your dom0 kernel. It -will cooperate fine with the existing backend driver, so you can -experiment with tap disks without breaking existing VM configs. +Make to configure the blktap2 backend driver in your dom0 kernel. It +will inter-operate with the existing backend and frontend drivers. It +will also cohabitate with the original blktap driver. However, some +formats (currently aio and qcow) will default to their blktap2 +versions when specified in a vm configuration file. -To build the tools separately, "make && make install" in -tools/blktap. +To build the tools separately, "make && make install" in +tools/blktap2. Using the Tools =============== -Prepare the image for booting. For qcow files use the qcow utilities -installed earlier. e.g. qcow-create generates a blank standalone image -or a file-backed CoW image. img2qcow takes an existing image or -partition and creates a sparse, standalone qcow-based file. +Preparing an image for boot: The userspace disk agent is configured to start automatically via xend -(alternatively you can start it manually => 'blktapctrl') -Customise the VM config file to use the 'tap' handler, followed by the -driver type. e.g. for a raw image such as a file or partition: +Customize the VM config file to use the 'tap:tapdisk' handler, +followed by the driver type. e.g. for a raw image such as a file or +partition: -disk = ['tap:aio:,sda1,w'] +disk = ['tap:tapdisk:aio:,sda1,w'] -e.g. for a qcow image: +Alternatively, the vhd-util tool (installed with make install, or in +/blktap2/vhd) can be used to build sparse copy-on-write vhd images. -disk = ['tap:qcow:,sda1,w'] +For example, to build a sparse image - + vhd-util create -n MyVHDFile -s 1024 +This creates a sparse 1GB file named "MyVHDFile" that can be mounted +and populated with data. -Mounting images in Dom0 using the blktap driver +One can also base the image on a raw file - + vhd-util snapshot -n MyVHDFile -p SomeRawFile -m + +This creates a sparse VHD file named "MyVHDFile" using "SomeRawFile" +as a parent image. Copy-on-write semantics ensure that writes will be +stored in "MyVHDFile" while reads will be directed to the most +recently written version of the data, either in "MyVHDFile" or +"SomeRawFile" as is appropriate. Other options exist as well, consult +the vhd-util application for the complete set of VHD tools. + +VHD files can be mounted automatically in a guest similarly to the +above AIO example simply by specifying the vhd driver. + +disk = ['tap:tapdisk:vhd:,sda1,w'] + + +Snapshots: + +Pausing a guest will also plug the corresponding IO queue for blktap2 +devices and stop blktap2 drivers. This can be used to implement a +safe live snapshot of qcow and vhd disks. An example script "xmsnap" +is shown in the tools/blktap2/drivers directory. This script will +perform a live snapshot of a qcow disk. VHD files can use the +"vhd-util snapshot" tool discussed above. If this snapshot command is +applied to a raw file mounted with tap:tapdisk:AIO, include the -m +flag and the driver will be reloaded as VHD. If applied to an already +mounted VHD file, omit the -m flag. + + +Mounting images in Dom0 using the blktap2 driver =============================================== Tap (and blkback) disks are also mountable in Dom0 without requiring an -active VM to attach. You will need to build a xenlinux Dom0 kernel that -includes the blkfront driver (e.g. the default 'make world' or -'make kernels' build. Simply use the xm command-line tool to activate -the backend disks, and blkfront will generate a virtual block device that -can be accessed in the same way as a loop device or partition: +active VM to attach. -e.g. for a raw image file that would normally be mounted using -the loopback driver (such as 'mount -o loop /mnt/disk'), do the -following: +The syntax is - + tapdisk2 -n : -xm block-attach 0 tap:aio: /dev/xvda1 w 0 -mount /dev/xvda1 /mnt/disk <--- don't use loop driver +For example - + tapdisk2 -n aio:/home/images/rawFile.img -In this way, you can use any of the userspace device-type drivers built -with the blktap userspace toolkit to open and mount disks such as qcow -or vmdk images: +When successful the location of the new device will be provided by +tapdisk2 to stdout and tapdisk2 will terminate. From that point +forward control of the device is provided through sysfs in the +directory- -xm block-attach 0 tap:qcow: /dev/xvda1 w 0 -mount /dev/xvda1 /mnt/disk + /sys/class/blktap2/blktap#/ +Where # is a blktap2 device number present in the path that tapdisk2 +printed before terminating. The sysfs interface is largely intuitive, +for example, to remove tap device 0 one would- + + echo 1 > /sys/class/blktap2/blktap0/remove +Similarly, a pause control is available, which is can be used to plug +the request queue of a live running guest. +Previous versions of blktap mounted devices in dom0 by using blkfront +in dom0 and the xm block-attach command. This approach is still +available, though slightly more cumbersome. + + +Tapdisk Development +=============================================== + +People regularly ask how to develop their own tapdisk drivers, and +while it has not yet been well documented, the process is relatively +easy. Here I will provide a brief overview. The best reference, of +course, comes from the existing drivers. Specifically, +blktap2/drivers/block-ram.c and blktap2/drivers/block-aio.c provide +the clearest examples of simple drivers. + +Setup: + +First you need to register your new driver with blktap. This is done +in disktypes.h. There are five things that you must do. To +demonstrate, I will create a disk called "mynewdisk", you can name +yours freely. + +1) Forward declare an instance of struct tap_disk. + +e.g. - + extern struct tap_disk tapdisk_mynewdisk; + +2) Claim one of the unused disk type numbers, take care to observe the +MAX_DISK_TYPES macro, increasing the number if necessary. + +e.g. - + #define DISK_TYPE_MYNEWDISK 10 + +3) Create an instance of disk_info_t. The bulk of this file contains examples of these. + +e.g. - + static disk_info_t mynewdisk_disk = { + DISK_TYPE_MYNEWDISK, + "My New Disk (mynewdisk)", + "mynewdisk", + 0, + #ifdef TAPDISK + &tapdisk_mynewdisk, + #endif + }; + +A few words about what these mean. The first field must be the disk +type number you claimed in step (2). The second field is a string +describing your disk, and may contain any relevant info. The third +field is the name of your disk as will be used by the tapdisk2 utility +and xend (for example tapdisk2 -n mynewdisk:/path/to/disk.image, or in +your xm create config file). The forth is binary and determines +whether you will have one instance of your driver, or many. Here, a 1 +means that your driver is a singleton and will coordinate access to +any number of tap devices. 0 is more common, meaning that you will +have one driver for each device that is created. The final field +should contain a reference to the struct tap_disk you created in step +(1). + +4) Add a reference to your disk info structure (from step (3)) to the +dtypes array. Take care here - you need to place it in the position +corresponding to the device type number you claimed in step (2). So +we would place &mynewdisk_disk in dtypes[10]. Look at the other +devices in this array and pad with "&null_disk," as necessary. + +5) Modify the xend python scripts. You need to add your disk name to +the list of disks that xend recognizes. + +edit: + tools/python/xen/xend/server/BlktapController.py + +And add your disk to the "blktap_disk_types" array near the top of +your file. Use the same name you specified in the third field of step +(3). The order of this list is not important. + + +Now your driver is ready to be written. Create a block-mynewdisk.c in +tools/blktap2/drivers and add it to the Makefile. + + +Development: + +Copying block-aio.c and block-ram.c would be a good place to start. +Read those files as you go through this, I will be assisting by +commenting on a few useful functions and structures. + +struct tap_disk: + +Remember the forward declaration in step (1) of the setup phase above? +Now is the time to make that structure a reality. This structure +contains a list of function pointers for all the routines that will be +asked of your driver. Currently the required functions are open, +close, read, write, get_parent_id, validate_parent, and debug. + +e.g. - + struct tap_disk tapdisk_mynewdisk = { + .disk_type = "tapdisk_mynewdisk", + .flags = 0, + .private_data_size = sizeof(struct tdmynewdisk_state), + .td_open = tdmynewdisk_open, + .... + +The private_data_size field is used to provide a structure to store +the state of your device. It is very likely that you will want +something here, but you are free to design whatever structure you +want. Blktap will allocate this space for you, you just need to tell +it how much space you want. + + +tdmynewdisk_open: + +This is the open routine. The first argument is a structure +representing your driver. Two fields in this array are +interesting. + +driver->data will contain a block of memory of the size your requested +in in the .private_data_size field of your struct tap_disk (above). + +driver->info contains a structure that details information about your +disk. You need to fill this out. By convention this is done with a +_get_image_info() function. Assign a size (the total number of +sectors), sector_size (the size of each sector in bytes, and set +driver->info->info to 0. + +The second parameter contains the name that was specified in the +creation of your device, either through xend, or on the command line +with tapdisk2. Usually this specifies a file that you will open in +this routine. The final parameter, flags, contains one of a number of +flags specified in tapdisk.h that may change the way you treat the +disk. + + +_queue_read/write: + +These are your read and write operations. What you do here will +depend on your disk, but you should do exactly one of- + +1) call td_complete_request with either error or success code. + +2) Call td_forward_request, which will forward the request to the next +driver in the stack. + +3) Queue the request for asynchronous processing with +td_prep_read/write. In doing so, you will also register a callback +for request completion. When the request completes you must do one of +options (1) or (2) above. Finally, call td_queue_tiocb to submit the +request to a wait queue. + +The above functions are defined in tapdisk-interface.c. If you don't +use them as specified you will run into problems as your driver will +fail to inform blktap of the state of requests that have been +submitted. Blktap keeps track of all requests and does not like losing track. + + +_close, _get_parent_id, _validate_parent: + +These last few tend to be very routine. _close is called when the +device is closed, and also when it is paused (in this case, open will +also be called later). The other functions are used in stacking +drivers. Most often drivers will return TD_NO_PARENT and -EINVAL, +respectively. + + + + + +