
[Xen-devel] AFS-based VBD backend

Hi All,

Has anyone ever put any thought (or code) into an (Open)AFS-based
virtual block device backend for Xen?  This driver would use a large
file stored in AFS as its underlying "device", as opposed to a loop
device or an LVM or physical device.  If I understand Xen's interface
architecture correctly, this backend would be stacked something like

    ext3 or other standard filesystem in domN
    existing block device frontend in domN
    new "afs file" backend in dom0
    large file in /afs in dom0
    AFS client in dom0
    AFS server somewhere on net

You would configure this using something like:

    disk = ['afsfile:/full/path/to/afs/file,sda1,w']

Only dom0 would need to be an AFS client; the kerberos ticket(s) would
be owned and renewed by management code in dom0 only; the other domains
would not need to be kerberized, would not have access to any keytabs in
dom0, etc.  Each domain could have a single kerberos ID and its own AFS
volume(s) for storage of its "disks", but the individual users or
daemons in each domain would not be aware of kerberos at all.  

I think most of this could be done in python -- the backend driver
itself might be a relatively thin layer that translates block addresses
to and from file byte offsets, talks to the frontend, and does a
periodic fsync() on the underlying file to write changes back to the
AFS server.
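To make the idea concrete, here's a minimal sketch of that thin layer in
python.  All class and method names here are hypothetical -- this is not
Xen's actual backend interface, just the block-to-byte translation and
periodic fsync() described above:

```python
import os
import threading

SECTOR_SIZE = 512  # standard disk sector size

class FileBackedDevice:
    """Hypothetical core of an 'afsfile'/'rawfile' backend: translate
    sector-addressed reads/writes into byte offsets in an ordinary
    file, and fsync() periodically so changes reach the AFS server."""

    def __init__(self, path, sync_interval=5.0):
        self.fd = os.open(path, os.O_RDWR)
        self._stop = threading.Event()
        self._sync_interval = sync_interval
        self._syncer = threading.Thread(target=self._sync_loop, daemon=True)
        self._syncer.start()

    def read_sectors(self, sector, count):
        # sector address -> byte offset in the underlying file
        os.lseek(self.fd, sector * SECTOR_SIZE, os.SEEK_SET)
        return os.read(self.fd, count * SECTOR_SIZE)

    def write_sectors(self, sector, data):
        os.lseek(self.fd, sector * SECTOR_SIZE, os.SEEK_SET)
        os.write(self.fd, data)

    def _sync_loop(self):
        # push dirty data back to the AFS server every few seconds
        while not self._stop.wait(self._sync_interval):
            os.fsync(self.fd)

    def close(self):
        self._stop.set()
        self._syncer.join()
        os.fsync(self.fd)  # final writeback on teardown
        os.close(self.fd)
```

The real backend would also have to speak the Xen frontend/backend ring
protocol, of course; this only shows the addressing and writeback logic.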

There'd be nothing to keep someone from using this backend on top of
ordinary, non-AFS files; this might provide better performance (one less
layer) than going through the loop device driver.  Perhaps the VBD type
name might even want to be 'rawfile' or somesuch instead of 'afsfile',
though an AFS-specific bind/unbind script might be useful for token
acquisition and renewal.
Some potential FAQ answers, before I get bombarded:  ;-)

Q:  Why is this useful?
A:  Because AFS can be run over the Internet, has excellent security,
    client-side caching, server-side replication and snapshots, and
    would lend itself well to an environment where the AFS clients and
    servers might be in different geographical locations, owned by two
    different parties, hosting Xen domN filesystems owned in turn by
    other parties.  

Q:  Why not just use native AFS in the unprivileged domains?
A:  Because then those domains would have to be kerberized AFS clients,
    and the users/owners of those domains would have to have kerberos
    IDs; they'd have to be knowledgeable in AFS ACLs, the AFS
    directory-based security model, daemon token renewal, and so on.
    The root-user domain owners would have to know how to manage
    kerberos users, and they'd have to have kerberos admin rights.  This
    is too much to expect.  The typical Xen customer wants to be able to
    just use normal *nix tools to add or delete a user -- that won't work
    with kerberos.  All users of all domains would be in the same
    kerberos namespace too -- there could be only one "jim" across all
    domains, even though those domains are supposed to be independent
    *nix machines owned by different parties -- very difficult to
    manage.

Q:  Why not just use a loop device on top of AFS, with the 'file:' 
    VBD type?
A:  Loop devices on top of AFS files hang under large volumes of
    I/O -- it looks like a deadlock of some sort (in my tests, a dd of
    around 200-300 MB into an AFS-based loop device appears to
    consistently hang the kernel, even with a 500 MB or larger AFS
    cache).  In addition, an unmodified loop.c will not fsync() the
    underlying file, so changes won't get written back to the AFS server
    until loop teardown.  I've added an fsync() to the worker thread of
    loop.c to take care of this every few seconds; that seems to work,
    but I can't really stress-test it much because of the hang problem.

Q:  Why not use iSCSI, nbd, drbd, gnbd, or enbd?
A:  While these each seem to do their job well, none offer all of the 
    maturity, client-side caching, WAN-optimized protocols, volume
    management, backups, easy snapshots, scalability, central
    administration, redundancy, replication, or kerberized security of
    AFS.

What did I miss?  ;-)

Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
http://www.stevegt.com -- http://Infrastructures.Org



