This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-users] Understanding sparse-files

To: "Rustedt, Florian" <Florian.Rustedt@xxxxxxxxxxx>
Subject: Re: [Xen-users] Understanding sparse-files
From: John Haxby <john.haxby@xxxxxxxxxx>
Date: Tue, 16 Dec 2008 14:37:33 +0000
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Rustedt, Florian wrote:
What exactly is the advantage of sparse-files against "normal" files
with fixed length?

There are both advantages and disadvantages.

First i thought this is something like an auto-increasing file. But if i
take a 2GB partition and add two sparse-files with 1GB each, i can't add
an additional one, the disk is full?

No, that's not it.

So what about this mystic advantage? Is it only the faster creation of
that file with dd, because it is not completely filled?
That's all?

If you create yourself a nice big sparse file like this

   dd bs=1M seek=10240 count=0 if=/dev/zero of=huge

and then look at what you've got with "ls -lh", you'll see that you have a 10G file that was created almost instantly. On the other hand, "ls -sh" will show that the file is actually occupying no space at all (well, almost no space). You can make this file bigger like this:

   dd bs=1M seek=20480 count=0 if=/dev/zero of=huge

and this will make it 20GB, and it still won't occupy much space.
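
If you'd rather compare the two sizes as numbers than read them off "ls", something like the following should do it. This is only a sketch: it assumes Linux with GNU coreutils (for "stat -c"), and demo.img is a made-up file name.

```shell
# Create a 1 GiB sparse file with the same dd trick; no data is written.
dd bs=1M seek=1024 count=0 if=/dev/zero of=demo.img 2>/dev/null

# Apparent size in bytes (what "ls -l" reports).
apparent=$(stat -c %s demo.img)

# Space actually allocated, in bytes: allocated blocks (%b) times the
# size of the unit those blocks are counted in (%B).
allocated=$(( $(stat -c %b demo.img) * $(stat -c %B demo.img) ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"
```

The second number is the one that counts against the free space on the disk.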

I suspect you already know this, but if you didn't, you do now :-)

The advantage of this 20GB file is precisely that it occupies next to no space on the disk that holds it. I can start writing data into it (that is, use it as a guest's disk) and the blocks needed will be allocated as they are used. In fact, I could have a 200GB guest disk image even though the disk I have at the moment is only 120GB and I'm using quite a lot of it -- it would only be a problem if the guest actually wanted to use all that space.
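
The "allocated as they are used" part can be watched directly. A rough sketch, again assuming GNU coreutils on Linux, with guest.img as a hypothetical image name:

```shell
# 64 MiB apparent size, nothing allocated yet.
dd bs=1M seek=64 count=0 if=/dev/zero of=guest.img 2>/dev/null
before=$(stat -c %b guest.img)

# Write 1 MiB of real data 32 MiB into the file, much as a guest would.
# conv=notrunc stops dd from cutting the file off after the write.
dd bs=1M seek=32 count=1 conv=notrunc if=/dev/urandom of=guest.img 2>/dev/null

# Flush to disk so that the block count reflects the write.
sync
after=$(stat -c %b guest.img)
size=$(stat -c %s guest.img)

echo "apparent size: $size bytes"
echo "allocated blocks before: $before, after: $after"
```

The apparent size stays the same; only the allocated block count goes up, by roughly the amount written.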

There are some problems with sparse files: they compress beautifully (gzip reports 99.9%), but it takes a while to read the empty space, and when you uncompress the file you discover that it now actually occupies disk space. There's no good way to distinguish between an unallocated block and a block full of zeroes. This also means that you need to be careful how you back these files up: you need something a little cleverer than gzip.
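
As an illustration of the gzip problem and one way around it, here is a sketch that assumes GNU cp (for its "--sparse" option) and uses made-up file names:

```shell
# 32 MiB sparse file with a few bytes of real data at the start.
dd bs=1M seek=32 count=0 if=/dev/zero of=orig.img 2>/dev/null
printf 'guest data' | dd conv=notrunc of=orig.img 2>/dev/null

# Round trip through gzip: the holes come back as real blocks of zeroes.
gzip -c orig.img > orig.img.gz
gunzip -c orig.img.gz > dense.img

# GNU cp can turn runs of zeroes back into holes while copying.
cp --sparse=always dense.img resparse.img
sync

for f in orig.img dense.img resparse.img; do
    echo "$f: $(stat -c '%s bytes apparent, %b blocks allocated' "$f")"
done
```

All three files have the same contents and apparent size; on most filesystems only dense.img should show a large allocated block count. For backups, GNU tar's -S (--sparse) option is another tool that handles holes when creating the archive.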

Another problem with sparse files, especially when using them as domU disks, is that blocks that are contiguous in the file are not necessarily contiguous on the disk. That means that if, in the guest, you just "dd if=/dev/xvda of=/dev/null" then domU will be seeking back and forth all over the place to return the blocks in the order that they're being asked for. You don't need xen for this -- when I downloaded the DVD image of Fedora 10 using transmission (a bittorrent client) a checksum on the resulting file only managed to read it at about 4MB/s. On the other hand, when I copied the file the checksum on the copy ran at closer to 100MB/s -- bittorrent clients like transmission really ought to pre-allocate the disk space so that you get something contiguous and also not embarrassingly run out of space half way through.
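
For comparison, pre-allocating looks something like the sketch below. It assumes the fallocate command from util-linux is available and falls back to writing zeroes if it is not (or if the filesystem refuses); prealloc.img is a hypothetical name.

```shell
# Reserve 16 MiB of real blocks up front instead of leaving holes.
# If fallocate is missing or unsupported here, write zeroes instead.
fallocate -l 16M prealloc.img 2>/dev/null ||
    dd bs=1M count=16 if=/dev/zero of=prealloc.img 2>/dev/null

prealloc_size=$(stat -c %s prealloc.img)
prealloc_blocks=$(stat -c %b prealloc.img)
echo "apparent: $prealloc_size bytes, allocated: $prealloc_blocks blocks"

# Optional, if e2fsprogs is installed: show how many extents the file has.
# filefrag prealloc.img
```

The trade-off is the reverse of the sparse case: the space is taken straight away, so you find out about a full disk at creation time rather than half way through, but you can no longer over-commit.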

In a nutshell, though:

pros: over-committed disk space

cons: performance

