


To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability
From: "J. Roeleveld" <joost@xxxxxxxxxxxx>
Date: Mon, 13 Sep 2010 10:33:40 +0200
Delivery-date: Mon, 13 Sep 2010 01:34:45 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4C8D2069.10609@xxxxxxxxxxxxxxxxxx>
References: <4C7864BB.1010808@xxxxxxxxxxxxxxxxxx> <201009121141.46734.joost@xxxxxxxxxxxx> <4C8D2069.10609@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.13.5 (Linux/2.6.30-gentoo-r5; KDE/4.4.5; x86_64; ; )
On Sunday 12 September 2010 20:48:09 Scott Garron wrote:
> On 9/12/2010 5:41 AM, J. Roeleveld wrote:
> > I also use LVMs extensively and do similar steps for backups.
> > 1) umount in domU
> > 2) block-detach
> > 3) lvcreate snapshot
> > 4) block-attach
> > 5) mount in domU
>       I think the biggest difference, here, is that you unmount and
> detach the source volumes before creating the snapshot whereas I just
> leave them active and mounted in the guest.  I don't know if that will
> end up being the difference between stability and instability on my
> system, but it's an observation and probably worth experimentation.

I tend to umount first to ensure the filesystem is consistent and no writes are 
still left in the write-buffer on the guest.
Filesystem recoveries are fine, but why rely on them when it's not necessary? 
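
For reference, the five steps quoted above could be sketched as an ordered command plan. This is only a sketch under assumptions: the domain name, guest host, volume group, volume, device name, mount point and snapshot size are hypothetical placeholders, the in-guest commands are assumed to run over SSH as root, and the `xm` toolstack is assumed. The function only builds the command lists; it does not execute anything.

```python
def snapshot_plan(domain, guest_host, vg, lv, frontend, mountpoint, snap_size="1G"):
    """Return the backup steps as a list of argv lists, in order.

    Every value is a placeholder supplied by the caller; nothing is run here.
    """
    origin = f"/dev/{vg}/{lv}"
    return [
        # 1) umount in domU, so no writes are left in the guest's write buffer
        ["ssh", f"root@{guest_host}", "umount", mountpoint],
        # 2) block-detach (assumes the frontend name is accepted as device id)
        ["xm", "block-detach", domain, frontend],
        # 3) lvcreate snapshot of the now-detached origin volume
        ["lvcreate", "-s", "-L", snap_size, "-n", f"{lv}-snap", origin],
        # 4) block-attach the origin volume again, read-write
        ["xm", "block-attach", domain, f"phy:{origin}", frontend, "w"],
        # 5) mount in domU
        ["ssh", f"root@{guest_host}", "mount", f"/dev/{frontend}", mountpoint],
    ]
```

Each entry could then be handed to `subprocess.run(cmd, check=True)` in turn, so the sequence stops at the first failing step instead of continuing with a half-detached device.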

> > I, however, have no need for HVM and only use PV guests.
>       It turns out that it doesn't seem isolated to HVM guests on my
> system any longer.  That was just coincidental during the first few
> crashes that I observed.

Ok, I believe the issue might be related to the LVM stack and the way Xen 
holds the devices locked while they are mounted and attached.

> > Are you certain the snapshots are large enough to hold all possible
> > changes that might occur on the LV during the existence of the
> > snapshot?
>       Certainly.  The most recent one to cause a crash has existed
> through the crash and for 3 days now, and is only using 2.65% of its COW
> space.  They usually don't get a chance to go above even 0.3% before the
> rsync on them is finished and they are unmounted and removed by the
> backup script.

Ok, guess that's not the cause :)
Although I get the "unable to remove active" error both when 0% is used and 
when over 20% is used, so it is not clear to me what is causing it.

> > Another thing I notice, which might be of help to people who
> > understand this better than I do, in my backup-script, sometimes step
> > "5" fails because the domU hasn't noticed the device is attached
> > again when I try to mount it. The domU-commands are run using
> > SSH-connections.
>       That probably just has to do with variations in how long it takes
> the guest kernel to poll or be notified of device changes, and how long
> it takes for its udev to create the device files and whatnot.
> Introducing some sanity checks or just a longer delay in your backup
> script would likely get around that problem.  (I could be wrong, though)

I do need to add some sanity checks to the script at some point, but 
currently I start the backups manually and 'fix' the left-overs myself.
The mount issue is a simple one, and I notice it within 30-40 seconds of the 
scripts starting.
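
A minimal sketch of the kind of sanity check suggested above, assuming the guest is reachable over SSH as root and that `test -b` on the device path is a good enough sign that the domU has noticed the re-attached device. The function names, host and device path are hypothetical; the idea is simply to poll before step 5 instead of mounting straight away.

```python
import subprocess
import time


def wait_until(check, timeout=30.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def guest_has_block_device(guest_host, device):
    """True if `test -b device` succeeds inside the guest (run over SSH)."""
    result = subprocess.run(["ssh", f"root@{guest_host}", "test", "-b", device])
    return result.returncode == 0
```

In a backup script this might be used as `wait_until(lambda: guest_has_block_device("guest1", "/dev/xvdb"))`, running the mount only when it returns True and aborting with a clear message otherwise.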

