On Sunday 12 September 2010 20:48:09 Scott Garron wrote:
> On 9/12/2010 5:41 AM, J. Roeleveld wrote:
> > I also use LVMs extensively and do similar steps for backups.
> > 1) umount in domU
> > 2) block-detach
> > 3) lvcreate snapshot
> > 4) block-attach
> > 5) mount in domU
> I think the biggest difference, here, is that you unmount and
> detach the source volumes before creating the snapshot whereas I just
> leave them active and mounted in the guest. I don't know if that will
> end up being the difference between stability and instability on my
> system, but it's an observation and probably worth experimentation.
I tend to umount first to ensure the filesystem is consistent and no writes are
left in the guest's write buffer.
Filesystem recoveries are fine, but why rely on them when it's not necessary?
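
For reference, a minimal sketch of the five steps quoted above, written as a dry-run command list. It is not my actual script. The domain name, SSH host, volume names, device name, mount point and snapshot size are all hypothetical placeholders, and I am assuming the xm toolstack, with the frontend device name used as the device ID for block-detach.

```python
import subprocess


def snapshot_backup_commands(domu, ssh_host, vg, lv, frontdev, mountpoint,
                             snap_name, snap_size="1G"):
    """Return the five steps as argv lists, in the order they must run.

    All arguments are illustrative; adjust them to the real setup.
    """
    origin = f"/dev/{vg}/{lv}"      # LV backing the guest's disk (dom0 path)
    guest_dev = f"/dev/{frontdev}"  # same disk as seen inside the domU
    return [
        # 1) umount in domU (domU commands go over SSH)
        ["ssh", ssh_host, "umount", mountpoint],
        # 2) block-detach
        ["xm", "block-detach", domu, frontdev],
        # 3) lvcreate snapshot
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin],
        # 4) block-attach (read-write)
        ["xm", "block-attach", domu, f"phy:{origin}", frontdev, "w"],
        # 5) mount in domU
        ["ssh", ssh_host, "mount", guest_dev, mountpoint],
    ]


def run_steps(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step
            subprocess.run(argv, check=True)
```

Calling run_steps(snapshot_backup_commands(...)) with the default dry_run=True only prints the commands, which is a cheap way to check the order before letting it touch a live guest.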
> > I, however, have no need for HVM and only use PV guests.
> It turns out that it doesn't seem isolated to HVM guests on my
> system any longer. That was just coincidental during the first few
> crashes that I observed.
Ok, I believe the issue might be related to the LVM stack and the way Xen
holds the devices locked while they are mounted and attached.
> > Are you certain the snapshots are large enough to hold all possible
> > changes that might occur on the LV during the existence of the
> > snapshot?
> Certainly. The most recent one to cause a crash has existed
> through the crash and for 3 days now, and is only using 2.65% of its COW
> space. They usually don't get a chance to go above even 0.3% before the
> rsync on them is finished and they are unmounted and removed by the
> backup script.
Ok, guess that's not the cause :)
Although I get the "unable to remove active" error both when 0% is used and
when over 20% is used, so there is no clear indication (to me) of what is
causing it.
> > Another thing I notice, which might be of help to people who
> > understand this better than I do, in my backup-script, sometimes step
> > "5" fails because the domU hasn't noticed the device is attached
> > again when I try to mount it. The domU-commands are run using
> > SSH-connections.
> That probably just has to do with variations in how long it takes
> the guest kernel to poll or be notified of device changes, and how long
> it takes for its udev to create the device files and whatnot.
> Introducing some sanity checks or just a longer delay in your backup
> script would likely get around that problem. (I could be wrong, though)
I do need to add some sanity checks into the script at some point, but
currently I start these manually and 'fix' the left-overs myself.
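
A rough sketch of what such a sanity check could look like: poll the guest until the block device shows up, and only then mount. This is not what my script does today. The helper names, the timeout and the interval are assumptions for illustration, and the check relies on `test -b` being run in the domU over SSH.

```python
import subprocess
import time


def wait_until(check, timeout=60.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or timeout expires.

    Returns True if check() succeeded, False if the deadline passed.
    sleep and clock are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def device_visible_in_domu(ssh_host, device):
    """True if `device` exists as a block device inside the guest."""
    result = subprocess.run(["ssh", ssh_host, "test", "-b", device])
    return result.returncode == 0
```

The mount step would then only run when something like wait_until(lambda: device_visible_in_domu("root@guest1", "/dev/xvdb1")) returns True, with the host and device being placeholders.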
The mount-issue is a simple one and I notice this within 30-40 seconds of the
Xen-devel mailing list