To: Daniel Stodden <daniel.stodden@xxxxxxxxxx>
Subject: Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability
From: Scott Garron <xen-devel@xxxxxxxxxxxxxxxxxx>
Date: Mon, 30 Aug 2010 16:30:47 -0400
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>

On 08/30/2010 03:13 PM, Daniel Stodden wrote:
> Are you sure it's spinning or just freezing?

     I'm not sure that I understand the difference between those two
terms, so I'm going to guess that "freezing" is the more accurate
description.  My scripted backup procedure would get to a certain point
and freeze, and I couldn't break out of it or kill its PID from another
SSH session.  The kill command froze the same way: it never returned to
a shell prompt, and pressing CTRL-C just showed ^C on the display
without breaking out.
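
One way to tell the two apart, as a rough sketch: a spinning process usually shows state R in ps, while a process blocked inside the kernel shows state D (uninterruptible sleep) and does not act on signals, including kill -9, until the kernel call returns. The helper names classify_state and show_state are hypothetical, and the `-o stat=` and `-p` options assume a procps-style ps.

```shell
#!/bin/sh
# Sketch only: classify_state and show_state are hypothetical helpers.

classify_state() {
    # $1 = the STAT field from ps, e.g. "D+", "Rs", "Ss"
    case "$1" in
        R*) echo "running (could be spinning)" ;;
        D*) echo "uninterruptible sleep (blocked in the kernel)" ;;
        *)  echo "other" ;;
    esac
}

show_state() {
    # $1 = PID to inspect; assumes a procps-style ps
    stat="$(ps -o stat= -p "$1" | tr -d ' ')"
    [ -n "$stat" ] || { echo "no such process: $1" >&2; return 1; }
    echo "$1 $stat $(classify_state "$stat")"
}
```

Calling show_state with the PID of the stuck backup script or of the stuck kill would print its state on one line.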

> Can you try to find the minimum number of steps necessary to get into
> that state and try something like
> $ ps -eH -owchan,nwchan,cmd

     The minimum number of steps that I took, just now, to make it
happen was as follows:

     There's an HVM domU that's active and running Windows Server 2008,
called "scrappy", with the following Xen configuration:

kernel = "hvmloader"
builder='hvm'
memory = 768
name = "scrappy"
vcpus=1
vif = [ 'type=ioemu, mac=00:16:3e:00:00:18, bridge=eth0',
        'type=ioemu, mac=00:16:3e:00:00:19, bridge=xenbr1',
        'type=ioemu, mac=00:16:3e:00:00:1A, bridge=xenbr2' ]
disk = [ 'phy:hurricanevg1/scrappy-primarymaster,xvda,w',
         'file:/mnt/scratch/WindowsServerStd2008OEM_x86-64.iso,xvdb:cdrom,r',
         'phy:hurricanevg1/scrappy-secondarymaster,xvdc,w' ]
on_reboot   = 'restart'
device_model = 'qemu-dm'
sdl=0
opengl=1
vnc=1
vnclisten="192.168.0.90"
vncdisplay=3
vncunused=1
stdvga=0
serial='pty'
tsc_mode=0
localtime=1
rtc_timeoffset=-3600


     While that was running, I created a snapshot of the primarymaster
volume and removed it, created one for the secondarymaster and removed
it, then created another one for the primarymaster.  When I tried to
remove that one, the lvremove command froze.  A minute or two later, I
got a kernel OOPS message on my console similar to the one that I posted
before.  These are the commands that I used to create and remove the
snapshots:

lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s \
    hurricanevg1/scrappy-primarymaster

lvremove hurricanevg1/scrappy-primarymaster-backupsnap

lvcreate -L 2G -n scrappy-secondarymaster-backupsnap -s \
    hurricanevg1/scrappy-secondarymaster

lvremove hurricanevg1/scrappy-secondarymaster-backupsnap

lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s \
    hurricanevg1/scrappy-primarymaster

lvremove hurricanevg1/scrappy-primarymaster-backupsnap
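
The sequence above could be wrapped in a small script so that each attempt runs exactly the same steps. This is only a sketch: the volume group and volume names are taken from the commands above, while DRY_RUN, run_step, snapshot_cycle and repro_sequence are hypothetical names introduced for illustration. It defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: replays the snapshot create/remove sequence from this message.
# DRY_RUN, run_step, snapshot_cycle and repro_sequence are hypothetical names.
VG="hurricanevg1"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run_step() {
    # Print the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

snapshot_cycle() {
    # $1 = name of the origin logical volume inside $VG
    run_step lvcreate -L 2G -n "$1-backupsnap" -s "$VG/$1"
    # Note: lvremove asks for confirmation when run interactively.
    run_step lvremove "$VG/$1-backupsnap"
}

repro_sequence() {
    snapshot_cycle scrappy-primarymaster
    snapshot_cycle scrappy-secondarymaster
    snapshot_cycle scrappy-primarymaster
}
```

With DRY_RUN left at 1, calling repro_sequence prints the six commands in order. Setting DRY_RUN=0 would run them for real, which on this machine is expected to hang at the last lvremove.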


     This time, the console froze completely and I couldn't open any new
SSH sessions into the machine, and couldn't run the ps -eH command that
you asked for in your previous message.  If I go for another attempt,
I'll try to have a few logins already going so I can try to get that
output for you.  This is a somewhat critical, production server, though,
so I didn't want to keep bouncing it in the middle of the day.
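
Since new logins stop working once the machine is in this state, one option is to start sampling the requested ps output in an already-open session before running the snapshot commands. The following is a sketch under assumptions: sample_wchan, sample_loop and INTERVAL are hypothetical names, and the ps options are the ones requested earlier in the thread (procps syntax). Output goes to stdout so that it can be captured on the far end of the SSH session rather than on the affected server's disks.

```shell
#!/bin/sh
# Sketch only: sample_wchan, sample_loop and INTERVAL are hypothetical names.
INTERVAL="${INTERVAL:-5}"   # seconds between samples

sample_wchan() {
    # One timestamped sample of every process's wait channel.
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    ps -eH -owchan,nwchan,cmd
}

sample_loop() {
    # Keep sampling until interrupted or until the machine stops responding.
    while :; do
        sample_wchan
        sleep "$INTERVAL"
    done
}
```

Calling sample_loop from a session opened beforehand, with the SSH client's output saved on the local machine, should leave the last sample taken before the freeze available afterwards.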

> Also, is that sequence completely reproducible or does the behaviour
> change every time? Just trying to find out if there's some point where
> deadlock ends and corruption like the one quoted below would start.

     It seems to be 3 for 3 at this point.

--
Scott Garron
