
Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability



On 08/30/2010 03:13 PM, Daniel Stodden wrote:
> Are you sure it's spinning or just freezing?

     I'm not sure that I understand the difference between those two
terms, so I'm going to guess "freezing" is probably a more accurate
description.  The best way to describe what I was seeing was that my
scripted backup procedure would get to a certain point and freeze, and
I couldn't break out of it or kill it by PID from another SSH session.
The kill command froze the same way (it never returned to a shell
prompt, and pressing CTRL-C just showed ^C on the display without
breaking out).

> Can you try to find the minimum number of steps necessary to get into
> that state and try something like $ ps -eH -owchan,nwchan,cmd

     The minimum number of steps that I took, just now, to make it
happen was as follows:

     There's an HVM domU that's active and running Windows 2008 Server,
called "scrappy", with the following Xen configuration:

kernel = "hvmloader"
builder='hvm'
memory = 768
name = "scrappy"
vcpus=1
vif = [ 'type=ioemu, mac=00:16:3e:00:00:18, bridge=eth0',
        'type=ioemu, mac=00:16:3e:00:00:19, bridge=xenbr1',
        'type=ioemu, mac=00:16:3e:00:00:1A, bridge=xenbr2' ]
disk = [ 'phy:hurricanevg1/scrappy-primarymaster,xvda,w',
'file:/mnt/scratch/WindowsServerStd2008OEM_x86-64.iso,xvdb:cdrom,r',
'phy:hurricanevg1/scrappy-secondarymaster,xvdc,w' ]
on_reboot   = 'restart'
device_model = 'qemu-dm'
sdl=0
opengl=1
vnc=1
vnclisten="192.168.0.90"
vncdisplay=3
vncunused=1
stdvga=0
serial='pty'
tsc_mode=0
localtime=1
rtc_timeoffset=-3600


     While that's running, I created a snapshot of the primarymaster
volume, then removed it, created one for the secondarymaster, removed
it, and created another one for the primarymaster, tried to remove it,
and the lvremove command froze.  A minute or two later, I got a similar
kernel OOPS message on my console to the one that I posted before.
These are the commands that I used to create and remove the volumes:

lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s hurricanevg1/scrappy-primarymaster

lvremove hurricanevg1/scrappy-primarymaster-backupsnap

lvcreate -L 2G -n scrappy-secondarymaster-backupsnap -s hurricanevg1/scrappy-secondarymaster

lvremove hurricanevg1/scrappy-secondarymaster-backupsnap

lvcreate -L 2G -n scrappy-primarymaster-backupsnap -s hurricanevg1/scrappy-primarymaster

lvremove hurricanevg1/scrappy-primarymaster-backupsnap
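
     For anyone who wants to repeat the same sequence, the six commands
above can be wrapped in a small script.  This is only a sketch: the
volume group, volume names and 2G snapshot size come from the commands
above, while DRY_RUN and the function names (run, snapshot_cycle,
reproduce) are made up for illustration.  It defaults to printing the
commands rather than running them, since running them is what hung the
machine here.

```shell
#!/bin/sh
# Sketch: repeat the snapshot create/remove cycle listed above.
# VG, LV names and the 2G size are from the message; DRY_RUN and the
# function names are illustrative assumptions.

VG=hurricanevg1
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a snapshot of one logical volume, then remove it again.
snapshot_cycle() {
    lv=$1
    run lvcreate -L 2G -n "${lv}-backupsnap" -s "${VG}/${lv}" || return 1
    run lvremove "${VG}/${lv}-backupsnap"
}

# Same order as above: primary, secondary, primary.
reproduce() {
    snapshot_cycle scrappy-primarymaster &&
    snapshot_cycle scrappy-secondarymaster &&
    snapshot_cycle scrappy-primarymaster
}
```

     Calling reproduce after sourcing the file prints the six commands;
with DRY_RUN=0 it executes them.  As in the commands above, lvremove is
called without -f, so it may still ask for confirmation.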


     This time, the console froze completely and I couldn't open any new
SSH sessions into the machine, and couldn't run the ps -eH command that
you asked for in your previous message.  If I go for another attempt,
I'll try to have a few logins already going so I can try to get that
output for you.  This is a somewhat critical, production server, though,
so I didn't want to keep bouncing it in the middle of the day.
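
     One possible way to get that output on the next attempt is to
leave something like the following running in a session opened before
the snapshot commands start.  It is a sketch under assumptions: it
presumes the hung processes show up in uninterruptible sleep (state
"D" in the ps STAT column), and it uses a stat,wchan,pid,cmd column
layout rather than the exact -owchan,nwchan,cmd format that was
requested.  The function names and the log path are hypothetical.

```shell
#!/bin/sh
# Keep the header row plus every row whose STAT column starts with "D"
# (uninterruptible sleep). Expects input laid out like:
#   ps -eo stat,wchan:32,pid,cmd
filter_blocked() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Hypothetical helper: append a timestamped capture to a log file and
# flush it to disk so it has a chance of surviving a hang.
capture_blocked() {
    log=$1
    {
        date
        ps -eo stat,wchan:32,pid,cmd | filter_blocked
    } >> "$log"
    sync
}
```

     For example, "while :; do capture_blocked /root/blocked.log; sleep 5;
done" would record a capture every five seconds until the machine
stops responding.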

> Also, is that sequence completely reproducible or does the behaviour
> change every time? Just trying to see if there's some point where
> deadlock ends and corruption like the one quoted below would start.

     It seems to be 3 for 3 at this point.

--
Scott Garron



 

