xen-users


To: Steve Wray <steve.wray@xxxxxxxxx>
Subject: Re: [Xen-users] xen 2.0.6, on_crash = 'restart' not restarting after crash
From: Tim Post <tim.post@xxxxxxxxxxx>
Date: Mon, 30 Apr 2007 11:39:25 +0800
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Sun, 29 Apr 2007 20:38:20 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <46350F92.1010609@xxxxxxxxx>
Organization: Gridnix
References: <46350F92.1010609@xxxxxxxxx>
Reply-to: tim.post@xxxxxxxxxxx
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
On Mon, 2007-04-30 at 09:35 +1200, Steve Wray wrote:
> Hi all,
> 
> We have a xen instance (under xen 2.0.6) thats pretty unreliable; the
> domU crashes fairly regularly.

If you must use Xen v2, try 2.0.7 (or the last 2.0-testing Mercurial
tree). 2.0.7 isn't the most feature-packed release, but it is extremely
stable.

I'd really recommend upgrading to 3.0.4-testing or 3.0.5-testing (I
think it's at rc4 now) unless you depend on an older kernel version. I
have some systems that have to stay at 2.0.7 until I find a better fit
for PV OpenSSI clusters.

> Yes, we are trying to figure out why, but in the meantime I discovered
> that there is a config option 'on_crash'.
> 
> We've implemented this in the config file for that xen domain and we
> have this in the config file for the domain:
> 
> restart = 'always'
> 
> on_crash = 'restart'

This really depends on Xen's ability to see the domU as 'crashed'.
Typical 'crashes' on older kernels don't look much different to Xen than
a running or blocking state.

For example, if it's unresponsive but shown as running, the guest is
most likely just spiraling out of control.

If it's unresponsive and blocking, any number of things could be going
wrong, but Xen doesn't see it. Unless it's a full-blown kernel panic,
Xen 2 most likely won't see your guest crash.
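
To make the distinction above concrete, here is a minimal sketch of how
a script might combine what Xen reports with an outside health check.
It assumes the state letters shown in the state column of `xm list`
(r, b, p, s, c, d); the function name `diagnose` and the result labels
are made up for illustration, and the flag format may differ on your
Xen version.

```python
# Sketch only: the flag letters are assumed to match the state column
# of `xm list` (r, b, p, s, c, d); check your Xen version's output.


def diagnose(state_flags, responsive):
    """Classify a guest from its state flags and an external probe result.

    state_flags: string such as "r-----" or "-b----" (assumed format).
    responsive:  True if an outside check (ping, ssh, ...) succeeded.
    """
    flags = set(state_flags) - {"-"}
    if "c" in flags:
        # Xen itself sees the crash, so on_crash has something to act on.
        return "crashed"
    if responsive:
        return "ok"
    if "r" in flags:
        # Looks busy to Xen, but nothing answers: probably spinning.
        return "hung-running"
    if "b" in flags:
        # Looks idle to Xen, but nothing answers: cause unknown.
        return "hung-blocked"
    return "unknown"


if __name__ == "__main__":
    samples = [("----c-", False), ("r-----", False), ("-b----", False)]
    for flags, alive in samples:
        print(flags, diagnose(flags, alive))
```

Only the first case is one Xen can react to on its own; the two "hung"
cases need something outside the guest to notice them.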

Can you give more details of the crash?

> 
> The domain has indeed crashed since this was implemented and did not
> appear to recover, at least not for the 6 minutes we gave it to restart
> the domain:
> 
> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT> xend.domain.exit
> ['domUhostname', '14', 'crash']
> [2007-04-30 09:06:19 xend] INFO (XendRoot:112) EVENT>
> xend.domain.destroy ['domUhostname', '14']
> [2007-04-30 09:06:20 xend] INFO (XendRoot:112) EVENT> xend.domain.died
> ['domUhostname', '14']
> [2007-04-30 09:12:03 xend] DEBUG (XendDomainInfo:720) init_domain>
> Created domain=15 name=domUhostname memory=1200
> [2007-04-30 09:12:03 xend] INFO (console:94) Created console id=14
> domain=15 port=9615
> 

> And are there any other things we can do to restart a domain after a crash?

Many people favor some kind of key pairing so that a centralized
monitor can restart guests in the event of failure, even with newer
versions of Xen, or do the same through the API.
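
A rough sketch of the centralized-monitor idea, assuming the monitor
can reach each dom0 over key-based ssh and that `xm destroy` followed by
`xm create` is an acceptable way to bring a guest back. The host name,
config path, and the `is_alive` probe are hypothetical placeholders, not
anything from this thread's setup.

```python
import subprocess


def restart_commands(dom0, guest, config_path):
    """Build the ssh commands that would recreate one guest on its dom0."""
    return [
        ["ssh", dom0, "xm", "destroy", guest],      # clear any stuck domain
        ["ssh", dom0, "xm", "create", config_path],  # start it again
    ]


def check_and_restart(guests, is_alive, run=subprocess.call):
    """Restart every guest whose probe fails; return the names restarted.

    guests:   dict of guest name -> (dom0 host, config path); hypothetical.
    is_alive: callable(guest name) -> bool, e.g. a ping or ssh check.
    run:      callable that executes one command list (injectable so the
              logic can be exercised without touching real hosts).
    """
    restarted = []
    for guest, (dom0, config_path) in sorted(guests.items()):
        if is_alive(guest):
            continue
        for cmd in restart_commands(dom0, guest, config_path):
            run(cmd)
        restarted.append(guest)
    return restarted


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    demo = {"domUhostname": ("dom0.example", "/etc/xen/domUhostname")}
    check_and_restart(demo, is_alive=lambda name: False, run=print)
```

Run from cron or a loop on the monitoring host, this catches the hung
cases that Xen itself never flags as a crash, since the decision is
based on the outside probe rather than on the domain state.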

If you aren't depending on a very specific older patched kernel, I'd
just move up to 3.0.4-testing. 3.0.5-testing has been pretty stable too.

Hope this helps,
--Tim

