xen-devel

Re: [Xen-devel] Detecting deadlocks with hypervisor..

To: Thileepan Subramaniam <thileepan_@xxxxxxxxxxx>
Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
From: Anthony Liguori <aliguori@xxxxxxxxxx>
Date: Sun, 19 Mar 2006 10:30:09 -0600
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Thileepan Subramaniam wrote:
Hello,

I am trying to see if the hypervisor can be used to detect deadlocks in the guest VMs. My goal is to detect whether a guest OS is deadlocked and, if it is, to create a clone of the deadlocked OS without the locking condition and let the clone run. While the clone runs, I am hoping to generate some hints that could tell me what caused the deadlock.

I simulated a deadlock/hang situation in a guest OS (by loading a badly written module into the kernel), and while the guest OS kernel was hung, I ran "xm save" from Dom-0. But this command waits forever.

I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These seem to be called when I run "xm save", but beyond a certain point I am not sure what the Python scripts do. I also see some libxc files such as xc_linux_save.c, but I am not sure what uses them (Dom-0, Xen, or the XenU). Can someone help me by explaining what happens behind the scenes when "xm save" is called? Is there any good documentation explaining which actions are done by which layers (e.g. Python layer, C layer, etc.)?

Also, does it seem viable to clone a copy of a deadlocked guest OS in the first place?

As Ewan pointed out, xm save is guest-assisted, so a hung guest cannot be saved.
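
To make the consequence concrete, here is a minimal Python sketch of a tool that refuses to wait forever: it asks the guest to suspend, waits a bounded time for an acknowledgement, and otherwise falls back to a dump that needs no guest help. This is not how xend is implemented. The names save_or_dump, request_suspend, suspended and dump_core are hypothetical stand-ins, and the guest is simulated with a threading.Event.

```python
import threading


def save_or_dump(request_suspend, suspended, dump_core, timeout_s=5.0):
    """Ask the guest to suspend; if it never acknowledges, dump core instead.

    All three arguments are hypothetical stand-ins supplied by the caller:
      request_suspend: callable that asks the guest to suspend itself
      suspended:       threading.Event set once the guest acknowledges
      dump_core:       callable that captures guest memory without guest help
    """
    request_suspend()
    if suspended.wait(timeout_s):
        return "saved"
    # The guest never answered, so treat it as hung and fall back.
    dump_core()
    return "dumped"


if __name__ == "__main__":
    # A responsive guest acknowledges as soon as it is asked.
    ack = threading.Event()
    print(save_or_dump(ack.set, ack, lambda: None, timeout_s=0.1))

    # A hung guest never sets the event, so the fallback runs.
    hung = threading.Event()
    print(save_or_dump(lambda: None, hung, lambda: None, timeout_s=0.1))
```

The timeout is what turns "waits forever" into a usable signal that the guest may be hung. A slow but healthy guest would trip it too, so the value has to be chosen with care.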

You may want to look at xc_domain_dumpcore(). You could do some post-analysis of the core dump to determine where it locked. Determining why it deadlocked is of course impossible in the general case, but you may be able to develop some interesting heuristics with appropriate static analysis.
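
One possible shape for such post-analysis is a wait-for graph: if the dump tells you which task holds each lock and which lock each blocked task is waiting on, a cycle in that graph is a deadlock. The Python sketch below assumes that information has already been recovered from the core by some other means, which is the hard part and is not shown. The function name and the two dictionaries are hypothetical.

```python
def find_deadlock_cycle(holds, waits):
    """Return a list of tasks forming a wait cycle, or None.

    holds: dict mapping lock -> task currently holding it
    waits: dict mapping task -> lock that task is blocked on
    Both are assumed inputs, extracted from the dump elsewhere.
    """
    for start in waits:
        path = []
        seen = set()
        task = start
        # Follow "task waits on lock held by next task" until the chain
        # ends (next task is not blocked) or loops back on itself.
        while task in waits and task not in seen:
            seen.add(task)
            path.append(task)
            task = holds.get(waits[task])
        if task is not None and task in seen:
            return path[path.index(task):]
    return None


if __name__ == "__main__":
    # Two tasks each holding the lock the other wants.
    holds = {"A": "t1", "B": "t2"}
    waits = {"t1": "B", "t2": "A"}
    print(find_deadlock_cycle(holds, waits))

    # A task re-acquiring a lock it already holds.
    print(find_deadlock_cycle({"A": "t1"}, {"t1": "A"}))
```

This only reports where the system is stuck, which matches the "where" versus "why" distinction above.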

As for recovering the guest, a really clever approach would be to rewrite some of the locking code (maybe temporarily?) by mapping the guest's code page into dom0's memory after examining EIP in the core.
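
As a toy illustration of the patching step only, the Python sketch below treats a bytearray as the mapped code page, derives the page base and offset from an EIP value, and replaces a two-byte x86 jump-to-self (EB FE) with two NOPs (90 90). Real lock code is not a bare jump-to-self, and a real tool would first have to translate the guest virtual address in EIP to a machine frame and map it from dom0 (in libxc, presumably via something like xc_map_foreign_range()); none of that is shown. The names split_address and patch_spin are hypothetical.

```python
PAGE_SIZE = 4096          # x86 page size in bytes
JMP_SELF = b"\xeb\xfe"    # x86 "jmp $": a two-byte jump to itself
NOP = 0x90                # x86 one-byte no-op


def split_address(eip):
    """Split an address into its page base and the offset within that page."""
    return eip & ~(PAGE_SIZE - 1), eip & (PAGE_SIZE - 1)


def patch_spin(page, eip):
    """Overwrite a jump-to-self at eip with NOPs; return True if patched.

    `page` is a mutable bytearray standing in for the mapped code page.
    Nothing is written unless the expected bytes are found at the offset.
    """
    _, off = split_address(eip)
    if bytes(page[off:off + 2]) != JMP_SELF:
        return False
    page[off:off + 2] = bytes([NOP, NOP])
    return True


if __name__ == "__main__":
    page = bytearray(PAGE_SIZE)
    page[0x10:0x12] = JMP_SELF            # plant a spin at offset 0x10
    print(patch_spin(page, 0xC0100010))   # EIP taken from the (simulated) core
    print(page[0x10:0x12].hex())
```

Checking the bytes before writing matters: patching at a stale or misread EIP would corrupt unrelated guest code.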

I reckon there's a rather interesting paper to be written on something like this :-)

Regards,

Anthony Liguori

thanks!
- ts


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

