   
 
 

xen-devel

Re: [Xen-devel] [PATCH] libxl/xl: improve behaviour when guest fails to suspend itself

To: Ian Campbell <ian.campbell@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] libxl/xl: improve behaviour when guest fails to suspend itself
From: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
Date: Wed, 9 Feb 2011 16:23:31 +0000
Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 09 Feb 2011 08:21:56 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <d631a4996cbc69a7fa84.1297185841@xxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <d631a4996cbc69a7fa84.1297185841@xxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)
On Tue, 8 Feb 2011, Ian Campbell wrote:
> # HG changeset patch
> # User Ian Campbell <ian.campbell@xxxxxxxxxx>
> # Date 1297185819 0
> # Node ID d631a4996cbc69a7fa8489f28d4a3313db12e77a
> # Parent  a46b91cd8202726aecd9ddefd8e75faff48144d6
> libxl/xl: improve behaviour when guest fails to suspend itself.
> 
> The PV suspend protocol requires guest co-operation, whereby the guest
> must respond to a suspend request written to the xenstore control node
> by clearing the node and then making a suspend hypercall.
> 
> Currently when a guest fails to do this libxl times out and returns
> a generic failure code to the caller.
> 
> In response to this failure xl attempts to resume the guest. However
> if the guest has not responded to the suspend request then there is no
> guarantee that the guest has made the suspend hypercall (in fact it is
> quite unlikely). Since the resume process attempts to modify the
> return value of the hypercall (to indicate a cancelled suspend) this
> results in the guest eax/rax register being corrupted!
> 
> To fix this, change libxl to do the following:
>    * Wait for the guest to acknowledge the suspend request.
>      - on timeout cancel the suspend request.
>        - if cancellation is successful then return a new error code to
>          indicate that the guest is not responding.
>        - if the cancel does not succeed then we raced with the guest
>          which actually did acknowledge at the last minute, so
>          continue.
>    * Wait for the guest to suspend.
>      - on timeout return the standard error code as before
>    * Guest successfully suspended, return success.
> 
> Lastly in xl do not attempt to resume a guest if it has not responded
> to the suspend request.
> 
> Tested by live migration of PVops kernels which ignore the
> suspend request, which have already crashed, and which suspend/resume
> correctly. In the first two cases the source domain is left alone (and
> continues to function in the first case) and in the third the
> migration is successful.
> 
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> 


Acked-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
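
For readers following the quoted description, the wait/cancel/wait sequence can be sketched as a small simulation. This is only an illustration of the control flow under stated assumptions, not the libxl code from the patch: ToyGuest, cancel_request, suspend_guest, should_resume and the GUEST_NOT_RESPONDING name are all hypothetical, the control node is modelled as a plain string, and time is modelled as poll ticks rather than real timeouts.

```python
from enum import Enum


class SuspendResult(Enum):
    OK = "ok"
    FAIL = "fail"                                  # standard error, as before
    GUEST_NOT_RESPONDING = "guest-not-responding"  # hypothetical name for the new code


class ToyGuest:
    """Toy PV guest: clears the control node and suspends at given ticks."""

    def __init__(self, ack_at=None, suspend_at=None):
        self.control = ""          # stands in for the xenstore control node
        self.acked = False
        self.suspended = False
        self.ack_at = ack_at       # None means the guest never acknowledges
        self.suspend_at = suspend_at
        self.now = 0

    def tick(self):
        """Advance simulated time by one poll interval."""
        self.now += 1
        if (self.ack_at is not None and self.now >= self.ack_at
                and self.control == "suspend"):
            self.control = ""      # guest acknowledges by clearing the node
            self.acked = True
        if (self.acked and self.suspend_at is not None
                and self.now >= self.suspend_at):
            self.suspended = True  # guest has made the suspend hypercall


def cancel_request(guest):
    """Try to withdraw the request; fails if the guest cleared the node first."""
    guest.tick()  # time passes between the last poll and the cancel
    if guest.control == "suspend":
        guest.control = ""
        return True
    return False


def suspend_guest(guest, ack_polls=3, suspend_polls=3):
    guest.control = "suspend"

    # Wait for the guest to acknowledge the suspend request.
    acked = False
    for _ in range(ack_polls):
        guest.tick()
        if guest.control != "suspend":
            acked = True
            break
    if not acked and cancel_request(guest):
        return SuspendResult.GUEST_NOT_RESPONDING
    # Otherwise the guest acknowledged (possibly at the last minute): continue.

    # Wait for the guest to suspend.
    for _ in range(suspend_polls):
        if guest.suspended:
            return SuspendResult.OK
        guest.tick()
    return SuspendResult.OK if guest.suspended else SuspendResult.FAIL


def should_resume(result):
    """Caller-side rule: never resume a guest that did not acknowledge."""
    return result is SuspendResult.FAIL
```

In this sketch a guest that never clears the node yields GUEST_NOT_RESPONDING and is left alone, while a guest that acknowledges but never suspends yields the standard failure, which is the only case where the caller attempts a resume. Treating a failed cancel as a late acknowledgement is what models the race described in the quoted text.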

