This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domain

To: "Graham, Simon" <Simon.Graham@xxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Minor fix to xentop to stop it dying when domains go away at the wrong time
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Thu, 27 Jul 2006 10:26:26 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 27 Jul 2006 02:26:45 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <342BAC0A5467384983B586A6B0B37671034705FE@xxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <342BAC0A5467384983B586A6B0B37671034705FE@xxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx

On 26 Jul 2006, at 21:36, Graham, Simon wrote:

> OK - I've reworked the fix to put it in libxenstat -- still not
> completely convinced I like it, but take a look and let me know what you
> think - as you suggested, I've made the collectors return a value
> indicating if a fatal error occurred (-ve), a retryable error occurred
> (0) or they were successful (+ve) and put in code to retry from the top
> when a retryable error occurs (with a small 1/4s delay so we don't spin
> wildly whilst things stabilize).

Thinking about this some more, those retryable failures will generally mean that a domain is being created or being destroyed. In those two cases, perhaps xenstat_get_node() should simply prune the problematic domain from the list it returns? That would avoid unbounded delay in xenstat_get_node().
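
A minimal sketch of the pruning idea, for illustration only. The struct, its `incomplete` flag, and the function name are hypothetical stand-ins and not the actual libxenstat types; the assumption is that a collector marks a domain when it hits a retryable error, and the list is compacted once before it is returned:

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-domain record; NOT the real libxenstat type. */
struct example_domain {
    unsigned int id;
    int incomplete;   /* assumed flag: set when a collector hit a retryable error */
};

/*
 * Compact the array in place, dropping every domain marked incomplete.
 * Returns the number of domains kept. Relative order is preserved.
 */
static size_t example_prune_incomplete(struct example_domain *doms, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (!doms[i].incomplete)
            doms[kept++] = doms[i];
    }
    return kept;
}
```

Because the pass is a single bounded loop over the domains already collected, it never waits for a domain to finish being created or destroyed.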

I think what you have so far is okay: a fatal error in a collector causes an error in the caller, while a recoverable error could cause the domain to be pruned rather than retried in the caller. Maybe we should have macros for the possible return values from a collector: bare -1/0/+1 return values are not immediately obvious.
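
As a rough sketch of what such macros might look like: the names below are hypothetical and not taken from the patch; only the negative/zero/positive convention comes from the thread. The helper shows how a caller could map each value to an action:

```c
/* Hypothetical macro names; only the -1/0/+1 convention is from the thread. */
#define EX_COLLECT_FATAL  (-1)  /* unrecoverable: the caller should fail      */
#define EX_COLLECT_RETRY  (0)   /* transient: domain being created/destroyed  */
#define EX_COLLECT_OK     (1)   /* collector succeeded                        */

/* Hypothetical actions a caller might take for one domain. */
enum example_action {
    EX_ACTION_FAIL,   /* propagate an error to the caller */
    EX_ACTION_PRUNE,  /* drop the domain from the result  */
    EX_ACTION_KEEP    /* keep the domain in the result    */
};

/* Map a collector return value to the action suggested in the discussion. */
static enum example_action example_action_for(int status)
{
    if (status < 0)
        return EX_ACTION_FAIL;
    if (status == EX_COLLECT_RETRY)
        return EX_ACTION_PRUNE;
    return EX_ACTION_KEEP;
}
```

Testing `status < 0` rather than `status == EX_COLLECT_FATAL` keeps the helper compatible with collectors that return any negative value for a fatal error.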

 -- Keir
