This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] Debugging a weird hardware fault.

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Debugging a weird hardware fault.
From: Keir Fraser <keir.xen@xxxxxxxxx>
Date: Fri, 29 Jul 2011 08:24:50 +0100
Cc: winston.l.wang@xxxxxxxxx, gang.wei@xxxxxxxxx
Cc'ing some of the Xen ACPI/PM maintainers to see if they have an opinion on
this issue...

On 29/07/2011 08:10, "Keir Fraser" <keir.xen@xxxxxxxxx> wrote:

> On 28/07/2011 23:45, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>> Initially I suspected an SMI, but the triple fault occurs whether or not
>> you start bringing down CPUs.  While waiting 10 seconds in the platform_op
>> select statement, the fault still occurs when all CPUs are still up, all
>> IRQs are still enabled and domUs are potentially still running.  (Also,
>> from studying the Xen 3.4 code, I believe that interrupts are actually
>> still enabled during time_suspend(), but are brought down soon afterwards
>> by lapic_suspend() later in device_power_down().)
>> Conversely, in the hacked-up case where I ditched most of the shared S3/S5
>> codepath and just hit the PM1A register, the server correctly shut down
>> and stayed shut down, implying that the fault was caused by software (be
>> it BIOS or OS) rather than hardware.  From what I understand of the ACPI
>> spec (and I claim very little knowledge), there are a multitude of
>> hardware events which could bring the server out of S5, appearing as a
>> triple fault, but none of them would be affected by whether you had hit
>> the PM1A register.
>> In this specific example, dom0's regular shutdown code had already brought
>> down the domUs (of which there were none, because we never started any),
>> and we were running with only 1 CPU, so no others were up.  This opens up
>> a whole host of other possibilities which could be having an effect
>> between the XENPF_enter_acpi_sleep hypercall and Xen actually shutting
>> itself down.
> Well I expect dom0 has done some going-to-sleep work that has left the
> platform on borrowed time w.r.t. bashing SLP_EN into the PM1 control
> register and actually finalising the shutdown.
> For example, it will have executed the _GTS ACPI method if there is one.
> That is supposed to happen immediately before writing PM1.SLP_EN, with no
> intervening interrupt activity or I/O. Obviously things don't work out quite
> like that when running on Xen!
> This is an architectural limitation of how ACPI sleep is currently
> implemented for Xen. It may need some rethinking to do it really properly
> according to the spec. e.g., do a hypercall just to prepare Xen for
> shutdown, but return back to dom0 in some limited environment to actually
> have it do the final ACPI sleep work. Or have dom0 pass a pointer to a code
> block that Xen should simply jump at to get the sleep to happen (where that
> code block would basically be dom0's acpi_enter_sleep() function). There are
> a few, somewhat distasteful, options that are more respectful of the ACPI
> spec than we are right now.
>  -- Keir
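The final step that both the "just hit the PM1A" hack and the SLP_EN discussion refer to is a write of SLP_TYP and SLP_EN into the PM1a control register. The sketch below only computes the register values, following the two-write sequence used by ACPICA-derived code (first SLP_TYP with SLP_EN clear, then SLP_TYP|SLP_EN). The bit positions (SLP_TYP in bits 10-12, SLP_EN in bit 13) are from the ACPI specification; the helper names are hypothetical, the SLP_TYP value for S5 is platform specific (it comes from the firmware's \_S5 object) and is assumed to be already known, and the actual port write to the FADT's PM1a_CNT_BLK is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* PM1 control register fields, per the ACPI specification. */
#define PM1_SLP_TYP_SHIFT 10
#define PM1_SLP_TYP_MASK  (0x7u << PM1_SLP_TYP_SHIFT)  /* bits 10-12 */
#define PM1_SLP_EN        (1u << 13)                    /* bit 13 */

/*
 * Hypothetical helper: value for the first write, carrying the requested
 * SLP_TYP with SLP_EN still clear. 'current' is the value previously read
 * from PM1a_CNT, so unrelated bits such as SCI_EN (bit 0) are preserved.
 */
static uint16_t pm1_slp_typ_value(uint16_t current, unsigned int slp_typ)
{
    assert(slp_typ <= 7); /* SLP_TYP is a 3-bit field */
    uint16_t v = (uint16_t)(current & ~(PM1_SLP_TYP_MASK | PM1_SLP_EN));
    return (uint16_t)(v | (slp_typ << PM1_SLP_TYP_SHIFT));
}

/*
 * Hypothetical helper: value for the second write, which sets SLP_EN and
 * is what actually triggers the transition into the sleep state.
 */
static uint16_t pm1_slp_en_value(uint16_t slp_typ_value)
{
    return (uint16_t)(slp_typ_value | PM1_SLP_EN);
}
```

For example, with SCI_EN (bit 0) set and a made-up SLP_TYP of 5, the first write would be 0x1401 and the second 0x3401. On real hardware the second write is where Keir's constraint bites: _GTS is supposed to have run immediately before it.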
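Keir's second option (dom0 hands Xen a code block to jump to) is mostly about ordering: Xen does all of its own quiescing first, and the _GTS evaluation plus the SLP_EN write run back-to-back at the very end with interrupts off. The sketch below models only that ordering, recording steps into a trace instead of touching firmware or hardware. Every name in it is hypothetical; it is not an existing Xen or Linux interface, and it says nothing about how a real hypercall would pass or validate such a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical steps, recorded so the ordering can be checked. */
enum sleep_step { STEP_XEN_PREPARE, STEP_IRQ_OFF, STEP_GTS, STEP_SLP_EN };

struct sleep_trace {
    enum sleep_step steps[8];
    size_t n;
};

static void record(struct sleep_trace *t, enum sleep_step s)
{
    assert(t->n < sizeof t->steps / sizeof t->steps[0]);
    t->steps[t->n++] = s;
}

/* Hypothetical finaliser type: the code block dom0 would hand to Xen. */
typedef void (*sleep_finaliser_fn)(struct sleep_trace *t);

/* Stand-in for dom0's acpi_enter_sleep(): _GTS, then SLP_EN, nothing between. */
static void dom0_finaliser(struct sleep_trace *t)
{
    record(t, STEP_GTS);    /* would evaluate the _GTS method, if present */
    record(t, STEP_SLP_EN); /* would write SLP_TYP|SLP_EN to PM1 control */
}

/* Hypothetical Xen side: quiesce everything first, jump to dom0's block last. */
static void xen_enter_sleep(struct sleep_trace *t, sleep_finaliser_fn fin)
{
    record(t, STEP_XEN_PREPARE); /* offline CPUs, suspend time and devices */
    record(t, STEP_IRQ_OFF);     /* interrupts disabled from here on */
    fin(t);                      /* on real hardware this would not return */
}
```

Recording steps rather than performing them keeps the model testable; the design point it illustrates is that the dom0-supplied finaliser is the last thing Xen calls, so in this model nothing Xen does can land between _GTS and SLP_EN.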

