[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Yet another S3 issue in Xen 4.14



On Thu, Oct 01, 2020 at 01:43:52PM +0100, Andrew Cooper wrote:
> On 01/10/2020 13:31, Marek Marczykowski-Górecki wrote:
> > On Thu, Oct 01, 2020 at 01:59:32PM +0200, Jan Beulich wrote:
> >> On 01.10.2020 03:12, Marek Marczykowski-Górecki wrote:
> >>> After patching the previous issue ("x86/S3: Fix Shadow Stack resume
> >>> path") I still encounter issues resume from S3.
> >>> Since I had it working on Xen 4.13 on this particular hardware (Thinkpad
> >>> P52), I bisected it and got this:
> >>>
> >>> commit 4304ff420e51b973ec9eb9dafd64a917dd9c0fb1
> >>> Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>> Date:   Wed Dec 11 20:59:19 2019 +0000
> >>>
> >>>     x86/S3: Drop {save,restore}_rest_processor_state() completely
> >>>     
> >>>     There is no need to save/restore FS/GS/XCR0 state.  It will be handled
> >>>     suitably on the context switch away from the idle.
> >>>     
> >>>     The CR4 restoration in restore_rest_processor_state() was actually 
> >>> fighting
> >>>     later code in enter_state() which tried to keep CR4.MCE clear until 
> >>> everything
> >>>     was set up.  Delete the intermediate restoration, and defer final 
> >>> restoration
> >>>     until after MCE is reconfigured.
> >>>     
> >>>     Restoring PAT can be done earlier, and ideally before paging is 
> >>> enabled.  By
> >>>     moving it into the trampoline during the setup for 64bit, the call 
> >>> can be
> >>>     dropped from cpu_init().  The EFI path boot path doesn't disable 
> >>> paging, so
> >>>     make the adjustment when switching onto Xen's pagetables.
> >>>     
> >>>     The only remaing piece of restoration is load_system_tables(), so 
> >>> suspend.c
> >>>     can be deleted in its entirety.
> >>>     
> >>>     Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>     Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>
> >>> Parent of this commit suspends and resumes just fine. With this commit
> >>> applied, it (I think) it panics, at least I get reboot after 5s. Sadly, I
> >>> don't have serial console there.
> >>>
> >>> I tried also master and stable-4.14 with this commit reverted (and also
> >>> the other fix applied), but it doesn't work. In this case I get a hang on
> >>> resume (power led still flashing, but fan woke up). There are probably
> >>> some other dependencies.
> >> Since bisection may also point you at some intermediate breakage, which
> >> these last results of yours seem to support, could you check whether
> >> 55f8c389d434 put immediately on top of the above commit makes a difference,
> >> and if so resume bisecting from there?
> > Nope, 4304ff420e51b973ec9eb9dafd64a917dd9c0fb1 with 55f8c389d434 on top
> > it still hangs on resume.
> 
> Ok.  I'll see about breaking the change apart so we can bisect which
> specific bit of code movement broke things.

I've done another bisect on the commit broken up in separate changes
(https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/dbg-s3)
and the bad part seems to be this:

From dbdb32f8c265295d6af7cd4cd0aa12b6d04a0430 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Fri, 2 Oct 2020 15:40:22 +0100
Subject: [PATCH 1/1] CR4

---
 xen/arch/x86/acpi/power.c   | 9 ++++-----
 xen/arch/x86/acpi/suspend.c | 3 ---
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 6dfd4c7891..0cda362045 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -195,7 +195,6 @@ static int enter_state(u32 state)
     unsigned long flags;
     int error;
     struct cpu_info *ci;
-    unsigned long cr4;
 
     if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
         return -EINVAL;
@@ -270,15 +269,15 @@ static int enter_state(u32 state)
 
     system_state = SYS_STATE_resume;
 
-    /* Restore CR4 and EFER from cached values. */
-    cr4 = read_cr4();
-    write_cr4(cr4 & ~X86_CR4_MCE);
+    /* Restore EFER from cached value. */
     write_efer(read_efer());
 
     device_power_up(SAVED_ALL);
 
     mcheck_init(&boot_cpu_data, false);
-    write_cr4(cr4);
+
+    /* Restore CR4 from cached value, now MCE is set up. */
+    write_cr4(read_cr4());
 
     printk(XENLOG_INFO "Finishing wakeup from ACPI S%d state.\n", state);
 
diff --git a/xen/arch/x86/acpi/suspend.c b/xen/arch/x86/acpi/suspend.c
index 060a9313b6..ca987d9019 100644
--- a/xen/arch/x86/acpi/suspend.c
+++ b/xen/arch/x86/acpi/suspend.c
@@ -23,7 +23,4 @@ void save_rest_processor_state(void)
 void restore_rest_processor_state(void)
 {
     load_system_tables();
-
-    /* Restore full CR4 (inc MCE) now that the IDT is in place. */
-    write_cr4(mmu_cr4_features);
 }
-- 
2.20.1

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.