[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: S3 regression related to XSA-471 patches



On Wed, Aug 13, 2025 at 04:53:53AM +0200, Marek Marczykowski-Górecki wrote:
> 
> 
> On August 11, 2025 3:16:46 PM GMT+02:00, Andrew Cooper 
> <andrew.cooper3@xxxxxxxxxx> wrote:
> >On 07/08/2025 6:29 pm, Marek Marczykowski-Górecki wrote:
> >> On Wed, Aug 06, 2025 at 12:46:36PM +0200, Marek Marczykowski-Górecki wrote:
> >>> On Wed, Aug 06, 2025 at 12:36:56PM +0200, Jan Beulich wrote:
> >>>> On 06.08.2025 12:23, Marek Marczykowski-Górecki wrote:
> >>>>> We've got several reports that S3 reliability recently regressed. We
> >>>>> identified it's definitely related to XSA-471 patches, and bisection
> >>>>> points at "x86/idle: Remove broken MWAIT implementation". I don't have
> >>>>> reliable reproduction steps, so I'm not 100% sure if it's really this
> >>>>> patch, or maybe an earlier one - but it's definitely already broken at
> >>>>> this point in the series. Most reports are about Xen 4.17 (as that's
> >>>>> what stable Qubes OS version currently use), but I think I've seen
> >>>>> somebody reporting the issue on 4.19 too (but I don't have clear
> >>>>> evidence, especially if it's the same issue).
> >>>> At the time we've been discussing the explicit raising of TIMER_SOFTIRQ
> >>>> in mwait_idle_with_hints() a lot. If it was now truly missing, that imo
> >>>> shouldn't cause problems only after resume, but then it may have covered
> >>>> for some omission during resume. As a far-fetched experiment, could you
> >>>> try putting that back (including the calculation of the "expires" local
> >>>> variable)?
> >>> Sure, I'll try.
> >> It appears this fixes the issue, at least in ~10 attempts so far
> >> (usually I could reproduce the issue after 2-3 attempts).
> >>
> >
> >Can you show the exact code which seems to have made this stable?
> 
> This patch: 
> https://github.com/marmarek/qubes-vmm-xen/blob/7c9e9e312948c772d9a68090109964121c1d16fe/0001-DEBUG-S3.patch

Hello,

Can you test if the patch below in isolation makes any difference?

Thanks, Roger.
---
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 2ac162c997fe..27d672ad5dbb 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -19,6 +19,7 @@
 #include <xen/iommu.h>
 #include <xen/param.h>
 #include <xen/sched.h>
+#include <xen/softirq.h>
 #include <xen/spinlock.h>
 #include <xen/watchdog.h>
 
@@ -310,6 +311,7 @@ static int enter_state(u32 state)
     thaw_domains();
     system_state = SYS_STATE_active;
     spin_unlock(&pm_lock);
+    raise_softirq(TIMER_SOFTIRQ);
     return error;
 }
 
diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 0fd8bdba7067..9d66db861b74 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -65,7 +65,6 @@ static struct {
     unsigned int apic_lvt0;
     unsigned int apic_lvt1;
     unsigned int apic_lvterr;
-    unsigned int apic_tmict;
     unsigned int apic_tdcr;
     unsigned int apic_thmr;
 } apic_pm_state;
@@ -658,7 +657,6 @@ int lapic_suspend(void)
     apic_pm_state.apic_lvt0 = apic_read(APIC_LVT0);
     apic_pm_state.apic_lvt1 = apic_read(APIC_LVT1);
     apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR);
-    apic_pm_state.apic_tmict = apic_read(APIC_TMICT);
     apic_pm_state.apic_tdcr = apic_read(APIC_TDCR);
     if (maxlvt >= 5)
         apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
@@ -718,7 +716,7 @@ int lapic_resume(void)
         apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc);
     apic_write(APIC_LVTT, apic_pm_state.apic_lvtt);
     apic_write(APIC_TDCR, apic_pm_state.apic_tdcr);
-    apic_write(APIC_TMICT, apic_pm_state.apic_tmict);
+    apic_write(APIC_TMICT, 0);
     apic_write(APIC_ESR, 0);
     apic_read(APIC_ESR);
     apic_write(APIC_LVTERR, apic_pm_state.apic_lvterr);




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.