Re: S3 regression related to XSA-471 patches
On Wed, Aug 13, 2025 at 04:53:53AM +0200, Marek Marczykowski-Górecki wrote:
> On August 11, 2025 3:16:46 PM GMT+02:00, Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx> wrote:
> >On 07/08/2025 6:29 pm, Marek Marczykowski-Górecki wrote:
> >> On Wed, Aug 06, 2025 at 12:46:36PM +0200, Marek Marczykowski-Górecki wrote:
> >>> On Wed, Aug 06, 2025 at 12:36:56PM +0200, Jan Beulich wrote:
> >>>> On 06.08.2025 12:23, Marek Marczykowski-Górecki wrote:
> >>>>> We've got several reports that S3 reliability recently regressed. We
> >>>>> identified that it's definitely related to the XSA-471 patches, and
> >>>>> bisection points at "x86/idle: Remove broken MWAIT implementation".
> >>>>> I don't have reliable reproduction steps, so I'm not 100% sure whether
> >>>>> it's really this patch or maybe an earlier one - but it's definitely
> >>>>> already broken at this point in the series. Most reports are about
> >>>>> Xen 4.17 (as that's what the stable Qubes OS version currently uses),
> >>>>> but I think I've seen somebody reporting the issue on 4.19 too (though
> >>>>> I don't have clear evidence, especially whether it's the same issue).
> >>>> At the time, we discussed the explicit raising of TIMER_SOFTIRQ in
> >>>> mwait_idle_with_hints() a lot. If it were now truly missing, that imo
> >>>> shouldn't cause problems only after resume - but then it may have
> >>>> covered for some omission during resume. As a far-fetched experiment,
> >>>> could you try putting that back (including the calculation of the
> >>>> "expires" local variable)?
> >>> Sure, I'll try.
> >> It appears this fixes the issue, at least in ~10 attempts so far
> >> (usually I could reproduce the issue after 2-3 attempts).
> >
> >Can you show the exact code which seems to have made this stable?
>
> This patch:
> https://github.com/marmarek/qubes-vmm-xen/blob/7c9e9e312948c772d9a68090109964121c1d16fe/0001-DEBUG-S3.patch

Hello,

Can you test if the patch below in isolation makes any difference?

Thanks, Roger.
---
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 2ac162c997fe..27d672ad5dbb 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -19,6 +19,7 @@
 #include <xen/iommu.h>
 #include <xen/param.h>
 #include <xen/sched.h>
+#include <xen/softirq.h>
 #include <xen/spinlock.h>
 #include <xen/watchdog.h>
 
@@ -310,6 +311,7 @@ static int enter_state(u32 state)
     thaw_domains();
     system_state = SYS_STATE_active;
     spin_unlock(&pm_lock);
+    raise_softirq(TIMER_SOFTIRQ);
     return error;
 }
 
diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 0fd8bdba7067..9d66db861b74 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -65,7 +65,6 @@ static struct {
     unsigned int apic_lvt0;
     unsigned int apic_lvt1;
     unsigned int apic_lvterr;
-    unsigned int apic_tmict;
     unsigned int apic_tdcr;
     unsigned int apic_thmr;
 } apic_pm_state;
@@ -658,7 +657,6 @@ int lapic_suspend(void)
     apic_pm_state.apic_lvt0 = apic_read(APIC_LVT0);
     apic_pm_state.apic_lvt1 = apic_read(APIC_LVT1);
     apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR);
-    apic_pm_state.apic_tmict = apic_read(APIC_TMICT);
     apic_pm_state.apic_tdcr = apic_read(APIC_TDCR);
     if (maxlvt >= 5)
         apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
@@ -718,7 +716,7 @@ int lapic_resume(void)
     apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc);
     apic_write(APIC_LVTT, apic_pm_state.apic_lvtt);
     apic_write(APIC_TDCR, apic_pm_state.apic_tdcr);
-    apic_write(APIC_TMICT, apic_pm_state.apic_tmict);
+    apic_write(APIC_TMICT, 0);
     apic_write(APIC_ESR, 0);
     apic_read(APIC_ESR);
     apic_write(APIC_LVTERR, apic_pm_state.apic_lvterr);
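For context on what the patch above appears to be doing: restoring the saved
APIC_TMICT value in lapic_resume() re-arms the local APIC timer with a stale,
pre-suspend initial count, while (per the Intel SDM) writing 0 to the
initial-count register leaves the timer stopped. Raising TIMER_SOFTIRQ once
the system is back to SYS_STATE_active then makes Xen's timer subsystem
recompute the next deadline and reprogram the timer itself, instead of
relying on the stale count to fire.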
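For reference, the experiment Jan suggested (and Marek's linked debug patch)
amounts to restoring something like the tail of the mwait_idle_with_hints()
logic that "x86/idle: Remove broken MWAIT implementation" dropped. Below is a
from-memory sketch of that shape, not a literal revert - the exact
surrounding code differs between versions:

    void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
    {
        unsigned int cpu = smp_processor_id();
        /* Deadline of this CPU's next timer, captured before idling. */
        s_time_t expires = per_cpu(timer_deadline, cpu);

        /* ... MONITOR/MWAIT entry and wakeup handling elided ... */

        /*
         * If the deadline passed while in MWAIT, raise TIMER_SOFTIRQ
         * explicitly so the expired timer is handled on this exit from
         * idle rather than waiting for the next interrupt.
         */
        if ( expires <= NOW() && expires > 0 )
            raise_softirq(TIMER_SOFTIRQ);
    }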