
S3 regression related to XSA-471 patches



Hi,

We've got several reports that S3 reliability recently regressed. We
identified it's definitely related to the XSA-471 patches, and bisection
points at "x86/idle: Remove broken MWAIT implementation". I don't have
reliable reproduction steps, so I'm not 100% sure if it's really this
patch, or maybe an earlier one - but it's definitely already broken at
this point in the series. Most reports are about Xen 4.17 (as that's
what the stable Qubes OS version currently uses), but I think I've seen
somebody reporting the issue on 4.19 too (I don't have clear evidence
for that, and in particular I can't tell whether it's the same issue).

The problem manifests as the system freezing on S3 resume. Sometimes it
manages to show the screenlocker password prompt, and sometimes one can
interact with it for a second or two. But then it freezes, the mouse
stops moving, etc. (but there is no reboot).
One time I managed to get past the screenlocker and interact with dom0
for a few minutes before it froze. Resuming domUs didn't happen (the
Qubes-specific script that does so hung), and no logs from this case
persisted on the disk either (on disk it looked like the system never
resumed). Generally it looked like some CPUs were stuck.

The issue appears more likely to hit if some domUs are active around
suspend/resume time. While Qubes OS does suspend (not just pause) them
for the duration of the host S3, some activity before/after does appear
to matter. My test case, which has a ~30-40% reproduction rate, involves
several Firefox instances playing YouTube videos.
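
With a reproduction rate that low, a single clean resume says little
about whether a given commit is good. As a rough sketch only (it assumes
every suspend/resume cycle is an independent trial at the observed rate,
which may well not hold here, and cycles_needed is a hypothetical helper
name), the number of consecutive clean cycles needed before calling a
build "good" can be estimated like this:

```python
import math


def cycles_needed(repro_rate, confidence=0.95):
    """Clean suspend/resume cycles needed before calling a build good.

    Assumption: each cycle independently hits the freeze with
    probability repro_rate. A bad build then survives n cycles with
    probability (1 - repro_rate) ** n; pick the smallest n for which
    that drops to (1 - confidence) or below.
    """
    if not 0.0 < repro_rate < 1.0:
        raise ValueError("repro_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - repro_rate))


if __name__ == "__main__":
    # The two ends of the ~30-40% range mentioned above.
    for rate in (0.3, 0.4):
        print(rate, cycles_needed(rate))
```

Under those assumptions, that is 9 clean cycles at a 30% rate and 6 at
40% for 95% confidence, per bisection step.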

I've talked with Andrew about it a bit, without reaching many
conclusions. Initial reports mentioned only MTL and RPL systems, so we
focused on something related to their unusual topology. But just today
I've got a report of the same happening on KBL too...

Another observation (possibly invalidated by today's report...) is that
all reports were about systems running Coreboot (though not only the
Dasharo flavor - at least one was Star Labs).

Most reports are collected at 
https://github.com/QubesOS/qubes-issues/issues/10110

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
