[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen



On 4/28/22 10:50 AM, Jan Beulich wrote:
The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
with pat_enabled()") pat_enabled() returning false (because of PAT
initialization being suppressed in the absence of MTRRs being announced
to be available) has become a problem: The i915 driver now fails to
initialize when running PV on Xen (i915_gem_object_pin_map() is where I
located the induced failure), and its error handling is flaky enough to
(at least sometimes) result in a hung system.

Yet even beyond that problem the keying of the use of WC mappings to
pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
graphics frame buffer accesses would have been quite a bit less
performant than possible.

Arrange for the function to return true in such environments, without
undermining the rest of PAT MSR management logic considering PAT to be
disabled: Specifically, no writes to the PAT MSR should occur.

For the new boolean to live in .init.data, init_cache_modes() also needs
moving to .init.text (where it could/should have lived already before).

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
On the system where I observed the issue, a knock-on effect of driver
initialization failing was that the SATA-controller also started to
report failures.

--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -62,6 +62,7 @@
static bool __read_mostly pat_bp_initialized;
  static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
+static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
  static bool __read_mostly pat_bp_enabled;
  static bool __read_mostly pat_cm_initialized;
@@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
  static int __init nopat(char *str)
  {
        pat_disable("PAT support disabled via boot option.");
+       pat_force_disabled = true;
        return 0;
  }
  early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
        wrmsrl(MSR_IA32_CR_PAT, pat);
  }
-void init_cache_modes(void)
+void __init init_cache_modes(void)
  {
        u64 pat = 0;
@@ -313,6 +315,13 @@ void init_cache_modes(void)
                 */
                pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
                      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+       } else if (!pat_force_disabled &&
+                  boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+               /*
+                * Clearly PAT is enabled underneath. Allow pat_enabled() to
+                * reflect this.
+                */
+               pat_bp_enabled = true;
        }
__init_cache_modes(pat);


Hi Jan,

I tested this patch on my system with an Intel Haswell CPU with
integrated GPU. It fixes my system which currently is unable
to boot the official Linux 5.17.y as a Dom0 on Xen.

The first suggestion I have is that the commit message should be
edited to explain how the behavior of the kernel in response to
the nopat option would be changed by this patch. My tests indicate
that this patch changes what nopat actually does, and the commit
message should clearly explain that fact.

Thank you for this patch, I hope it is committed soon and backported
to 5.17 stable. I consider that it fixes a regression caused by
bdd8b6c98239, so my second suggestion is that you add Fixes:

Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

I will be using this patch on my system until it, or another fix, is
committed to Linux. Without such a fix, I cannot run 5.17.y and later
on my system as a Xen Dom0.

Best regards,

Chuck



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.