[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH AUTOSEL 6.19-5.10] x86/xen/pvh: Enable PAE mode for 32-bit guest only when CONFIG_X86_PAE is set



From: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>

[ Upstream commit db9aded979b491a24871e1621cd4e8822dbca859 ]

The PVH entry is available for 32-bit KVM guests, and 32-bit KVM guests
do not depend on CONFIG_X86_PAE. However, mk_early_pgtbl_32() builds
different pagetables depending on whether CONFIG_X86_PAE is set.
Therefore, enabling PAE mode for 32-bit KVM guests without
CONFIG_X86_PAE being set would result in a boot failure during CR3
loading.

Signed-off-by: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>
Reviewed-by: Juergen Gross <jgross@xxxxxxxx>
Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
Message-ID: 
<d09ce9a134eb9cbc16928a5b316969f8ba606b81.1768017442.git.houwenlong.hwl@xxxxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed for a complete analysis.

---

## Detailed Analysis

### 1. Commit Message Analysis

The subject clearly states the problem: PAE mode is being
unconditionally enabled for all PVH guests, but for 32-bit guests it
should only be enabled when `CONFIG_X86_PAE` is set. The commit message
explicitly describes a **boot failure** scenario: loading CR3 fails when
the page table format doesn't match the CPU's paging mode expectation.

The commit is:
- Reviewed by Juergen Gross (Xen subsystem maintainer)
- Signed off by Juergen Gross (subsystem maintainer sign-off)

### 2. Code Change Analysis

**The Bug Mechanism:**

The PVH entry path in `arch/x86/platform/pvh/head.S` is the boot entry
point for PVH (Para-Virtualized Hardware) guests, used by Xen and KVM.
The flow is:

1. **Line 94-97** (before fix): PAE mode is unconditionally enabled in
   CR4:

```94:97:arch/x86/platform/pvh/head.S
        /* Enable PAE mode. */
        mov %cr4, %eax
        orl $X86_CR4_PAE, %eax
        mov %eax, %cr4
```

2. For **64-bit** guests (`CONFIG_X86_64`), this is correct — PAE is
   always needed as a prerequisite for long mode (line 99-104).

3. For **32-bit** guests (the `#else` path starting at line 196), the
   code calls `mk_early_pgtbl_32()` to build early page tables:

```196:205:arch/x86/platform/pvh/head.S
#else /* CONFIG_X86_64 */

        call mk_early_pgtbl_32

        mov $_pa(initial_page_table), %eax
        mov %eax, %cr3

        mov %cr0, %eax
        or $(X86_CR0_PG | X86_CR0_PE), %eax
        mov %eax, %cr0
```

4. `mk_early_pgtbl_32()` in `arch/x86/kernel/head32.c` builds
   **fundamentally different** page table structures depending on
   `CONFIG_X86_PAE`:

```95:103:arch/x86/kernel/head32.c
#ifdef CONFIG_X86_PAE
typedef pmd_t                   pl2_t;
#define pl2_base                initial_pg_pmd
#define SET_PL2(val)            { .pmd = (val), }
#else
typedef pgd_t                   pl2_t;
#define pl2_base                initial_page_table
#define SET_PL2(val)            { .pgd = (val), }
#endif
```

   - With `CONFIG_X86_PAE`: Builds **3-level PAE page tables**
     (PGDIR_SHIFT=30, uses PMDs + PDPTEs)
   - Without `CONFIG_X86_PAE`: Builds **2-level non-PAE page tables**
     (PGDIR_SHIFT=22, uses PGDs directly)

**The crash**: When PAE is enabled in CR4 but non-PAE page tables are
loaded into CR3, the CPU interprets the 2-level page directory as a PAE
PDPT (Page Directory Pointer Table). When paging is activated
(CR0.PG=1), the processor tries to load the PDPTE entries from the
address in CR3. The non-PAE page directory entries are completely
incompatible with PAE PDPTE format, causing a **#GP fault or triple
fault**, resulting in an immediate boot failure.

**The Fix:** Simply wrapping the PAE enablement with proper `#ifdef`
guards:

```asm
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
        /* Enable PAE mode. */
        mov %cr4, %eax
        orl $X86_CR4_PAE, %eax
        mov %eax, %cr4
#endif
```

This ensures PAE is only enabled when:
- `CONFIG_X86_64` is set (64-bit always needs PAE for long mode), or
- `CONFIG_X86_PAE` is set (32-bit with PAE — page tables match)

Note that the 32-bit path at lines 212-220 already has code to disable
PAE before jumping to `startup_32`, which confirms the original author
was aware that PAE and non-PAE modes exist, but the initial enablement
was not properly guarded.

### 3. Classification

This is a **boot failure fix**. It's not a feature, cleanup, or
optimization. It fixes a configuration where a 32-bit PVH guest without
`CONFIG_X86_PAE` completely fails to boot.

### 4. Scope and Risk Assessment

- **Lines changed**: 2 lines added (`#if defined(...)` and `#endif`), 0
  lines removed
- **Files touched**: 1 (`arch/x86/platform/pvh/head.S`)
- **Complexity**: Minimal — conditional compilation guard
- **Risk**: Extremely low
  - For `CONFIG_X86_64`: No change (the `#if` is always true)
  - For `CONFIG_X86_32` with `CONFIG_X86_PAE`: No change (the `#if` is
    true)
  - For `CONFIG_X86_32` without `CONFIG_X86_PAE`: PAE is no longer
    enabled, matching the page table format — this is the bug fix
- **Regression potential**: Near zero. The only behavioral change is for
  the broken configuration that currently crashes.

### 5. User Impact

- **Who is affected**: Anyone running a 32-bit kernel without PAE as a
  PVH/KVM guest. This is a legitimate configuration since `config PVH`
  has no dependency on `CONFIG_X86_PAE` or `CONFIG_X86_64`.
- **Severity**: Complete boot failure — the system cannot boot at all
- **Workaround**: Users must enable `CONFIG_X86_PAE` for 32-bit PVH
  guests, which may not be obvious and adds unnecessary configuration
  constraints

### 6. Stability Indicators

- **Reviewed-by**: Juergen Gross (Xen subsystem maintainer) — very
  strong indicator
- **Signed-off-by**: Juergen Gross (accepted through the Xen tree)
- The fix is trivially correct by inspection — when `CONFIG_X86_PAE` is
  not set, enabling PAE in CR4 is wrong because the page tables aren't
  in PAE format

### 7. Dependency Check

- **No dependencies** on other patches — this is a completely self-
  contained fix
- The affected code (`pvh_start_xen` with the unconditional PAE
  enablement) exists in **all current stable trees** (confirmed present
  from v5.4 through v6.12)
- The fix context is slightly different in older stable trees (e.g.,
  `rep\n       movsl` vs `rep movsl`, PIC vs absolute addressing), so
  minor backport adjustments may be needed for older trees, but the fix
  itself (adding `#if`/`#endif` around the PAE lines) is trivial to
  adapt

### 8. Conclusion

This is a textbook stable kernel fix:
- **Fixes a real, severe bug**: Complete boot failure for a valid kernel
  configuration
- **Obviously correct**: The fix is a 2-line conditional compilation
  guard that makes PAE enablement match page table format
- **Small and contained**: 2 lines added to 1 file
- **No new features**: Just correctness fix
- **No risk of regression**: Only changes behavior for the currently-
  broken configuration
- **Present in all stable trees**: The bug has existed since the
  original PVH commit (v4.11, 2017)
- **Reviewed by subsystem maintainer**: Juergen Gross

**YES**

 arch/x86/platform/pvh/head.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index 344030c1a81d4..53ee2d53fcf8e 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -91,10 +91,12 @@ SYM_CODE_START(pvh_start_xen)
 
        leal rva(early_stack_end)(%ebp), %esp
 
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
        /* Enable PAE mode. */
        mov %cr4, %eax
        orl $X86_CR4_PAE, %eax
        mov %eax, %cr4
+#endif
 
 #ifdef CONFIG_X86_64
        /* Enable Long mode. */
-- 
2.51.0




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.