Re: [RFC] docs: FRED support in Xen
On 06.01.2025 17:06, Andrew Cooper wrote:
> On 06/01/2025 2:28 pm, Jan Beulich wrote:
>> On 03.01.2025 21:47, Andrew Cooper wrote:
>>> +There are several uses of the vm86 fields in Xen:
>>> +
>>> + #. ``struct vcpu`` embeds a ``struct cpu_user_regs`` to hold GPRs/etc when
>>> +    the vCPU is scheduled out. The vm86 fields are used by the PV logic only
>>> +    (``{save,load}_segments()``) and can be moved into separate fields in
>>> +    ``struct pv_vcpu``. PV's ``dom0_construct()`` sets these fields directly,
>>> +    and needs a matching adjustment.
>>> +
>>> + #. As part of ``arch_{get,set}_info_guest()`` during hypercalls. The
>>> +    guest side needs to remain as-is, but the Xen side can be rearranged to
>>> +    use the new fields from above.
>>> +
>>> + #. As part of vCPU diagnostics (``show_registers()`` etc). The ``#DF`` path
>>> +    uses the fields as scratch storage for the current register values, while
>>> +    the other diagnostics are simply accessing the state of a scheduled-out
>>> +    vCPU.
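[As a rough illustration of the first point above, hypothetical only (the
actual field names and layout are up to the series), the vm86 selectors
could move out of the embedded ``struct cpu_user_regs`` along these lines:

    #include <stdint.h>

    /* Sketch: a PV-only home for the old vm86 selector fields. */
    struct pv_vcpu {
        /* ... existing PV-specific fields ... */

        /*
         * Selector state previously held in the vm86 fields of the
         * embedded struct cpu_user_regs; used by {save,load}_segments()
         * and set directly by dom0_construct().
         */
        uint16_t ds, es, fs, gs;
    };
]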
>> Unlike for the former 2 points and for the one immediately below, but like
>> for the final one below, you don't mention what you intend to do. Here I
>> assume it'll be reasonably straightforward to use scratch space elsewhere.
>
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-fred
> is my working branch if you want to peek at things.
>
> The diagnostics are handled by:
>
> 1) "x86/traps: Rework register state printing to use an extra_state struct"
> 2) "x86/traps: Avoid OoB accesses to print the data selectors"
Doesn't this one remove the sole caller of read_sregs(), hence also warranting
purging the function itself? Apart from this ...
> 3) "Revert "x86/traps: 'Fix' safety of read_registers() in #DF path""
... these all look fine to me; I'll hold off on a formal R-b until the actual
submission, though.
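[A rough sketch of the idea behind the first of those patches, with all names
hypothetical rather than taken from the branch: the diagnostics collect the
live state they want into an on-stack struct, instead of using the vm86
fields of a ``struct cpu_user_regs`` as scratch storage:

    #include <stdint.h>

    /* Sketch: state wanted only for register dumps, read fresh. */
    struct extra_state {
        uint16_t ds, es, fs, gs;
        unsigned long cr0, cr2, cr3, cr4;
    };

    static void collect_extra_state(struct extra_state *st)
    {
        asm volatile ( "mov %%ds, %0" : "=r" (st->ds) );
        asm volatile ( "mov %%es, %0" : "=r" (st->es) );
        asm volatile ( "mov %%fs, %0" : "=r" (st->fs) );
        asm volatile ( "mov %%gs, %0" : "=r" (st->gs) );
        asm volatile ( "mov %%cr0, %0" : "=r" (st->cr0) );
        asm volatile ( "mov %%cr2, %0" : "=r" (st->cr2) );
        asm volatile ( "mov %%cr3, %0" : "=r" (st->cr3) );
        asm volatile ( "mov %%cr4, %0" : "=r" (st->cr4) );
    }
]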
> Something else you might want to proactively look at: "x86/idt:
> Generate bsp_idt[] at build time". I figured out how to construct the
> IDT at build time, without using an external tool to format the table,
> needing only some slightly disgusting linker script hackery.
Clever.
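[For context, a minimal sketch of the traditional runtime approach such a
patch replaces (gate layout per the SDM; the selector value and helper name
are assumptions). The address split below is exactly what a C static
initialiser cannot express, since masking/shifting a symbol's address isn't
a linker-resolvable constant, hence the need for linker script hackery:

    #include <stdint.h>

    /* A 64-bit IDT entry (interrupt gate). */
    struct idt_entry {
        uint16_t offset_lo;
        uint16_t selector;
        uint8_t  ist:3, zero:5;
        uint8_t  type:4, s:1, dpl:2, p:1;
        uint16_t offset_mid;
        uint32_t offset_hi;
        uint32_t reserved;
    };

    static void set_gate(struct idt_entry *e, void *handler, unsigned int ist)
    {
        unsigned long addr = (unsigned long)handler;

        e->offset_lo  = addr & 0xffff;
        e->offset_mid = (addr >> 16) & 0xffff;
        e->offset_hi  = addr >> 32;
        e->selector   = 0xe008;      /* Xen's code selector */
        e->ist        = ist;
        e->type       = 0xe;         /* 64-bit interrupt gate */
        e->p          = 1;
    }
]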
>>> +Stack layout
>>> +""""""""""""
>>> +
>>> +Xen's CPU stacks are 8 pages (8-page aligned), arranged as::
>>> +
>>> +  7 - Primary stack (with a struct cpu_info at the top)
>>> +  6 - Primary stack
>>> +  5 - Primary Shadow Stack (read-only)
>>> +  4 - #DF IST stack
>>> +  3 - #DB IST stack
>>> +  2 - NMI IST stack
>>> +  1 - #MC IST stack
>>> +  0 - IST Shadow Stacks (4x 1k, read-only)
>>> +
>>> +which needs mapping into FRED's Stack Levels.
>>> +
>>> +FRED Stack Levels replace IST. Most events from Ring3 enter Ring0 at SL0,
>>> +including interrupts, and even exceptions with a non-zero Stack Level
>>> +configured. Nested exceptions originate from Ring0 even if they were trying
>>> +to push a Ring3 event frame onto the stack, so do follow the Ring0 CSL rules.
>>> +
>>> +Within Ring0, a stack switch occurs on event delivery if the event has a
>>> +higher configured Stack Level (exceptions in ``MSR_FRED_STK_LVLS``,
>>> +interrupts in ``MSR_FRED_CONFIG``). Otherwise, the new event is delivered
>>> +on the current stack.
>>> +
>>> +Under FRED, most sources of ``#DF`` are gone; failure to push a new event
>>> +frame onto a stack is the main remaining one, so ``#DF`` needs to be the
>>> +highest Stack Level (SL3) to catch errors at all other stack levels.
>>> +
>>> +Also, FRED removes the "syscall gap", removing the primary need for ``NMI``,
>>> +``#DB`` and ``#MC`` to have separate stacks.
>>> +
>>> +Therefore, Xen has no need for SL1 or SL2. Under IDT delivery, we poison
>>> +the unused stack pointers with a non-canonical address, but we cannot do
>>> +that under FRED; they're held in MSRs and checked at WRMSR time. Instead,
>>> +we can point the SL pairs (RSP + SSP) at each others (regular and shadow
>>> +stack) guard pages such that any use of an unused SL will escalate to
>>> +``#DF``.
I may have a language issue here: "each others" reads wrong to me; do you
perhaps mean "each ones"?
>
> It's poor grammar, but not wrong per se. I'll try to find a different
> way to phrase it.
>
>>
>> Further, mainly to double-check my own understanding: almost half of the
>> stack area then isn't used anymore when FRED is active; all that's left in
>> use is 2 pages for the primary stack, 1 page for the primary shadow stack,
>> 1 page for the SL3 stack, and (somewhat wastefully) 1 more for the SL3
>> shadow stack.
>
> This matches my understanding (on the proviso that I've still not wired
> up the stack handling yet; I'm still cleaning up the initialisation paths).
>
>> There'll be 3 unused pages, and hence space again to have true guard
>> pages, e.g.
>>
>> 7 - Primary stack (with a struct cpu_info at the top)
>> 6 - Primary stack
>> 5 - Guard page
>> 4 - Primary Shadow Stack (read-only)
>> 3 - Guard page
>> 2 - #DF stack
>> 1 - #DF Shadow Stack (read-only)
>> 0 - Guard page
>
> Shadow stack frames are perfectly good guard pages for regular stacks,
> and vice versa. That's why I set them up as shadow stack pages even
> when CET isn't active.
"Perfectly valid" isn't quite true: These pages being readable means
writes below the stack bottom (likely the prevailing kind of problem)
are detected, but reads wouldn't be.
> And yes, we could rearrange the stack. But there's also a good reason
> not to: our code has to cope with both IDT and FRED layouts, which is
> much easier if they're the same.
I certainly can see the value of keeping stack layout uniform.
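[To make the stack-level scheme above concrete, here's a minimal sketch.
The MSR indices, macro and helper names, and the exact guard-page choices
are assumptions, not taken from the actual branch:

    /* Sketch: FRED Stack Level programming for the 8-page stack block. */
    #define X86_EXC_DF        8                 /* #DF vector */
    #define FRED_STKLVL(v, l) ((uint64_t)(l) << ((v) * 2))

    static void fred_setup_stack_levels(unsigned long stack_base)
    {
        /* All vectors default to SL0; only #DF escalates, to SL3. */
        wrmsrl(MSR_FRED_STK_LVLS, FRED_STKLVL(X86_EXC_DF, 3));

        /* SL0: top of the primary stack (pages 6-7). */
        wrmsrl(MSR_FRED_RSP0, stack_base + 8 * PAGE_SIZE);

        /* SL3: reuse the page that was the #DF IST stack (page 4). */
        wrmsrl(MSR_FRED_RSP3, stack_base + 5 * PAGE_SIZE);

        /*
         * SL1/SL2 are unused.  Non-canonical poison would fault at
         * WRMSR time, so instead point the unused RSPs at a read-only
         * shadow stack page and the unused SSPs at a regular stack
         * page.  Any stray use faults, and the failed event delivery
         * escalates to #DF at SL3.
         */
        wrmsrl(MSR_FRED_RSP1, stack_base + 1 * PAGE_SIZE); /* shstk page 0 */
        wrmsrl(MSR_FRED_RSP2, stack_base + 1 * PAGE_SIZE);
        wrmsrl(MSR_FRED_SSP1, stack_base + 7 * PAGE_SIZE); /* stack page 6 */
        wrmsrl(MSR_FRED_SSP2, stack_base + 7 * PAGE_SIZE);
    }
]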
>> Having reached the bottom of the section, there's one special IST aspect
>> that I'm missing, special enough imo to warrant mentioning even if only to
>> state that it's (hopefully) going to be irrelevant (i.e. not require
>> replacing by another similar hack): {dis,en}able_each_ist() as used by SVM
>> (on the assumption that sooner or later AMD is likely to also implement
>> FRED, and that you may already know details of their respective VMCB
>> integration).
>
> AMD haven't said anything about FRED yet, despite a very large number of
> software partners asking about it.
>
> However, if AMD were to do FRED, I'd expect it to work just like CET
> does today, seeing as there's a proper host/guest split of CR4, and
> everything else is in MSRs the guest can't touch.
As in "can be prevented to touch directly", aiui.
Jan