Re: [RFC] docs: FRED support in Xen
On 06.01.2025 17:06, Andrew Cooper wrote:
> On 06/01/2025 2:28 pm, Jan Beulich wrote:
>> On 03.01.2025 21:47, Andrew Cooper wrote:
>>> +There are several uses of the vm86 fields in Xen:
>>> +
>>> + #. ``struct vcpu`` embeds a ``struct cpu_user_regs`` to hold GPRs/etc when
>>> +    the vCPU is scheduled out. The vm86 fields are used by the PV logic only
>>> +    (``{save,load}_segments()``) and can be moved into separate fields in
>>> +    ``struct pv_vcpu``. PV's ``dom0_construct()`` sets these fields directly,
>>> +    and needs a matching adjustment.
>>> +
>>> + #. As part of ``arch_{get,set}_info_guest()`` during hypercalls. The
>>> +    guest side needs to remain as-is, but the Xen side can be rearranged to
>>> +    use the new fields from above.
>>> +
>>> + #. As part of vCPU diagnostics (``show_registers()`` etc). The ``#DF`` path
>>> +    uses the fields as scratch storage for the current register values, while
>>> +    the other diagnostics are simply accessing the state of a scheduled-out
>>> +    vCPU.
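[As a rough illustration of the first point above, hypothetical only (the
actual field names and layout are up to the series), the vm86 selectors
could move out of the embedded ``struct cpu_user_regs`` along these lines:

    #include <stdint.h>

    /* Sketch: a PV-only home for the old vm86 selector fields. */
    struct pv_vcpu {
        /* ... existing PV-specific fields ... */

        /*
         * Selector state previously held in the vm86 fields of the
         * embedded struct cpu_user_regs; used by {save,load}_segments()
         * and set directly by dom0_construct().
         */
        uint16_t ds, es, fs, gs;
    };
]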
>> Unlike for the former 2 points and for the one immediately below, but like
>> for the final one below, you don't mention what you intend to do. Here I
>> assume it'll be reasonably straightforward to use scratch space elsewhere.
>
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-fred
> is my working branch if you want to peek at things.
>
> The diagnostics are handled by:
>
> 1) "x86/traps: Rework register state printing to use an extra_state struct"
> 2) "x86/traps: Avoid OoB accesses to print the data selectors"
Doesn't this one remove the sole caller of read_sregs(), hence also warranting
purging the function itself? Apart from this ...
> 3) "Revert "x86/traps: 'Fix' safety of read_registers() in #DF path""
... these all look fine to me; I'll hold off on a formal R-b until the actual
submission, though.
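[A rough sketch of the idea behind the first of those patches, with all names
hypothetical rather than taken from the branch: the diagnostics collect the
live state they want into an on-stack struct, instead of using the vm86
fields of a ``struct cpu_user_regs`` as scratch storage:

    #include <stdint.h>

    /* Sketch: state wanted only for register dumps, read fresh. */
    struct extra_state {
        uint16_t ds, es, fs, gs;
        unsigned long cr0, cr2, cr3, cr4;
    };

    static void collect_extra_state(struct extra_state *st)
    {
        asm volatile ( "mov %%ds, %0" : "=r" (st->ds) );
        asm volatile ( "mov %%es, %0" : "=r" (st->es) );
        asm volatile ( "mov %%fs, %0" : "=r" (st->fs) );
        asm volatile ( "mov %%gs, %0" : "=r" (st->gs) );
        asm volatile ( "mov %%cr0, %0" : "=r" (st->cr0) );
        asm volatile ( "mov %%cr2, %0" : "=r" (st->cr2) );
        asm volatile ( "mov %%cr3, %0" : "=r" (st->cr3) );
        asm volatile ( "mov %%cr4, %0" : "=r" (st->cr4) );
    }
]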
> Something else you might want to proactively look at: "x86/idt:
> Generate bsp_idt[] at build time". I figured out how to construct the
> IDT at build time, without using an external tool to format the table,
> needing only some slightly disgusting linker script hackery.
Clever.
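[For context, a minimal sketch of the traditional runtime approach such a
patch replaces (gate layout per the SDM; the selector value and helper name
are assumptions). The address split below is exactly what a C static
initialiser cannot express, since masking/shifting a symbol's address isn't
a linker-resolvable constant, hence the need for linker script hackery:

    #include <stdint.h>

    /* A 64-bit IDT entry (interrupt gate). */
    struct idt_entry {
        uint16_t offset_lo;
        uint16_t selector;
        uint8_t  ist:3, zero:5;
        uint8_t  type:4, s:1, dpl:2, p:1;
        uint16_t offset_mid;
        uint32_t offset_hi;
        uint32_t reserved;
    };

    static void set_gate(struct idt_entry *e, void *handler, unsigned int ist)
    {
        unsigned long addr = (unsigned long)handler;

        e->offset_lo  = addr & 0xffff;
        e->offset_mid = (addr >> 16) & 0xffff;
        e->offset_hi  = addr >> 32;
        e->selector   = 0xe008;      /* Xen's code selector */
        e->ist        = ist;
        e->type       = 0xe;         /* 64-bit interrupt gate */
        e->p          = 1;
    }
]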
>>> +Stack layout
>>> +""""""""""""
>>> +
>>> +Xen's CPU stacks are 8 pages (8-page aligned), arranged as::
>>> +
>>> +  7 - Primary stack (with a struct cpu_info at the top)
>>> +  6 - Primary stack
>>> +  5 - Primary Shadow Stack (read-only)
>>> +  4 - #DF IST stack
>>> +  3 - #DB IST stack
>>> +  2 - NMI IST stack
>>> +  1 - #MC IST stack
>>> +  0 - IST Shadow Stacks (4x 1k, read-only)
>>> +
>>> +which needs mapping into FRED's Stack Levels.
>>> +
>>> +FRED Stack Levels replace IST. Most events from Ring3 enter Ring0 at SL0,
>>> +including interrupts, and even exceptions with a non-zero Stack Level
>>> +configured. Nested exceptions originate from Ring0 even if they were trying
>>> +to push a Ring3 event frame onto the stack, so do follow the Ring0 CSL rules.
>>> +
>>> +Within Ring0, a stack switch occurs on event delivery if the event has a
>>> +higher configured Stack Level (exceptions in ``MSR_FRED_STK_LVLS``,
>>> +interrupts in ``MSR_FRED_CONFIG``). Otherwise, the new event is delivered
>>> +on the current stack.
>>> +
>>> +Under FRED, most sources of ``#DF`` are gone; failure to push a new event
>>> +frame onto a stack is the main remaining one, so ``#DF`` needs to be the
>>> +highest Stack Level (SL3) to catch errors at all other stack levels.
>>> +
>>> +Also, FRED removes the "syscall gap", removing the primary need for ``NMI``,
>>> +``#DB`` and ``#MC`` to have separate stacks.
>>> +
>>> +Therefore, Xen has no need for SL1 or SL2. Under IDT delivery, we poison
>>> +the unused stack pointers with a non-canonical address, but we cannot do
>>> +that under FRED; they're held in MSRs and checked at WRMSR time. Instead,
>>> +we can point the SL pairs (RSP + SSP) at each others (regular and shadow
>>> +stack) guard pages such that any use of an unused SL will escalate to
>>> +``#DF``.
I may have a language issue here: "each others" reads wrong to me; do you
perhaps mean "each ones"?
>
> It's poor grammar, but not wrong per se. I'll try to find a different
> way to phrase it.
>
>>
>> Further, mainly to double-check my own understanding: almost half of the
>> stack area then isn't used anymore when FRED is active; all that's left in
>> use is 2 pages for the primary stack, 1 page for the primary shadow stack,
>> 1 page for the SL3 stack, and (somewhat wastefully) 1 more for the SL3
>> shadow stack.
>
> This matches my understanding (on the proviso that I've still not wired
> up the stack handling yet; I'm still cleaning up the initialisation paths).
>
>> There'll be 3 unused pages, and hence space again to have true guard
>> pages, e.g.
>>
>> 7 - Primary stack (with a struct cpu_info at the top)
>> 6 - Primary stack
>> 5 - Guard page
>> 4 - Primary Shadow Stack (read-only)
>> 3 - Guard page
>> 2 - #DF stack
>> 1 - #DF Shadow Stack (read-only)
>> 0 - Guard page
>
> Shadow stack frames are perfectly good guard pages for regular stacks,
> and vice versa. That's why I set them up as shadow stack pages even
> when CET isn't active.
"Perfectly valid" isn't quite true: These pages being readable means
writes below the stack bottom (likely the prevailing kind of problem)
are detected, but reads wouldn't be.
> And yes, we could rearrange the stack. But there's also a good reason
> not to: our code has to cope with both IDT and FRED layouts, which is
> much easier if they're the same.
I certainly can see the value of keeping stack layout uniform.
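[To make the stack-level scheme above concrete, here's a minimal sketch.
The MSR indices, macro and helper names, and the exact guard-page choices
are assumptions, not taken from the actual branch:

    /* Sketch: FRED Stack Level programming for the 8-page stack block. */
    #define X86_EXC_DF        8                 /* #DF vector */
    #define FRED_STKLVL(v, l) ((uint64_t)(l) << ((v) * 2))

    static void fred_setup_stack_levels(unsigned long stack_base)
    {
        /* All vectors default to SL0; only #DF escalates, to SL3. */
        wrmsrl(MSR_FRED_STK_LVLS, FRED_STKLVL(X86_EXC_DF, 3));

        /* SL0: top of the primary stack (pages 6-7). */
        wrmsrl(MSR_FRED_RSP0, stack_base + 8 * PAGE_SIZE);

        /* SL3: reuse the page that was the #DF IST stack (page 4). */
        wrmsrl(MSR_FRED_RSP3, stack_base + 5 * PAGE_SIZE);

        /*
         * SL1/SL2 are unused.  Non-canonical poison would fault at
         * WRMSR time, so instead point the unused RSPs at a read-only
         * shadow stack page and the unused SSPs at a regular stack
         * page.  Any stray use faults, and the failed event delivery
         * escalates to #DF at SL3.
         */
        wrmsrl(MSR_FRED_RSP1, stack_base + 1 * PAGE_SIZE); /* shstk page 0 */
        wrmsrl(MSR_FRED_RSP2, stack_base + 1 * PAGE_SIZE);
        wrmsrl(MSR_FRED_SSP1, stack_base + 7 * PAGE_SIZE); /* stack page 6 */
        wrmsrl(MSR_FRED_SSP2, stack_base + 7 * PAGE_SIZE);
    }
]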
>> Having reached the bottom of the section, there's one special IST aspect
>> that I'm missing, special enough imo to warrant mentioning even if only to
>> state that it's (hopefully) going to be irrelevant (i.e. not require
>> replacing by another similar hack): {dis,en}able_each_ist() as used by SVM
>> (on the assumption that sooner or later AMD is likely to also implement
>> FRED, and that you may already know details of their respective VMCB
>> integration).
>
> AMD haven't said anything about FRED yet, despite a very large number of
> software partners asking about it.
>
> However, if AMD were to do FRED, I'd expect it to work just like CET
> does today, seeing as there's a proper host/guest split of CR4, and
> everything else is in MSRs the guest can't touch.
As in "can be prevented to touch directly", aiui.
Jan