[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 03/22] x86/dom0: only disable SMAP for the PV dom0 build



On Tue, Jul 30, 2024 at 12:06:45PM +0100, Andrew Cooper wrote:
> On 30/07/2024 11:55 am, Roger Pau Monné wrote:
> > On Mon, Jul 29, 2024 at 06:51:31PM +0100, Andrew Cooper wrote:
> >> On 29/07/2024 5:18 pm, Roger Pau Monné wrote:
> >>> On Mon, Jul 29, 2024 at 04:52:22PM +0100, Andrew Cooper wrote:
> >>>> On 29/07/2024 12:53 pm, Jan Beulich wrote:
> >>>>> On 26.07.2024 17:21, Roger Pau Monne wrote:
> >>>>>> The PVH dom0 builder doesn't switch page tables and has no need to run 
> >>>>>> with
> >>>>>> SMAP disabled.
> >>>>>>
> >>>>>> Put the SMAP disabling close to the code region where it's necessary, 
> >>>>>> as it
> >>>>>> then becomes obvious why switch_cr3_cr4() is required instead of
> >>>>>> write_ptbase().
> >>>>>>
> >>>>>> Note removing SMAP from cr4_pv32_mask is not required, as we never 
> >>>>>> jump into
> >>>>>> guest context, and hence updating the value of cr4_pv32_mask is not 
> >>>>>> relevant.
> >>>>> I'm okay-ish with that being dropped, but iirc the goal was to keep the
> >>>>> variable in sync with CPU state.
> >>>> Removing SMAP from cr4_pv32_mask is necessary.
> >>>>
> >>>> Otherwise IST vectors will reactive SMAP behind the back of the 
> >>>> dombuilder.
> >>>>
> >>>> This will probably only manifest in practice in a CONFIG_PV32=y build,
> >>> Sorry, I'm possibly missing some context here.  When running the dom0
> >>> builder we switch to the guest page-tables, but not to the guest vCPU,
> >>> (iow: current == idle) and hence the context is always the Xen
> >>> context.
> >> Correct.
> >>
> >>> Why would the return path of the IST use cr4_pv32_mask when the
> >>> context in which the IST happened was the Xen one, and the current
> >>> vCPU is the idle one (a 64bit PV guest from Xen's PoV).
> >>>
> >>> My understanding is that cr4_pv32_mask should only be used when the
> >>> current context is running a 32bit PV vCPU.
> >> This logic is evil to follow, because you need to look at both
> >> cr4_pv32_mask and XEN_CR4_PV32_BITS to see both halves of it.
> >>
> >> Notice how cr4_pv32_restore() only ever OR's cr4_pv32_mask into %cr4?
> >>
> >> CR4_PV32_RESTORE is called from every entry path which *might* have come
> >> from a 32bit PV guest, and it always results in Xen having SMEP/SMAP
> >> active (as applicable).  This includes NMI.
> >>
> >> The change is only undone in compat_restore_all_guest(), where
> >> XEN_CR4_PV32_BITS is cleared from %cr4 iff returning to Ring1/2.  This
> >> is logic cunningly disguised in the use of the Parity flag.
> >>
> >>
> >> Because the NMI handler does reactive SMEP/SMAP (based on the value in
> >> cr4_pv32_mask), and returning to Xen does not pass through
> >> compat_restore_all_guest(), taking an NMI in the middle of of the
> >> dombuilder will reactive SMAP behind your back.
> > After further conversations with Andrew we believe the current
> > disabling of X86_CR4_SMAP in %cr4 during dom0 build is not safe.
> >
> > Regardless of whether cr4_pv32_mask is properly adjusted return to Xen
> > context from interrupt would be done with SMAP enabled if
> > X86_FEATURE_XEN_SMAP is set.
> 
> Sorry - that's not what I intended to convey.
> 
> The logic prior to this patch is safe.  SMAP is cleared from
> cr4_pv32_mask before clearing CR4.SMAP, and reinstated in the opposite
> order.  Therefore, an NMI hitting the region won't reactivate SMAP
> because it's not (instantaniously) set in cr4_pv32_mask.

My bad, I was getting confused with the `clac` instructions in the
event entry points.

I think with my proposed change we would also hit the BUG in
cr4_pv32_restore on debug builds if Xen got an interrupt in the middle
of the SMAP disabled dom0 build region if SMEP is enabled, as
cr4_pv32_mask would differ from the current %cr4 value (not all bits
intended would be actually set).

> Arguably it wants some barrier()'s for clarity, and an explanation of
> why this works.
> 
> The problem your patch has is that by not clearing SMAP from
> cr4_pv32_mask, it becomes unsafe iff an NMI/#MC/#DB hits the region.

Won't such issue also affect common_interrupt, and hence not be
limited to NMI/#MC/#DB only?

Regards, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.