[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v9 06/11] x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point



On 19/01/18 11:43, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@xxxxxxxxxx> wrote:
>> We need to be able to either set or clear IBRS in Xen context, as well as
>> restore appropriate guest values in guest context.  See the documentation in
>> asm-x86/spec_ctrl_asm.h for details.
>>
>> Writes to %cr3 are slower when SPEC_CTRL.IBRS is set, so the
>> SPEC_CTRL_{ENTRY/EXIT}* positioning is important, and optimised for the
>> expected common case of Xen not using IBRS itself.
> And this expectation is because on pre-Skylake we're fine using just
> retpoline, and we expect post-Skylake to not have the issue anymore?
> Such reasoning would help being spelled out here.

Kaby Lake is in the same bucket as Skylake.  Coffee Lake and Cannon Lake
are unknown quantities.

Intel have stated that the expect these issues to be resolved on the
same hardware as CET, as retpoline is a deliberate ROP-gadget and won't
function with Controlflow Enforcement Technology enabled.

As far as I'm aware, CET is expected in Ice Lake, but who knows what
this issue has done to Intel's release schedule.

> As to the behavior of CR3 writes - is this written down anywhere in
> Intel's and/or AMD's docs? Or else, how do you know this is uniformly
> the case on all past, current, and future hardware?

AMD don't implement IBRS, so this isn't applicable to them.

I'm not sure if this is stated in any public documentation.  I'll see
about poking people...

Irrespective, it can be demonstrated trivially with some timing analysis.

>> @@ -99,6 +106,10 @@ UNLIKELY_END(realmode)
>>  .Lvmx_vmentry_fail:
>>          sti
>>          SAVE_ALL
>> +
>> +        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
> I think the use of the PV variant here requires a comment.

Oh.  It used to have one...  I'll try to find it.

>
>> @@ -142,6 +146,13 @@ ENTRY(compat_restore_all_guest)
>>          .popsection
>>          or    $X86_EFLAGS_IF,%r11
>>          mov   %r11d,UREGS_eflags(%rsp)
>> +
>> +        mov VCPU_arch_msr(%rbx), %rax
>> +        mov VCPUMSR_spec_ctrl_raw(%rax), %eax
> I understand that it's code like this which is missing from the SVM
> and VMX exit paths.

Correct.

>
>> @@ -729,6 +760,9 @@ ENTRY(nmi)
>>  handle_ist_exception:
>>          SAVE_ALL CLAC
>>  
>> +        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, Clob: acd */
>> +        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
> Following my considerations towards alternative patching to
> eliminate as much overhead as possible from the Meltdown
> band-aid in case it is being disabled, I'm rather hesitant to see any
> patchable code being introduced into the NMI/#MC entry paths
> without the patching logic first being made safe in this regard.
> Exceptions coming here aren't very frequent (except perhaps on
> hardware about to die), so the path isn't performance critical.
> Therefore I think we should try to avoid any patching here, and
> just conditionals instead. This in fact is one of the reasons why I
> didn't want to macro-ize the assembly additions done in the
> Meltdown band-aid.
>
> I do realize that this then also affects the exit-to-Xen path,
> which I agree is less desirable to use conditionals on.

While I agree that our lack of IST-safe patching is a problem, these
alternative points are already present on the NMI and MCE paths, and
need to be.  As a result, the DF handler is in no worse of a position. 
As a perfect example, observe the CLAC in context.

I could perhaps be talked into making a SPEC_CTRL_ENTRY_FROM_IST variant
which doesn't use alternatives (but IMO this is pointless in the
presence of CLAC), but still don't think it is reasonable to treat DF
differently to NMI/MCE.

>
>> --- /dev/null
>> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
>> @@ -0,0 +1,227 @@
>> +/******************************************************************************
>> + * include/asm-x86/spec_ctrl.h
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Copyright (c) 2017-2018 Citrix Systems Ltd.
>> + */
>> +
>> +#ifndef __X86_SPEC_CTRL_ASM_H__
>> +#define __X86_SPEC_CTRL_ASM_H__
>> +
>> +#ifdef __ASSEMBLY__
>> +#include <asm/msr.h>
>> +
>> +/*
>> + * Saving and restoring MSR_SPEC_CTRL state is a little tricky.
>> + *
>> + * We want the guests choice of SPEC_CTRL while in guest context, and IBRS
>> + * (set or clear, depending on the hardware) while running in Xen context.
> To me this continues to be somewhat strange to read. One possibility
> to improve it would be to move the opening parenthesis after "set or
> clear", another would be to say "..., and Xen's choice (set or ...".
> But you're the native speaker, so I'm open to other alternatives you
> may consider even better.

Your suggestion is clearer.  I'll borrow that.

>> + * Therefore, a simplistic algorithm is:
>> + *
>> + *  - Set/clear IBRS on entry to Xen
>> + *  - Set the guests' choice on exit to guest
>> + *  - Leave SPEC_CTRL unchanged on exit to xen
>> + *
>> + * There are two complicating factors:
>> + *  1) HVM guests can have direct access to the MSR, so it can change
>> + *     behind Xen's back.
>> + *  2) An NMI or MCE can interrupt at any point, including early in the 
>> entry
>> + *     path, or late in the exit path after restoring the guest value.  This
>> + *     will corrupt the guest value.
>> + *
>> + * Factor 1 is dealt with by relying on NMIs/MCEs being blocked immediately
>> + * after VMEXIT.  The VMEXIT-specific code reads MSR_SPEC_CTRL and updates
>> + * current before loading Xen's MSR_SPEC_CTRL setting.
>> + *
>> + * Factor 2 is harder.  We maintain a shadow_spec_ctrl value, and
>> + * use_shadow_spec_ctrl boolean per cpu.  The synchronous use is:
>> + *
>> + *  1) Store guest value in shadow_spec_ctrl
>> + *  2) Set use_shadow_spec_ctrl boolean
>> + *  3) Load guest value into MSR_SPEC_CTRL
>> + *  4) Exit to guest
>> + *  5) Entry from guest
>> + *  6) Clear use_shadow_spec_ctrl boolean
> 7) Load Xen value into MSR_SPEC_CTRL
>
> (It is important to spell out that the MSR write in _both_ cases
> needs to come after the write of the boolean.)

Ok.

>
>> +.macro DO_SPEC_CTRL_ENTRY_FROM_VMEXIT ibrs_val:req
>> +/*
>> + * Requires %rbx=current, %rsp=regs/cpuinfo
>> + * Clobbers %rax, %rcx, %rdx
>> + *
>> + * The common case is that a guest has direct access to MSR_SPEC_CTRL, at
>> + * which point we need to save the guest value before setting IBRS for Xen.
>> + * Unilaterally saving the guest value is shorter and faster than checking.
>> + */
>> +    mov $MSR_SPEC_CTRL, %ecx
>> +    rdmsr
>> +
>> +    /* Stash the value from hardware. */
>> +    mov VCPU_arch_msr(%rbx), %rdx
>> +    mov %al, VCPUMSR_spec_ctrl_raw(%rdx)
> Patch 3 makes this a 32-bit field now - why still %al?

Oversight.

>
>> +.macro DO_SPEC_CTRL_ENTRY maybexen:req ibrs_val:req
>> +/*
>> + * Requires %rsp=regs (also cpuinfo if !maybexen)
>> + * Clobbers %rax, %rcx, %rdx
>> + *
>> + * PV guests can't update MSR_SPEC_CTRL behind Xen's back, so no need to 
>> read
>> + * it back.  Entries from guest context need to clear SPEC_CTRL shadowing,
>> + * while entries from Xen must leave shadowing in its current state.
>> + */
>> +    mov $MSR_SPEC_CTRL, %ecx
>> +
>> +    .if \maybexen
>> +        testb $3, UREGS_cs(%rsp)
>> +        jz .L\@_entry_from_xen
>> +    .endif
>> +
>> +    /*
>> +     * Clear SPEC_CTRL shadowing *before* loading Xen's value.  If entering
>> +     * from a possibly-xen context, %rsp doesn't necessarily alias the 
>> cpuinfo
>> +     * block so calculate the position directly.
>> +     */
>> +    .if \maybexen
>> +        GET_STACK_END(dx)
> In almost all cases (and perhaps in all ones with maybexen set) the
> macro invocation sits right ahead of a GET_STACK_BASE(). If you
> swapped them and allowed the register to be passed in, you could
> avoid this load.

I seem to recall this being problematic.  I'll recheck.

>
>> +.macro DO_SPEC_CTRL_EXIT_TO_GUEST
>> +/*
>> + * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
>> + * Clobbers %rax, %rcx, %rdx
> %rax isn't really being clobbered anymore. The clobber annotations
> on all use sites appear to be similarly stale.

True.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.