[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen



On 19/01/18 12:47, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@xxxxxxxxxx> wrote:
>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be 
>> used to force or
>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>  functionality is still set up so IBRS can be virtualised for guests.
>>  
>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when 
>> the
>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>> +context, and an entry from a native (PV or Xen) context.
> Would you mind adding a sentence or two to the description making
> clear what use this fine grained control is? I can't really figure why I
> might need to be concerned about one of the two cases, but not the
> other.

I though I'd covered that in the commit message, but I'm not sure this
is a suitable place to discuss the details.  PV and HVM guests have
different reasoning for why we need to overwrite the RSB.

In the past, there used to be a default interaction of rsb_native and
SMEP, but that proved to be insufficient and rsb_native is now
unconditionally enabled.  In principle however, it should fall within
CONFIG_PV.

>
>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -33,6 +33,7 @@ static enum ind_thunk {
>>      THUNK_JMP,
>>  } opt_thunk __initdata = THUNK_DEFAULT;
>>  static int opt_ibrs __initdata = -1;
>> +static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = 
>> true;
> Would you mind splitting this into two separate declarations? The
> double use of __initdata is necessary, but looks odd.
>
> Also the latest here it becomes clear that, while minor, opt_ibrs
> would better be int8_t.

Ok.

>
>> @@ -243,6 +250,29 @@ void __init init_speculation_mitigations(void)
>>              setup_force_cpu_cap(X86_FEATURE_XEN_IBRS_CLEAR);
>>      }
>>  
>> +    /*
>> +     * PV guests can poison the RSB to any virtual address from which
>> +     * they can execute a call instruction.  This is necessarily outside
>> +     * of the Xen supervisor mappings.
>> +     *
>> +     * With SMEP enabled, the processor won't speculate into user
>> +     * mappings.  Therefore, we don't need to worry about poisoned
>> +     * entries from 64bit PV guests.
> "... in this case" (i.e. when SMEP is enabled).

Ok.

>
>> --- a/xen/include/asm-x86/spec_ctrl_asm.h
>> +++ b/xen/include/asm-x86/spec_ctrl_asm.h
>> @@ -73,6 +73,40 @@
>>   *  - SPEC_CTRL_EXIT_TO_GUEST
>>   */
>>  
>> +.macro DO_OVERWRITE_RSB
>> +/*
>> + * Requires nothing
>> + * Clobbers %rax, %rcx
>> + *
>> + * Requires 256 bytes of stack space, but %rsp has no net change. Based on
>> + * Google's performance numbers, the loop is unrolled to 16 iterations and 
>> two
>> + * calls per iteration.
>> + *
>> + * The call filling the RSB needs a nonzero displacement, but we use "1:
>> + * pause, jmp 1b" to safely contains any ret-based speculation, even if the
>> + * loop is speculatively executed prematurely.
> I'm struggling to understand why you use "but" here. Maybe just a
> lack of English skills on my part?

"displacement.  A nop would do, but" ?

It is a justification for why we are putting more than a single byte in
the middle.

>
>> + * %rsp is preserved by using an extra GPR because a) we've got plenty 
>> spare,
>> + * b) the two movs are shorter to encode than `add $32*8, %rsp`, and c) can 
>> be
>> + * optimised with mov-elimination in modern cores.
>> + */
>> +    mov $16, %ecx   /* 16 iterations, two calls per loop */
>> +    mov %rsp, %rax  /* Store the current %rsp */
>> +
>> +.L\@_fill_rsb_loop:
>> +
>> +    .rept 2         /* Unrolled twice. */
>> +    call 2f         /* Create an RSB entry. */
>> +1:  pause
>> +    jmp 1b          /* Capture rogue speculation. */
>> +2:
> I won't further insist on changing away from numeric labels here, but
> I'd still like to point out an example of a high risk use of such labels in
> mainline code: There's a "jz 1b" soon after
> exception_with_ints_disabled, leading across _two_ other labels and
> quite a few insns and macro invocations. May I at the very least
> suggest that you don't use 1 and 2 here?

I spent ages trying to get .L labels working here, but they don't
function inside a rept, as you end up with duplicate local symbols.

Even using irp to inject a unique number into the loop doesn't appear to
work, because the \ escape gets interpreted as a token separator. 
AFAICT, \@ is special by virtue of the fact that it doesn't count as a
token separator.

If you've got a better suggestion then I'm all ears.

Alternatively, I could manually unroll the loop, or pick some arbitrary
other numbers to use.

>
>> +    .endr
>> +
>> +    sub $1, %ecx
>> +    jnz .L\@_fill_rsb_loop
> Iirc it's only CMP and TEST which can be fused with Jcc. Would
> it perhaps help slightly to move the SUB ahead of the two CALLs?
> And once they are farther apart, perhaps DEC wouldn't be
> that bad a choice anymore?

Nehalem, as well as Builldozer and later can only fuse CMP and TEST. 
SandyBridge and later can fuse ADD, SUB, AND, INC and DEC.

Fusibility is also dependent on the condition code, but JNZ can fuse
with any of the options.

It is still unclear to what effect the INC/DEC flags merge stall
matters, but if they were better than their ADD/SUB alternatives, I'd
expect compilers to actually emit them.

>
>> +    mov %rax, %rsp  /* Retore old %rsp */
> Restore

Oops.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.