
Re: [Xen-devel] [PATCH v9 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts



On 19/01/18 13:25, Jan Beulich wrote:
>>>> On 18.01.18 at 16:46, <andrew.cooper3@xxxxxxxxxx> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1743,6 +1743,29 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>          }
>>  
>>          ctxt_switch_levelling(next);
>> +
>> +        if ( opt_ibpb )
>> +        {
>> +            static DEFINE_PER_CPU(unsigned int, last_nonidle);
>> +            unsigned int *last_id = &this_cpu(last_nonidle);
> "nonidle" is not entirely correct without an is_idle_...() check around
> it, as the outer condition leaves room for idle vCPU-s to make it here.
> But take this as a remark, not a strict request to change the name.

If you can suggest a better name, I'll use it.

>
>> +            /* Squash the domid and vcpu id together for efficiency. */
>> +            unsigned int next_id = (((unsigned int)nextd->domain_id << 16) |
>> +                                    (uint16_t)next->vcpu_id);
> Is this really more efficient than just comparing struct vcpu pointers?

vcpu pointers are rather more susceptible to false aliasing in the case
that the 4k memory allocation behind struct vcpu gets reused.

The risks are admittedly minute, but this is a much safer option.
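
To illustrate the aliasing concern, a rough sketch (last_vcpu here is a
hypothetical alternative, not what the patch does):

    /* Hypothetical pointer-based tracking: */
    static DEFINE_PER_CPU(const struct vcpu *, last_vcpu);

    /*
     * If the previous domain is destroyed and a new domain's struct vcpu
     * happens to be allocated in the same 4k region, this comparison
     * falsely matches and the IBPB is skipped across a real security
     * boundary.
     */
    if ( this_cpu(last_vcpu) != next )
        wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
    this_cpu(last_vcpu) = next;

whereas the (domid << 16 | vcpu_id) value in the patch only repeats if
the same domid/vcpu_id pair is handed out again.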

>
>> +            BUILD_BUG_ON(MAX_VIRT_CPUS > 0xffff);
>> +
>> +            /*
>> +             * When scheduling from a vcpu, to idle, and back to the same vcpu
>> +             * (which might be common in a lightly loaded system, or when
>> +             * using vcpu pinning), there is no need to issue IBPB, as we are
>> +             * returning to the same security context.
>> +             */
>> +            if ( *last_id != next_id )
>> +            {
>> +                wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
>> +                *last_id = next_id;
>> +            }
> I've read David's mails regarding this another time, but I still can't
> conclude why this extra logic would be necessary. Transitions
> from a guest vCPU through idle and back to that very vCPU do
> not alter per_cpu(curr_vcpu, ...) - __context_switch() is the
> only place to update it. There's certainly the potential for it to
> be called through __sync_local_execstate(), but is that a
> common case? I'd support introduction of the extra logic only if
> so.
>
> Furthermore, if this indeed was a sufficiently common case, doing
> lazy context switches at all for HVM guests would once again
> need to be put under question.

David found that transitions to idle and back to the same vcpu were
reliably taking an unnecessary IBPB.

I'm deeply suspicious that lazy context switching is actually a win,
given that a TLB shootdown IPI interrupting the half-idle state will
switch it to fully idle anyway (as a half-idle CPU still remains set in
the domain's dirty mask).  Andy Luto dropped lazy FPU from Linux a while
ago, showing that it was worse in most cases even for regular processes.

I certainly don't think that trapping #NM for HVM vcpu lazy FPU is
clever on any hardware.

Just one of the many things to investigate in copious free time.
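
For context, the lazy FPU pattern being referred to looks roughly like
this (a sketch with approximate names, not Xen's actual implementation):

    /* On switching away from a vcpu, don't save/restore FPU state
     * eagerly; arm CR0.TS so the next FPU instruction raises #NM. */
    static void fpu_lazy_switch_out(void)
    {
        write_cr0(read_cr0() | X86_CR0_TS);
    }

    /* #NM handler: only now pay for loading the vcpu's FPU state.  For
     * an HVM vcpu the trap is a full vmexit, hence the objection above. */
    static void handle_nm(struct vcpu *curr)
    {
        clts();                           /* clear CR0.TS */
        vcpu_restore_fpu_lazy(curr);      /* approximate helper name */
    }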

>
>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -33,6 +33,7 @@ static enum ind_thunk {
>>      THUNK_JMP,
>>  } opt_thunk __initdata = THUNK_DEFAULT;
>>  static int opt_ibrs __initdata = -1;
>> +bool __read_mostly opt_ibpb = true;
>> static bool opt_rsb_native __initdata = true, opt_rsb_vmexit __initdata = true;
> Considering the (better imo) placement of __read_mostly here,
> would you mind moving the __initdata accordingly in the earlier
> patches (opt_thunk having reasons to be an exception)?
> Otherwise please move the __read_mostly to the end.

Ok.
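
For illustration, the two consistent placements being asked for would be
roughly:

    /* Attribute between type and identifier throughout ... */
    static int __initdata opt_ibrs = -1;
    bool __read_mostly opt_ibpb = true;

    /* ... or attribute after the identifier throughout: */
    static int opt_ibrs __initdata = -1;
    bool opt_ibpb __read_mostly = true;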

~Andrew
