[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts

To: Jan Beulich <JBeulich@xxxxxxxx>
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Thu, 25 Jan 2018 16:15:33 +0000
Cc: Dario Faggioli <dfaggioli@xxxxxxxx>, David Woodhouse <dwmw@xxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>
Delivery-date: Thu, 25 Jan 2018 16:17:33 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

CC'ing Dario with a working email address this time...

On 25/01/18 16:09, Andrew Cooper wrote:
> On 25/01/18 15:57, Jan Beulich wrote:
>>>>> On 24.01.18 at 14:12, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> @@ -1743,6 +1744,34 @@ void context_switch(struct vcpu *prev, struct vcpu 
>>> *next)
>>>          }
>>>  
>>>          ctxt_switch_levelling(next);
>>> +
>>> +        if ( opt_ibpb && !is_idle_domain(nextd) )
>> Is the idle domain check here really useful?
> Yes, because as you pointed out in v9, the outer condition isn't
> sufficient to exclude nextd being idle.
>
>>> +        {
>>> +            static DEFINE_PER_CPU(unsigned int, last);
>>> +            unsigned int *last_id = &this_cpu(last);
>>> +
>>> +            /*
>>> +             * Squash the domid and vcpu id together for comparason
>>> +             * efficiency.  We could in principle stash and compare the 
>>> struct
>>> +             * vcpu pointer, but this risks a false alias if a domain has 
>>> died
>>> +             * and the same 4k page gets reused for a new vcpu.
>>> +             */
>>> +            unsigned int next_id = (((unsigned int)nextd->domain_id << 16) 
>>> |
>>> +                                    (uint16_t)next->vcpu_id);
>>> +            BUILD_BUG_ON(MAX_VIRT_CPUS > 0xffff);
>>> +
>>> +            /*
>>> +             * When scheduling from a vcpu, to idle, and back to the same 
>>> vcpu
>>> +             * (which might be common in a lightly loaded system, or when
>>> +             * using vcpu pinning), there is no need to issue IBPB, as we 
>>> are
>>> +             * returning to the same security context.
>>> +             */
>>> +            if ( *last_id != next_id )
>>> +            {
>>> +                wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
>>> +                *last_id = next_id;
>>> +            }
>>> +        }
>>>      }
>>>  
>>>      context_saved(prev);
>> Short of any real numbers (or a proper explanation) having been
>> provided, I've done some measurements. Indeed I can see quite
>> high a rate of cases of execution coming this way despite the
>> vCPU not really changing during early boot of HVM guests. This
>> goes down quite a bit later on, but obviously that's also workload
>> dependent. But the number of cases where the barrier emission
>> could be avoided remains non-negligible, so I agree the extra
>> avoidance logic is probably warranted. On that basis (perhaps
>> with the idle check above removed)
>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>
>> For the record, the overwhelming majority of calls to
>> __sync_local_execstate() being responsible for the behavior
>> come from invalidate_interrupt(), which suggests to me that
>> there's a meaningful number of cases where a vCPU is migrated
>> to another CPU and then back, without another vCPU having
>> run on the original CPU in between. If I'm not wrong with this,
>> I have to question why the vCPU is migrated then in the first
>> place.
> This is a known (mis)feature of the Credit scheduler.  When a PCPU goes
> idle, it aggressively steals work from the adjacent cpus, even if the
> adjacent cpu only has a single vcpu to run and is running it.
>
> XenServer has this gross hack which makes aggregate networking
> measurements far far better
>
> https://github.com/xenserver/xen.pg/blob/XS-7.1.x/master/sched-credit1-use-per-pcpu-runqueue-count.patch
>
> Dario made a different fix to Credit1 upstream which was supposed to
> resolve this behaviour (although I can't locate the patch by a list of
> titles), but based on these observations, I'd say the misfeature hasn't
> been fixed.
>
> ~Andrew
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  - From: Dario Faggioli

References:
- [Xen-devel] [PATCH v10 00/11] x86: Mitigations for SP2/CVE-2017-5715/Branch Target Injection
  - From: Andrew Cooper
- [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  - From: Andrew Cooper
- Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  - From: Jan Beulich
- Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
  - From: Andrew Cooper

Prev by Date: Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
Next by Date: Re: [Xen-devel] [PATCH v10 07/11] x86/entry: Avoid using alternatives in NMI/#MC paths
Previous by thread: Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
Next by thread: Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.