Re: [Xen-devel] [PATCH v10 09/11] x86/ctxt: Issue a speculation barrier between vcpu contexts
CC'ing Dario with a working email address this time...

On 25/01/18 16:09, Andrew Cooper wrote:
> On 25/01/18 15:57, Jan Beulich wrote:
>>>>> On 24.01.18 at 14:12, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> @@ -1743,6 +1744,34 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>          }
>>>
>>>          ctxt_switch_levelling(next);
>>> +
>>> +        if ( opt_ibpb && !is_idle_domain(nextd) )
>> Is the idle domain check here really useful?
> Yes, because as you pointed out in v9, the outer condition isn't
> sufficient to exclude nextd being idle.
>
>>> +        {
>>> +            static DEFINE_PER_CPU(unsigned int, last);
>>> +            unsigned int *last_id = &this_cpu(last);
>>> +
>>> +            /*
>>> +             * Squash the domid and vcpu id together for comparison
>>> +             * efficiency.  We could in principle stash and compare the struct
>>> +             * vcpu pointer, but this risks a false alias if a domain has died
>>> +             * and the same 4k page gets reused for a new vcpu.
>>> +             */
>>> +            unsigned int next_id = (((unsigned int)nextd->domain_id << 16) |
>>> +                                    (uint16_t)next->vcpu_id);
>>> +            BUILD_BUG_ON(MAX_VIRT_CPUS > 0xffff);
>>> +
>>> +            /*
>>> +             * When scheduling from a vcpu, to idle, and back to the same vcpu
>>> +             * (which might be common in a lightly loaded system, or when
>>> +             * using vcpu pinning), there is no need to issue IBPB, as we are
>>> +             * returning to the same security context.
>>> +             */
>>> +            if ( *last_id != next_id )
>>> +            {
>>> +                wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
>>> +                *last_id = next_id;
>>> +            }
>>> +        }
>>>      }
>>>
>>>      context_saved(prev);
>> Short of any real numbers (or a proper explanation) having been
>> provided, I've done some measurements. Indeed I can see quite a
>> high rate of cases of execution coming this way despite the vCPU
>> not really changing during early boot of HVM guests. This goes
>> down quite a bit later on, but obviously that's also workload
>> dependent. But the number of cases where the barrier emission
>> could be avoided remains non-negligible, so I agree the extra
>> avoidance logic is probably warranted. On that basis (perhaps
>> with the idle check above removed)
>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>
>> For the record, the overwhelming majority of calls to
>> __sync_local_execstate() responsible for this behaviour come
>> from invalidate_interrupt(), which suggests to me that there is
>> a meaningful number of cases where a vCPU is migrated to
>> another CPU and then back, without another vCPU having run
>> on the original CPU in between. If I'm not wrong about this,
>> I have to question why the vCPU is migrated in the first place.
> This is a known (mis)feature of the Credit scheduler. When a PCPU goes
> idle, it aggressively steals work from the adjacent cpus, even if an
> adjacent cpu only has a single vcpu to run and is already running it.
>
> XenServer has this gross hack which makes aggregate networking
> measurements far, far better:
>
> https://github.com/xenserver/xen.pg/blob/XS-7.1.x/master/sched-credit1-use-per-pcpu-runqueue-count.patch
>
> Dario made a different fix to Credit1 upstream which was supposed to
> resolve this behaviour (although I can't locate the patch from a list
> of titles), but based on these observations, I'd say the misfeature
> hasn't been fixed.
>
> ~Andrew
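As a standalone illustration of the caching idea under review (this is not Xen code: the IDLE_DOMID constant, the context_switch_to() helper and the counter standing in for the wrmsrl are assumptions made purely for this sketch), a minimal userspace C program can model the per-CPU cache of the last scheduled (domid, vcpu_id) pair and count the barriers that would be issued across a vcpu -> idle -> same-vcpu sequence:

#include <stdint.h>
#include <stdio.h>

#define IDLE_DOMID 0x7fffU       /* stand-in for the idle domain's id (assumption) */

static unsigned int last_id;     /* per-CPU in Xen; a single CPU is modelled here */
static unsigned int ibpb_count;  /* barriers that would have been issued */

/* Pack domid and vcpu_id into one integer, mirroring the hunk above. */
static unsigned int pack_id(uint16_t domid, uint16_t vcpu_id)
{
    return ((unsigned int)domid << 16) | vcpu_id;
}

/* Hypothetical helper standing in for the tail of context_switch(). */
static void context_switch_to(uint16_t domid, uint16_t vcpu_id)
{
    unsigned int next_id;

    if ( domid == IDLE_DOMID )
        return;                  /* switching to idle needs no barrier */

    next_id = pack_id(domid, vcpu_id);
    if ( last_id != next_id )
    {
        ibpb_count++;            /* Xen would wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB) here */
        last_id = next_id;
    }
}

int main(void)
{
    context_switch_to(1, 0);          /* d1v0: new security context, barrier */
    context_switch_to(IDLE_DOMID, 0); /* idle: no barrier */
    context_switch_to(1, 0);          /* back to d1v0: cache hit, barrier skipped */
    context_switch_to(2, 0);          /* d2v0: different context, barrier */

    printf("barriers issued: %u\n", ibpb_count);
    return 0;
}

Running the sketch prints "barriers issued: 2": the return to d1v0 after idle hits the cache and skips the barrier, which is exactly the case the second comment in the quoted hunk calls out as common on a lightly loaded or pinned system.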