Xen project Mailing List

Re: [Xen-devel] [PATCH, RFC] x86/HVM: batch vCPU wakeups

From: Tim Deegan <tim@xxxxxxx>

Date: Tue, 9 Sep 2014 23:29:25 +0200

Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>

Delivery-date: Tue, 09 Sep 2014 21:29:56 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

At 09:33 +0100 on 09 Sep (1410251617), Jan Beulich wrote: > Mass wakeups (via vlapic_ipi()) can take enormous amounts of time, > especially when many of the remote pCPU-s are in deep C-states. For > 64-vCPU Windows Server 2012 R2 guests on Ivybridge hardware, > accumulated times of over 2ms were observed. Considering that Windows > broadcasts IPIs from its timer interrupt Bleargh. :( > RFC for two reasons: > 1) I started to consider elimination of the event-check IPIs in a more > general way when MONITOR/MWAIT is in use: As long as the remote CPU > is known to be MWAITing (or in the process of resuming after MWAIT), > the write of softirq_pending(cpu) ought to be sufficient to wake it. > This would yield the patch here pointless on MONITOR/MWAIT capable > hardware. That's a promising line. I guess the number of 64-way systems that don't support MONITOR/MWAIT is pretty small. :) > 2) The condition when to enable batching in vlapic_ipi() is already > rather complex, but it is nevertheless unclear whether it would > benefit from further tuning (as mentioned above). The test you chose seems plausible enough to me. I suppose we care enough about single-destination IPIs on idle hardware that we need this check at all; but I doubt that IPIs to exactly two destinations, for example, are worth worrying about. The implementation looks OK in general; a few nits below. > --- a/xen/arch/x86/hvm/vlapic.c > +++ b/xen/arch/x86/hvm/vlapic.c > @@ -447,12 +447,30 @@ void vlapic_ipi( > > default: { > struct vcpu *v; > - for_each_vcpu ( vlapic_domain(vlapic), v ) > + const struct domain *d = vlapic_domain(vlapic); > + bool_t batch = d->max_vcpus > 2 && > + (short_hand > + ? short_hand != APIC_DEST_SELF > + : vlapic_x2apic_mode(vlapic) > + ? dest_mode ? hweight16(dest) > 1 > + : dest == 0xffffffff > + : dest_mode > + ? hweight8(dest & > + GET_xAPIC_DEST_FIELD( > + vlapic_get_reg(vlapic, > + APIC_DFR))) > 1 > + : dest == 0xff); Yoiks. Could you extract some of this into a helper function, say is_unicast_dest(), just for readability? > void cpu_raise_softirq(unsigned int cpu, unsigned int nr) > { > - if ( !test_and_set_bit(nr, &softirq_pending(cpu)) > - && (cpu != smp_processor_id()) ) > + unsigned int this_cpu = smp_processor_id(); The name clashes with the this_cpu() macro (obviously OK for compiling but harder to read) and can be dropped since... > + > + if ( test_and_set_bit(nr, &softirq_pending(cpu)) > + || (cpu == this_cpu) ) > + return; > + > + if ( !per_cpu(batching, this_cpu) || in_irq() ) this_cpu(batching) is the preferred way of addressing this on Xen anyway - though I guess maybe it's _slightly_ less efficient to recalculate get_cpu_info() here than to indirect through the per_cpu_offsets[]? I haven't looked at the asm. > smp_send_event_check_cpu(cpu); > + else > + set_bit(nr, &per_cpu(batch_mask, this_cpu)); > +} > + > +void cpu_raise_softirq_batch_begin(void) > +{ > + ++per_cpu(batching, smp_processor_id()); This one should definitely be using this_cpu(). > +} > + > +void cpu_raise_softirq_batch_finish(void) > +{ > + unsigned int cpu, this_cpu = smp_processor_id(); > + cpumask_t *mask = &per_cpu(batch_mask, this_cpu); Again, this_cpu()? > + ASSERT(per_cpu(batching, this_cpu)); > + for_each_cpu ( cpu, mask ) > + if ( !softirq_pending(cpu) ) > + cpumask_clear_cpu(cpu, mask); > + smp_send_event_check_mask(mask); > + cpumask_clear(mask); > + --per_cpu(batching, this_cpu); > } One other thing that occurs to me is that we might do the batching in the caller (by gathering a mask of pcpus in the vlapic code) but that sounds messy. And it's not like softirq is so particularly latency-sensitive that we'd want to avoid this much overhead in the common case. So on second thoughts this way is better. :) Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.