
Re: [Xen-devel] [PATCH, RFC] x86/HVM: batch vCPU wakeups



>>> On 09.09.14 at 23:29, <tim@xxxxxxx> wrote:
> At 09:33 +0100 on 09 Sep (1410251617), Jan Beulich wrote:
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -447,12 +447,30 @@ void vlapic_ipi(
>>  
>>      default: {
>>          struct vcpu *v;
>> -        for_each_vcpu ( vlapic_domain(vlapic), v )
>> +        const struct domain *d = vlapic_domain(vlapic);
>> +        bool_t batch = d->max_vcpus > 2 &&
>> +                       (short_hand
>> +                        ? short_hand != APIC_DEST_SELF
>> +                        : vlapic_x2apic_mode(vlapic)
>> +                          ? dest_mode ? hweight16(dest) > 1
>> +                                      : dest == 0xffffffff
>> +                          : dest_mode
>> +                            ? hweight8(dest &
>> +                                       GET_xAPIC_DEST_FIELD(
>> +                                           vlapic_get_reg(vlapic,
>> +                                                          APIC_DFR))) > 1
>> +                            : dest == 0xff);
> 
> Yoiks.  Could you extract some of this into a helper function, say
> is_unicast_dest(), just for readability?

Sure.
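For anyone following along, the condition can be factored out along these lines. This is a standalone, single-file model for illustration only: the vlapic accessors (vlapic_x2apic_mode(), the APIC_DFR read via GET_xAPIC_DEST_FIELD()) are replaced by plain parameters, and the signature is illustrative, not the final in-tree one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICR destination shorthand encodings, as in apicdef.h. */
#define APIC_DEST_SELF    0x40000
#define APIC_DEST_ALLINC  0x80000

/* Population counts matching Xen's hweight8()/hweight16(). */
static unsigned int hweight8(uint8_t x)   { return __builtin_popcount(x); }
static unsigned int hweight16(uint16_t x) { return __builtin_popcount(x); }

/*
 * Model of the proposed helper: true iff the IPI can reach at most one
 * destination.  dfr_dest_field stands in for
 * GET_xAPIC_DEST_FIELD(vlapic_get_reg(vlapic, APIC_DFR)).
 */
static bool is_unicast_dest(unsigned int short_hand, bool x2apic_mode,
                            bool logical_dest_mode, uint32_t dest,
                            uint8_t dfr_dest_field)
{
    if ( short_hand )
        return short_hand == APIC_DEST_SELF;

    if ( x2apic_mode )
        return logical_dest_mode ? hweight16(dest) <= 1
                                 : dest != 0xffffffff;

    return logical_dest_mode ? hweight8(dest & dfr_dest_field) <= 1
                             : dest != 0xff;
}
```

The batching predicate in vlapic_ipi() then reduces to something like `batch = d->max_vcpus > 2 && !is_unicast_dest(...)`, inverting the sense of the open-coded expression above.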

>>  void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
>>  {
>> -    if ( !test_and_set_bit(nr, &softirq_pending(cpu))
>> -         && (cpu != smp_processor_id()) )
>> +    unsigned int this_cpu = smp_processor_id();
> 
> The name clashes with the this_cpu() macro (obviously OK for compiling
> but harder to read) and can be dropped since...
> 
>> +
>> +    if ( test_and_set_bit(nr, &softirq_pending(cpu))
>> +         || (cpu == this_cpu) )
>> +        return;
>> +
>> +    if ( !per_cpu(batching, this_cpu) || in_irq() )
> 
> this_cpu(batching) is the preferred way of addressing this on Xen
> anyway - though I guess maybe it's _slightly_ less efficient to
> recalculate get_cpu_info() here than to indirect through the
> per_cpu_offsets[]?  I haven't looked at the asm.

Right - multiple uses of this_cpu() in the same function are indeed
less efficient than latching smp_processor_id() into a local variable.

>>          smp_send_event_check_cpu(cpu);
>> +    else
>> +        set_bit(nr, &per_cpu(batch_mask, this_cpu));
>> +}
>> +
>> +void cpu_raise_softirq_batch_begin(void)
>> +{
>> +    ++per_cpu(batching, smp_processor_id());
> 
> This one should definitely be using this_cpu().

Absolutely - unintended copy-and-paste effect.

>> +}
>> +
>> +void cpu_raise_softirq_batch_finish(void)
>> +{
>> +    unsigned int cpu, this_cpu = smp_processor_id();
>> +    cpumask_t *mask = &per_cpu(batch_mask, this_cpu);
> 
> Again, this_cpu()?

No, as again the latched local variable yields better code.

>> +    ASSERT(per_cpu(batching, this_cpu));
>> +    for_each_cpu ( cpu, mask )
>> +        if ( !softirq_pending(cpu) )
>> +            cpumask_clear_cpu(cpu, mask);
>> +    smp_send_event_check_mask(mask);
>> +    cpumask_clear(mask);
>> +    --per_cpu(batching, this_cpu);
>>  }
> 
> One other thing that occurs to me is that we might do the batching in
> the caller (by gathering a mask of pcpus in the vlapic code) but that
> sounds messy.  And it's not like softirq is so particularly
> latency-sensitive that we'd want to avoid this much overhead in the
> common case.  So on second thoughts this way is better. :)

In fact I had it that way first, and it looked ugly (apart from making
it more cumbersome for potential other use sites to switch to this
batching model).
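The intended begin/raise/finish pattern can be modeled in a small single-threaded sketch. Plain bitmasks stand in for cpumask_t, a counter stands in for the IPIs that smp_send_event_check_cpu()/_mask() would send, the in_irq() special case is omitted, and everything beyond the three functions under discussion is illustrative:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

static unsigned long softirq_pending[NR_CPUS]; /* pending softirq bits */
static unsigned int batching;     /* batch nesting depth on "this" cpu */
static unsigned long batch_mask;  /* cpus with a deferred wakeup */
static unsigned int ipis_sent;    /* event-check notifications sent */
static const unsigned int this_cpu_id = 0;

static bool test_and_set_bit(unsigned int nr, unsigned long *addr)
{
    bool old = (*addr >> nr) & 1;
    *addr |= 1UL << nr;
    return old;
}

static void smp_send_event_check_cpu(unsigned int cpu)
{
    (void)cpu;
    ipis_sent++;
}

static void smp_send_event_check_mask(unsigned long mask)
{
    if ( mask )
        ipis_sent++; /* one multicast notification covers the mask */
}

void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
{
    if ( test_and_set_bit(nr, &softirq_pending[cpu]) ||
         cpu == this_cpu_id )
        return;

    if ( !batching )
        smp_send_event_check_cpu(cpu);
    else
        batch_mask |= 1UL << cpu; /* defer until ..._batch_finish() */
}

void cpu_raise_softirq_batch_begin(void)
{
    ++batching;
}

void cpu_raise_softirq_batch_finish(void)
{
    unsigned int cpu;

    assert(batching);
    /* Drop cpus whose pending work was already picked up meanwhile. */
    for ( cpu = 0; cpu < NR_CPUS; cpu++ )
        if ( (batch_mask & (1UL << cpu)) && !softirq_pending[cpu] )
            batch_mask &= ~(1UL << cpu);
    smp_send_event_check_mask(batch_mask);
    batch_mask = 0;
    --batching;
}
```

The point of the API shape is visible here: a caller such as vlapic_ipi() brackets its loop over vCPUs with batch_begin()/batch_finish() and keeps calling plain cpu_raise_softirq(), so other call sites can adopt batching without restructuring.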

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

