Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 20.10.11 at 17:09, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>
>> This patch set makes some first steps towards eliminating the old cpumask
>> accessors, replacing them with ones that don't require the full NR_CPUS
>> bits to be allocated (which obviously can be pretty wasteful when
>> NR_CPUS is high but the actual number of CPUs is low or moderate).
>>
>> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
>> 02: eliminate cpumask accessors referencing NR_CPUS
>> 03: eliminate direct assignments of CPU masks
>> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
>> 05: allocate CPU sibling and core maps dynamically
>
> I'm not sure about this. We can save ~500 bytes per cpumask_t when
> NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t's do we typically
> have dynamically allocated all at once? Let's say we waste 2kB per VCPU and
> per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we'd
> save ~4MB in that extreme case. But such a large system probably actually
> will have a lot of CPUs. And also a lot of memory, such that 4MB is quite
> insignificant.
It's not only the memory savings, but also the time saved by manipulating
less data.
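As a rough illustration (a sketch with a made-up helper name, not the series'
actual accessors), the idea is to size masks by nr_cpumask_bits instead of
NR_CPUS - with NR_CPUS=4096 that's 512 bytes per static cpumask_t versus
8 bytes when fewer than 64 CPUs are present:

  #include <xen/cpumask.h>   /* cpumask_t, BITS_TO_LONGS, nr_cpumask_bits (patch 01) */
  #include <xen/xmalloc.h>   /* xmalloc_bytes() */
  #include <xen/string.h>    /* memset() */

  /* Sketch only: size a dynamically allocated mask by nr_cpumask_bits
   * rather than by NR_CPUS.  The helper name is illustrative. */
  static cpumask_t *alloc_small_cpumask(void)
  {
      size_t sz = BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long);
      cpumask_t *mask = xmalloc_bytes(sz);

      if ( mask )
          memset(mask, 0, sz);

      return mask;
  }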
> I suppose there is a second argument that it shrinks the containing
> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
> helps reduce our order!=0 allocations?
Yes - that's what made me start taking over these Linux bits. What I
sent here just continues down that route. I was really hoping that we
wouldn't leave this in a half-baked state.
> By the way, I think we could avoid the NR_CPUS copying overhead everywhere
> by having the cpumask.h functions respect nr_cpu_ids, but continuing to
> return NR_CPUS as the sentinel value (e.g., end of loop; or no bit found)? This
> would not require changing tonnes of code. It only gets part of the benefit
> (reducing CPU time overhead) but is more palatable?
That would be possible, but would again leave us in a somewhat
incomplete state. (Note that I did leave NR_CPUS in the stop-machine
logic.)
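For reference, a minimal sketch of what you're describing, I think
(illustrative name, not the actual cpumask.h accessor): scan only
nr_cpu_ids bits, but keep returning NR_CPUS as the sentinel so existing
callers and loop macros need no changes:

  #include <xen/cpumask.h>   /* cpumask_t, nr_cpu_ids */
  #include <xen/bitops.h>    /* find_next_bit() */

  /* Sketch only: iterate over just nr_cpu_ids bits, but preserve NR_CPUS
   * as the "no further CPU" sentinel. */
  static inline int next_cpu_sketch(int n, const cpumask_t *srcp)
  {
      unsigned int next = find_next_bit(srcp->bits, nr_cpu_ids, n + 1);

      return (next >= nr_cpu_ids) ? NR_CPUS : next;
  }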
>> 06: allow efficient allocation of multiple CPU masks at once
>
> That is utterly hideous, and for an insignificant saving.
I was afraid you would say that, and I'm not fully convinced
either. But I wanted to give it a try to see how bad it is. The
more significant saving here really comes from not allocating
the CPU masks at all for unused irq_desc-s.
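Roughly what the multi-mask allocation is getting at, as a hypothetical
sketch rather than the actual patch: carve several masks, each sized for
nr_cpumask_bits, out of a single allocation:

  #include <xen/cpumask.h>   /* cpumask_t, BITS_TO_LONGS, nr_cpumask_bits */
  #include <xen/errno.h>     /* ENOMEM */
  #include <xen/xmalloc.h>   /* xmalloc_bytes() */
  #include <xen/string.h>    /* memset() */

  /* Sketch only, not the code from patch 06: back several CPU masks with
   * one allocation, each sized for nr_cpumask_bits rather than NR_CPUS. */
  static int alloc_cpumasks(cpumask_t **masks, unsigned int nr)
  {
      size_t sz = BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long);
      char *mem = xmalloc_bytes(sz * nr);
      unsigned int i;

      if ( !mem )
          return -ENOMEM;

      memset(mem, 0, sz * nr);
      for ( i = 0; i < nr; i++ )
          masks[i] = (cpumask_t *)(mem + i * sz);

      return 0;
  }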
Jan