
Re: [Xen-devel] Notes on stubdoms and latency on ARM



On Wed, 2017-07-19 at 12:21 +0100, Julien Grall wrote:
> On 17/07/17 12:28, George Dunlap wrote:
> > Just checking -- you do mean its own core, as opposed to its own
> > socket?
> >  (Or NUMA node?)
> 
> I don't know much about the scheduler, so I might say something
> stupid here :). Below is the code we have for ARM:
> 
> /* XXX these seem awfully x86ish... */
> /* representing HT siblings of each logical CPU */
> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask);
> /* representing HT and core siblings of each logical CPU */
> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
> 
> static void setup_cpu_sibling_map(int cpu)
> {
>      if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) ||
>           !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
>          panic("No memory for CPU sibling/core maps");
> 
>      /* A CPU is a sibling with itself and is always on its own core. */
>      cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
>      cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
> }
> 
> #define cpu_to_socket(_cpu) (0)
> 
> After calling setup_cpu_sibling_map, we never touch cpu_sibling_mask
> and cpu_core_mask for a given pCPU. So I would say that each logical
> CPU is in its own core, but they are all in the same socket at the
> moment.
> 
Ah, fine... so you're in the exact opposite situation I was thinking
about and reasoning upon in the reply to George I've just sent! :-P

Ok, this basically means that, by default, on any ARM system, no matter
how big or small, Credit2 will always use just one runqueue, from which
_all_ the pCPUs will pick the vCPUs they run.
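
To make the effect concrete, here is a minimal standalone sketch (not
the actual Credit2 code), assuming runqueues are grouped by socket, as
discussed above. The function names, MAX_CPUS, and the 4-CPUs-per-socket
mapping are hypothetical and only there for illustration; the first
mapping mimics the quoted `#define cpu_to_socket(_cpu) (0)`.

```c
#include <assert.h>

#define MAX_CPUS 64 /* hypothetical upper bound, for the sketch only */

/* Mimics the ARM definition quoted above: every CPU reports socket 0. */
static int socket_always_zero(int cpu)
{
    (void)cpu;
    return 0;
}

/* Hypothetical accurate topology: 4 CPUs per socket/cluster. */
static int socket_four_per_cluster(int cpu)
{
    return cpu / 4;
}

/*
 * Give every CPU a runqueue index, one runqueue per distinct socket id.
 * runq_of[cpu] receives the index; the return value is the number of
 * runqueues (and hence of runqueue locks) that end up being used.
 */
static int assign_runqueues(int nr_cpus, int (*to_socket)(int),
                            int runq_of[])
{
    int socket_of_runq[MAX_CPUS];
    int nr_runqs = 0;

    assert(nr_cpus > 0 && nr_cpus <= MAX_CPUS);

    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        int socket = to_socket(cpu);
        int rq;

        /* Reuse the runqueue already created for this socket, if any. */
        for (rq = 0; rq < nr_runqs; rq++)
            if (socket_of_runq[rq] == socket)
                break;

        /* First CPU seen on this socket: create a new runqueue. */
        if (rq == nr_runqs)
            socket_of_runq[nr_runqs++] = socket;

        runq_of[cpu] = rq;
    }

    return nr_runqs;
}
```

With socket_always_zero, an 8-CPU system ends up with a single
runqueue shared by all 8 CPUs; with the hypothetical
socket_four_per_cluster mapping, the same system gets two runqueues of
4 CPUs each.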

As I said already, it's impossible to tell whether this is good or bad
in the general case. It's good for fairness and load distribution
(load balancing happens automatically, without the actual load
balancing logic and code having to do anything at all!), but it's bad
for lock contention (every runq operation, e.g., wakeup, schedule,
etc., has to take the same lock).

I think this explains at least part of why Stefano's wakeup latency
numbers are rather bad with Credit2 on ARM, while that is not the case
in my tests on x86.

> > All that to say: It shouldn't be a major issue if you are
> > mis-reporting sockets. :-)
> 
> Good to know, thank you for the explanation! We might want to parse
> the bindings correctly to get a bit of improvement. I will add a task
> on jira.
> 
Yes, we should. Credit1 does not care about this, but Credit2 is
specifically designed to take advantage of this (and possibly even
more!) topology information, so it needs to be accurate. :-D

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

