Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy
On August 18, 2015 8:55:32 AM PDT, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
>Hey everyone,
>
>So, as a followup of what we were discussing in this thread:
>
> [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
> http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg03241.html
>
>I started looking in more detail at scheduling domains in the Linux
>kernel. Now, that thread was about CPUID and vNUMA, and their weird way
>of interacting, while the thing I'm proposing here is completely
>independent of them both.
>
>In fact, no matter whether vNUMA is supported and enabled, and no
>matter whether CPUID is reporting accurate, random, meaningful or
>completely misleading information, I think that we should do something
>about how scheduling domains are built.
>
>Fact is, unless we use 1:1, and immutable (across the whole guest
>lifetime), pinning, scheduling domains should not be constructed, in
>Linux, by looking at *any* topology information, because that just does
>not make any sense when vcpus move around.
>
>Let me state this again (hoping to make myself as clear as possible):
>no matter how good a shape we get CPUID support into, and no matter how
>beautifully and consistently that will interact with vNUMA, licensing
>requirements and whatever else, it will always be possible for
>vCPU #0 and vCPU #3 to be scheduled on two SMT threads at time t1, and
>on two different NUMA nodes at time t2. Hence, the Linux scheduler
>should really not skew its load balancing logic toward either of those
>two situations, as neither of them can be considered correct (since
>nothing is!).

What about Windows guests?

>
>For now, this only covers the PV case. The HVM case shouldn't be any
>different, but I haven't looked at how to make the same thing happen
>there as well.
>
>OVERALL DESCRIPTION
>===================
>What this RFC patch does is, in the Xen PV case, configure scheduling
>domains in such a way that there is only one of them, spanning all the
>pCPUs of the guest.

Wow. That is a pretty simple patch!!

>
>Note that the patch deals directly with scheduling domains, and there
>is no need to alter the masks that will then be used for building and
>reporting the topology (via CPUID, /proc/cpuinfo, sysfs, etc.). That is
>the main difference between it and the patch proposed by Juergen here:
>http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05088.html
>
>This means that when, in the future, we fix CPUID handling and make it
>comply with whatever logic or requirements we want, that won't have any
>unexpected side effects on scheduling domains.
>
>Information about how the scheduling domains are constructed during
>boot is available in `dmesg', if the kernel is booted with the
>'sched_debug' parameter. It is also possible to look at
>/proc/sys/kernel/sched_domain/cpu*, and at /proc/schedstat.
>
>With the patch applied, only one scheduling domain is created, called
>the 'VCPU' domain, spanning all the guest's (or Dom0's) vCPUs. You can
>tell that from the fact that every cpu* folder in
>/proc/sys/kernel/sched_domain/ has only one subdirectory ('domain0'),
>with all the tweaks and the tunables for our scheduling domain.
> ...
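[Editorial note: for readers who want to see what registering a one-level
topology looks like, here is a minimal sketch. This is NOT the actual RFC
patch; the function names and the idea of calling the setup hook from the
Xen PV SMP bring-up path are assumptions made purely for illustration. It
only shows the in-kernel API involved, set_sched_topology().]

/*
 * Minimal sketch, not the actual RFC patch: register a one-level
 * scheduling-domain topology so that a single "VCPU" domain spans
 * every vCPU of the guest. Names below are hypothetical.
 */
#include <linux/init.h>
#include <linux/cpumask.h>
#include <linux/sched.h>	/* sched_domain_topology_level, set_sched_topology() */

/* Every vCPU belongs to one guest-wide domain: no SMT/MC/NUMA levels. */
static const struct cpumask *xen_vcpu_sd_mask(int cpu)
{
	return cpu_online_mask;
}

static struct sched_domain_topology_level xen_flat_topology[] = {
	{ xen_vcpu_sd_mask, SD_INIT_NAME(VCPU) },
	{ NULL, },
};

/* Hypothetical hook, e.g. called during Xen PV SMP preparation. */
static void __init xen_setup_flat_sched_domains(void)
{
	/* Replace the default SMT/MC/DIE hierarchy with the flat one. */
	set_sched_topology(xen_flat_topology);
}

[With something along these lines, booting with 'sched_debug' would show a
single 'VCPU' level in dmesg, and each /proc/sys/kernel/sched_domain/cpu*/
directory would contain only 'domain0', matching what Dario describes above.]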
>
>REQUEST FOR COMMENTS
>====================
>Basically, the kind of feedback I'd be really glad to hear is:
> - what you guys think of the approach,
> - whether you think, looking at this preliminary set of numbers, that
>   this is something worth continuing to investigate,
> - if yes, what other workloads and benchmarks it would make sense to
>   throw at it.
>

The thing that I was worried about is that we would be modifying the
generic code, but your changes are all in Xen code! Woot!

In terms of workloads, I am CCing Herbert, who I hope can provide advice
on this. Herbert, the full email is here:
http://lists.xen.org/archives/html/xen-devel/2015-08/msg01691.html

>Thanks and Regards,
>Dario