
Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on



>>> On 27.02.15 at 11:04, <dario.faggioli@xxxxxxxxxx> wrote:
> On Fri, 2015-02-27 at 08:46 +0000, Jan Beulich wrote:
>> >>> On 26.02.15 at 18:14, <dario.faggioli@xxxxxxxxxx> wrote:
>> > On Thu, 2015-02-26 at 13:52 +0000, Jan Beulich wrote:
>> >> +### dom0\_nodes
>> >> +
>> >> +> `= <integer>[,...]`
>> >> +
>> >> +Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
>> >> +and memory assigned to Dom0 will be adjusted to match the node
>> >> +restrictions set up here. Note that the values to be specified here are
>> >> +ACPI PXM ones, not Xen internal node numbers.
>> >> +
>> > Why use PXM IDs? It might just be me being more used to working with
>> > NUMA node IDs, but wouldn't the other way round be more consistent
>> > (almost everything the user interacts with after boot speaks node IDs)
>> > and easier for the user to figure things out (e.g., with tools like
>> > numactl on baremetal)?
>> 
>> This way behavior doesn't change if internally in the hypervisor we
>> need to change the mapping from PXMs to node IDs.
>> 
> Ok, I see the value of this. I'm still a bit concerned about the fact
> that everything else "speaks" NUMA node IDs, but it's probably just me
> being much more used to those than to PXMs. :-)

With "everything else" I suppose you mean the tool stack? There
shouldn't be any node IDs kept across reboots there. Yet the
consistent behavior to be achieved here matters particularly across
multiple boots.
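
For illustration only (none of this is from the patch; the names
dom0_pxms, dom0_nr_pxms and dom0_nodes_to_nodemask are invented here,
while pxm_to_node() is assumed to be the existing SRAT lookup): a
command line such as "dom0_nodes=0,2" carries PXMs 0 and 2, which get
resolved to Xen node IDs exactly once at boot, so the option keeps its
meaning even if Xen's internal node numbering changes.

    /* Hypothetical storage for the PXM values parsed from dom0_nodes=. */
    static unsigned int __initdata dom0_pxms[MAX_NUMNODES];
    static unsigned int __initdata dom0_nr_pxms;

    /* Resolve the user-supplied PXMs to Xen-internal node IDs. */
    static void __init dom0_nodes_to_nodemask(nodemask_t *mask)
    {
        unsigned int i;

        nodes_clear(*mask);
        for ( i = 0; i < dom0_nr_pxms; i++ )
        {
            nodeid_t node = pxm_to_node(dom0_pxms[i]); /* PXM -> node ID */

            if ( node != NUMA_NO_NODE )
                node_set(node, *mask);
        }
    }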

>> >> +static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id,
>> >> +                                      unsigned int cpu)
>> >> +{
>> >> +    struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu);
>> >> +
>> >> +    if ( v )
>> >> +    {
>> >> +        if ( !d->is_pinned )
>> >> +            cpumask_copy(v->cpu_hard_affinity, &dom0_cpus);
>> >> +        cpumask_copy(v->cpu_soft_affinity, &dom0_cpus);
>> >> +    }
>> >> +
>> > About this: for DomUs, now that we have soft affinity available, what we
>> > do is set only soft affinity to match the NUMA placement. I think I see
>> > and agree with why we want to be 'more strict' for Dom0, but I felt it
>> > was worth pointing out the difference in behaviour (should it be
>> > documented somewhere?).
>> 
>> I'm simply adjusting what sched_init_vcpu() did, which is to alter
>> hard affinity conditionally on is_pinned and soft affinity
>> unconditionally.
>> 
> Ok, I understand the idea behind this better now, thanks.
> [...]
> Setting soft affinity as a superset of (in the former case) or equal to
> (in the latter case) hard affinity is just pure overhead once we are in
> the scheduler.

Then why does sched_init_vcpu() do what it does? If you want to
alter that, I'm fine with altering it here.
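
For reference, the sched_init_vcpu() affinity setup being referred to
looks roughly like the sketch below; this is paraphrased from memory
and wrapped in an invented helper, so details may differ from the
actual source. Hard affinity is narrowed only for pinned domains,
while soft affinity is always initialised to all CPUs.

    /* Paraphrased sketch, not a verbatim copy of the Xen source. */
    static void init_affinities(struct vcpu *v, unsigned int processor)
    {
        struct domain *d = v->domain;

        if ( is_idle_domain(d) || d->is_pinned )
            /* Pinned: restrict hard affinity to the chosen pCPU. */
            cpumask_copy(v->cpu_hard_affinity, cpumask_of(processor));
        else
            /* Otherwise hard affinity allows all CPUs... */
            cpumask_setall(v->cpu_hard_affinity);

        /* ...and soft affinity is set to all CPUs unconditionally. */
        cpumask_setall(v->cpu_soft_affinity);
    }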

> In fact, if the scheduler sees that soft affinity is defined, it will go
> through the load balancing/vcpu placement logic twice, the first time
> using the soft affinity mask, the second using the hard affinity one.
> Actually, the first time it uses 'soft & hard', which in these cases is
> exactly equal to hard, and that's why I'm calling this pure overhead.
> 
> I probably should add checks in the scheduler to identify such
> situations as "no need to consider soft affinity". I thought about this
> before, but didn't do it because it means more cpumask_foo() fiddling in
> a few hot paths... but of course I can check for the relationship
> between hard and soft affinity masks upfront, cache the result in a
> bool_t, and use _that_ in hot paths... what do you think?

Avoiding the fiddling in hot paths is surely desirable. But it would
indeed seem even better to avoid the inefficiency in the first place
(i.e. when storing affinities).
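
A minimal sketch of that upfront check (the helper and the
soft_aff_effective field are invented names, not existing code): the
flag is recomputed whenever either mask changes, and the hot paths then
test one bool_t instead of doing cpumask work.

    /* Sketch only; invented field/helper names. */
    static void update_soft_aff_effective(struct vcpu *v)
    {
        /*
         * If hard affinity is a subset of soft affinity, then
         * "soft & hard" == hard, so the extra soft-affinity balancing
         * pass cannot make a different choice and can be skipped.
         */
        v->soft_aff_effective = !cpumask_subset(v->cpu_hard_affinity,
                                                v->cpu_soft_affinity);
    }

Hot paths would then simply check v->soft_aff_effective before doing
the soft-affinity balancing step.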

> All this being said, I would still avoid putting the system into a
> configuration where soft is a superset of or equal to hard, at least
> not automatically, as I think it can appear confusing to the user (the
> user can, of course, do that after boot, for Dom0 or DomUs, but that's
> another story, I think). So I'm now wondering whether it wouldn't be
> better to leave soft affinity alone completely in this patch.
> 
> Then, if we want to make it possible to tweak soft affinity, we can
> allow for something like "dom0_nodes=soft:1,3" and, in that case, alter
> soft affinity only.

Hmm, not sure. And I keep being confused whether soft means
"allow" and hard means "prefer" or the other way around. In any
event, again, with sched_init_vcpu() setting up things so that
soft is a superset of hard (and most likely they're equal), I don't
see why doing the same here would be any more of a problem.
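
Purely as a sketch of the hypothetical "dom0_nodes=soft:1,3" syntax
floated above (nothing like this is in the patch; the parser, the
dom0_nodes_soft flag and the dom0_pxms array are all invented here),
a "soft:" prefix could simply flip a flag before the usual list of
PXMs is parsed:

    static bool_t __initdata dom0_nodes_soft;
    static unsigned int __initdata dom0_pxms[MAX_NUMNODES];
    static unsigned int __initdata dom0_nr_pxms;

    static void __init parse_dom0_nodes(const char *s)
    {
        /* "soft:" would limit the option's effect to soft affinity. */
        if ( !strncmp(s, "soft:", 5) )
        {
            dom0_nodes_soft = 1;
            s += 5;
        }

        while ( dom0_nr_pxms < ARRAY_SIZE(dom0_pxms) )
        {
            dom0_pxms[dom0_nr_pxms++] = simple_strtoul(s, &s, 0);
            if ( *s != ',' )
                break;
            s++;
        }
    }
    custom_param("dom0_nodes", parse_dom0_nodes);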

>> > BTW, mostly out of curiosity, I've had a few strange issues/conflicts in
>> > applying this on top of staging, in order to test it... Was it me doing
>> > something very stupid, or was this based on something different?
>> 
>> Apart from the one patch named in the cover letter there shouldn't
>> be any other dependencies. Without you naming the issues you
>> encountered, I can't tell.
>> 
> I see. Never mind then, maybe I messed up with my various branches...
> Sorry for bothering with this. :-)

No reason to be sorry - I'm more than happy if inconsistencies get
pointed out before trying to commit anything.

Jan
