Xen project Mailing List

Re: [PATCH] x86: make "dom0_nodes=" work with credit2

Date: Wed, 13 Apr 2022 12:00:12 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=PQMIrFcPmVtolBh282r4JMvHWMIA/wsKbfYGuxCseDA=; b=L4WbSYuoTM4mjabzeH6P8yddNItiZIY5XkT40WYTxN+lW7RgaHoKMvGn5t9zYUg+wXSs2jHNU2ZUufO8lKG85WhZWjNIfTLVqRpXpiqw+dtVIu8idE1lR9Cbp5uHrYOVsbGdS5mWSOLTJ0Om86PXC9BpJBdFo+qPdLDgbEsLDt1c6qhRMaeH8ljGthZPunjpJp28BCeAWsHiktm6tcKPX6ohkAqsDcULamFRox1RPwWysSDeGVE8BAJvkEhyBMe76swUoWCd6K6eKTYdR1LmoGmVTNZ2ZB7Sxqhs+Sdbk57Ld89dDaN53xrkgMEs0AfjbI+B+88Z2Y7Y/0Li4pTDeg==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=K3+8la/G5k8tnqLg3GDbwB0AdHlzGr7wDYluMoG2l5wDsMGZaYKtEDLFOV4LdhSgZ/AuPjFbtEB0oBcGhJ2qixf7PlsYEoanu6kBGkXzOFTKgskWglF/rA5Cni5J7r89AqAVDquKkpQPQDJsIC+66WXmaU5Chp47VArCYvNCadmc3b0ieq6bF3d5Frz6lPDesrT+SZ2ZY/Bt28HbynQwC2OqvlLgXsc7wyJxwGSaHMUBndj0WMfeoRjVi0m/2bnszCUf9RtNIFh/0zybzERw7//iw251Y07TlYICT9mLkNPykpTr8HdsNkUSoryQOInC8oBYzvJmWej4iQc+HYsSgQ==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;

Cc: "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 13 Apr 2022 10:00:27 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 12.04.2022 18:11, Dario Faggioli wrote: > --- a/xen/common/sched/core.c > +++ b/xen/common/sched/core.c > @@ -572,11 +572,41 @@ int sched_init_vcpu(struct vcpu *v) > } > > /* > - * Initialize affinity settings. The idler, and potentially > - * domain-0 VCPUs, are pinned onto their respective physical CPUs. > + * Initialize affinity settings. By doing this before the unit is > + * inserted in the scheduler runqueues (by the call to > sched_insert_unit(), > + * at the end of the function, we are sure that it will be put on an > + * appropriate CPU. > */ > - if ( is_idle_domain(d) || (is_hardware_domain(d) && opt_dom0_vcpus_pin) ) > + if ( pv_shim && v->vcpu_id == 0 ) I don't think you can handle the shim case first, as then you'd also have its CPU0 idle vCPU take this path. The difference _may_ only be cosmetic, but I think it would be odd for CPU0's idle vCPU to have a soft affinity of just CPU0, while all others use cpumask_all. > + { > + /* > + * PV-shim: vcpus are pinned 1:1. Initially only 1 cpu is online, > + * others will be dealt with when onlining them. This avoids pinning > + * a vcpu to a not yet online cpu here. > + */ > + sched_set_affinity(unit, cpumask_of(0), cpumask_of(0)); > + } > + else if ( is_idle_domain(d) || (is_hardware_domain(d) && > opt_dom0_vcpus_pin) ) Here (pre-existing) as well as ... > + { > + /* > + * The idler, and potentially domain-0 VCPUs, are pinned onto their > + * respective physical CPUs. > + */ > sched_set_affinity(unit, cpumask_of(processor), &cpumask_all); > + } > + else if ( is_hardware_domain(d) ) ... here I wonder: Shouldn't this be limited to Dom0 (for the purposes here != hwdom)? Any special affinity for a late hwdom ought to be specified by the logic creating that domain imo, not by command line options concerning Dom0 only. This then also determines where the involved variables (opt_dom0_vcpus_pin, dom0_affinity_relaxed, and dom0_cpus) are to be placed (answering your question in a subsequent reply): If it's strictly Dom0 only (and knowing there's no way to add vCPU-s to a domain post-construction), then __initdata is fine for all of them. If late hwdom was to also be covered, __hwdom_initdata would need to be used. > + { > + /* > + * In absence of dom0_vcpus_pin, the hard and soft affinity of > + * domain-0 is controlled by the dom0_nodes parameter. At this point > + * it has been parsed and decoded, and we have the result of that > + * in the dom0_cpus mask. > + */ > + if ( !dom0_affinity_relaxed ) > + sched_set_affinity(unit, &dom0_cpus, &cpumask_all); > + else > + sched_set_affinity(unit, &cpumask_all, &dom0_cpus); I guess by referencing dom0_affinity_relaxed and dom0_cpus outside of CONFIG_X86 section you're breaking the Arm build. I also have a more general question here: sched.h says "Bitmask of CPUs on which this VCPU may run" for hard affinity and "Bitmask of CPUs on which this VCPU prefers to run" for soft affinity. Additionally there's soft_aff_effective. Does it make sense in the first place for one to be a proper subset of the of the other in _both_ directions? Is that mainly to have a way to record preferences even when all preferred CPUs are offline, to be able to go back to the preferences once CPUs come back online? Then a follow-on question is: Why do you use cpumask_all for soft affinity in the first of the two calls above? Is this to cover for the case where all CPUs in dom0_cpus would go offline? > + } > else > sched_set_affinity(unit, &cpumask_all, &cpumask_all); Hmm, you leave this alone. Wouldn't it be better to further generalize things, in case domain affinity was set already? I was referring to the mask calculated by sched_select_initial_cpu() also in this regard. And when I did suggest to re-use the result, I did mean this literally. > @@ -3386,29 +3416,18 @@ void wait(void) > void __init sched_setup_dom0_vcpus(struct domain *d) > { > unsigned int i; > - struct sched_unit *unit; > > for ( i = 1; i < d->max_vcpus; i++ ) > vcpu_create(d, i); > > /* > - * PV-shim: vcpus are pinned 1:1. > - * Initially only 1 cpu is online, others will be dealt with when > - * onlining them. This avoids pinning a vcpu to a not yet online cpu > here. > + * sched_vcpu_init(), called by vcpu_create(), will setup the hard and > + * soft affinity of all the vCPUs, by calling sched_set_affinity() on > each > + * one of them. We can now make sure that the domain's node affinity is > + * also updated accordingly, and we can do that here, once and for all > + * (which is more efficient than calling domain_update_node_affinity() > + * on all the vCPUs). > */ > - if ( pv_shim ) > - sched_set_affinity(d->vcpu[0]->sched_unit, > - cpumask_of(0), cpumask_of(0)); > - else > - { > - for_each_sched_unit ( d, unit ) > - { > - if ( !opt_dom0_vcpus_pin && !dom0_affinity_relaxed ) > - sched_set_affinity(unit, &dom0_cpus, NULL); > - sched_set_affinity(unit, NULL, &dom0_cpus); > - } > - } > - > domain_update_node_affinity(d); > } I consider the comment somewhat misleading, and hence I wonder if a comment is needed here in the first place. domain_update_node_affinity() acts on a domain, not on individual vCPU-s. Hence it's not clear what "calling domain_update_node_affinity() on all the vCPUs" would be referring to. I don't think anyone would consider calling vcpu_set_affinity() here just for the purpose of updating domain affinity, with all other (vCPU) affinity setting now happening elsewhere. > --- a/xen/common/sched/credit2.c > +++ b/xen/common/sched/credit2.c > @@ -749,10 +749,12 @@ static int get_fallback_cpu(struct csched2_unit *svc) > > /* > * This is cases 2 or 4 (depending on bs): v->processor isn't there > - * any longer, check if we at least can stay in our current runq. > + * any longer, check if we at least can stay in our current runq, > + * if we have any (e.g., we don't yet, if we get here when a unit > + * is inserted for the very first time). > */ > - if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu), > - &svc->rqd->active)) ) > + if ( likely(svc->rqd && cpumask_intersects(cpumask_scratch_cpu(cpu), > + &svc->rqd->active)) ) > { > cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), > &svc->rqd->active); This change is not covered by anything in the description, and I wonder why you're making the code adjustment. If svc->rqd was NULL, wouldn't Xen have crashed prior to the adjustment? I can't spot how it being NULL here could be the effect of any of the other changes you're making. If the comment adjustment is to be retained, please take care of the hard tabs which have slipped in. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.