
Re: [PATCH] x86: make "dom0_nodes=" work with credit2


  • To: Dario Faggioli <dfaggioli@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 29 Apr 2022 14:16:00 +0200
  • Cc: "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 29 Apr 2022 12:16:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29.04.2022 12:52, Dario Faggioli wrote:
> On Wed, 2022-04-13 at 12:00 +0200, Jan Beulich wrote:
>> I also have a more general question here: sched.h says "Bitmask of CPUs
>> on which this VCPU may run" for hard affinity and "Bitmask of CPUs on
>> which this VCPU prefers to run" for soft affinity. Additionally there's
>> soft_aff_effective. Does it make sense in the first place for one to be
>> a proper subset of the other in _both_ directions?
>>
> I'm not sure I'm 100% getting what you're asking. In particular, I'm
> not sure what you mean by "for one to be a proper subset of the other
> in both directions"?
> 
> Anyway, soft and hard affinity are under the complete control of the
> user (I guess we can say that they're policy), so we tend to accept
> pretty much everything that comes from the user.
> 
> That is, the user can set a hard affinity of 1-6 and a soft affinity
> of (a) 2-3, (b) 0-2, (c) 7-12, etc.
> 
> Case (a), i.e., soft is a strict subset of hard, is the one that makes
> the most sense, of course. With this configuration, the vCPU(s) can run
> on CPUs 1, 2, 3, 4, 5 and 6, but the scheduler will prefer to run it
> (them) on 2 and/or 3.
> 
> Case (b), i.e., not a strict subset, but with some overlap, also means
> that soft-affinity is going to be considered and have an effect. In
> fact, the vCPU(s) will prefer to run on CPUs 1 and/or 2, but of course
> will never run on CPU 0. The user can, at a later point in time,
> change the hard affinity so that it includes CPU 0, and we'll be back
> to the strict-subset case. That's why we want to keep 0 in the mask,
> even if it causes soft to not be a strict subset of hard.
> 
> In case (c), soft affinity is totally useless. However, again, the
> user can later change hard to include some or all of CPUs 7-12, so we
> keep it. We do, however, print a warning. And we also use the
> soft_aff_effective flag to avoid going through the soft-affinity
> balancing step in the scheduler code. This is, in fact, why we also
> check whether hard is a strict subset of soft: if it is, there's no
> need to do anything about soft, as honoring hard will automatically
> take care of it as well.
> 
>> Is that mainly
>> to have a way to record preferences even when all preferred CPUs are
>> offline, to be able to go back to the preferences once CPUs come back
>> online?
>>
> That's another example/use case, yes. We want to record the user's
> preference, whatever the status of the system (and of other aspects of
> the configuration) is.
> 
> But I'm not really sure I've answered... Have I?

You did. My question really only was whether there are useful scenarios
for proper-subset cases in both possible directions.
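
(To put the three cases in code: what follows is a standalone
illustration, not Xen code. The plain uint64_t bitmaps and the MASK(),
is_subset(), intersects() and soft_has_effect() helpers are made up for
the example and merely stand in for cpumask_t and the scheduler's
soft_aff_effective handling.)

/* Hard affinity is 1-6 throughout; soft affinity only "has an effect"
 * if it overlaps hard and hard is not already contained in soft. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bits lo..hi set (valid for hi < 63). */
#define MASK(lo, hi) (((1ull << ((hi) + 1)) - 1) & ~((1ull << (lo)) - 1))

static bool is_subset(uint64_t a, uint64_t b)  { return !(a & ~b); }
static bool intersects(uint64_t a, uint64_t b) { return a & b; }

static bool soft_has_effect(uint64_t hard, uint64_t soft)
{
    return intersects(soft, hard) && !is_subset(hard, soft);
}

int main(void)
{
    uint64_t hard = MASK(1, 6);

    printf("(a) soft=2-3 : %d\n", soft_has_effect(hard, MASK(2, 3)));  /* 1 */
    printf("(b) soft=0-2 : %d\n", soft_has_effect(hard, MASK(0, 2)));  /* 1 */
    printf("(c) soft=7-12: %d\n", soft_has_effect(hard, MASK(7, 12))); /* 0 */

    return 0;
}

Case (c) prints 0, matching the warning / soft_aff_effective handling
described above; the mask itself is still kept, so a later change of
hard affinity can make it meaningful again.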

>> Then a follow-on question is: Why do you use cpumask_all for soft
>> affinity in the first of the two calls above? Is this to cover for the
>> case where all CPUs in dom0_cpus would go offline?
>>
> Mmm... what else should I be using?

I was thinking of dom0_cpus.

> If dom0_nodes is in "strict" mode,
> we want to control hard affinity only. So we set soft to the default,
> which is "all". During operations, since hard is a subset of "all",
> soft-affinity will be just ignored.

Right - until such point that all (original) Dom0 CPUs have gone
offline. Hence my 2nd question.
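
(For reference, the split I understand to be intended here, again as a
standalone sketch with plain bitmaps rather than the actual patch:
ALL_CPUS, struct affinity and dom0_affinity() are invented for the
illustration, and the "relaxed" branch in particular is my assumption
about the counterpart of "strict".)

/* "strict" constrains hard affinity only and leaves soft at its default;
 * "relaxed" would instead only express a preference via soft affinity. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ALL_CPUS (~0ull)   /* stands in for cpumask_all */

struct affinity { uint64_t hard, soft; };

static struct affinity dom0_affinity(uint64_t dom0_mask, bool strict)
{
    if ( strict )
        return (struct affinity){ .hard = dom0_mask, .soft = ALL_CPUS };

    return (struct affinity){ .hard = ALL_CPUS, .soft = dom0_mask };
}

int main(void)
{
    uint64_t node1 = 0xf0;  /* pretend node 1 has CPUs 4-7 */
    struct affinity s = dom0_affinity(node1, true);
    struct affinity r = dom0_affinity(node1, false);

    printf("strict : hard=%#llx soft=%#llx\n",
           (unsigned long long)s.hard, (unsigned long long)s.soft);
    printf("relaxed: hard=%#llx soft=%#llx\n",
           (unsigned long long)r.hard, (unsigned long long)r.soft);
    return 0;
}

With hard being a subset of soft in the strict case, the soft-affinity
balancing step is skipped, which is what makes "all" a harmless default
there.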

> So I'm using "all" because soft-affinity is just "all", unless someone
> sets it differently.

How would "someone set it differently"? Aiui you can't control both
affinities at the same time.

> But I am again not sure that I fully understood and properly addressed
> your question. :-(
> 
> 
>>> +    }
>>>      else
>>>          sched_set_affinity(unit, &cpumask_all, &cpumask_all);
>>
>> Hmm, you leave this alone. Wouldn't it be better to further generalize
>> things, in case domain affinity was set already? I was referring to the
>> mask calculated by sched_select_initial_cpu() also in this regard. And
>> when I did suggest to re-use the result, I did mean this literally.
>>
> Technically, I think we can do that. Although it's probably cumbersome
> to do without adding at least one cpumask on the stack, or reshuffling
> the locking between sched_select_initial_cpu() and sched_init_vcpu(),
> in a way that I (personally) don't find particularly pretty.

Locking? sched_select_initial_cpu() calculates into a per-CPU variable,
which I sincerely hope cannot be corrupted by another CPU.

> Also, I don't think we gain much from doing that, as we probably still
> need to have some special casing of dom0, for handling dom0_vcpus_pin.

dom0_vcpus_pin is likely always going to require special casing, until
such point where we drop support for it.

> And again, soft and hard affinity should be set to what the user wants
> and asks for. If, for instance, he/she passes dom0_nodes="1,strict",
> soft-affinity should just be all. If, e.g., we set both hard and soft
> affinity to the CPUs of node 1, and later hard affinity is manually
> changed to "all", soft affinity will remain set to node 1, even though
> the user never asked for that, and he/she will need to change it
> explicitly as well. (Of course, it's not particularly clever to boot
> with dom0_nodes="1,strict" and then change dom0's vCPUs' hard affinity
> to node 0... but the user is free to do that.)

I can certainly accept this as justification for using "all" further up.
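
(To make the concern concrete, a hypothetical sequence where node 1 has
CPUs 4-7, once more with plain bitmaps rather than cpumask_t; "taken" is
the choice argued for above, "rejected" the alternative.)

#include <stdint.h>
#include <stdio.h>

struct affinity { uint64_t hard, soft; };

int main(void)
{
    const uint64_t node1 = 0xf0, all = ~0ull;

    /* Boot-time choice argued for above vs. the rejected alternative. */
    struct affinity taken    = { .hard = node1, .soft = all   };
    struct affinity rejected = { .hard = node1, .soft = node1 };

    /* The admin later widens hard affinity to "all"; soft is untouched. */
    taken.hard = all;
    rejected.hard = all;

    /* taken:    soft is still "all"  - nothing unexpected left behind.
     * rejected: soft is still node 1 - a preference nobody asked for. */
    printf("taken   : hard=%#llx soft=%#llx\n",
           (unsigned long long)taken.hard, (unsigned long long)taken.soft);
    printf("rejected: hard=%#llx soft=%#llx\n",
           (unsigned long long)rejected.hard, (unsigned long long)rejected.soft);
    return 0;
}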

Jan