
Re: [PATCH] x86: make "dom0_nodes=" work with credit2


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Fri, 29 Apr 2022 14:09:50 +0000
  • Accept-language: en-US
  • Cc: "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, "ohering@xxxxxxx" <ohering@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 29 Apr 2022 14:10:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH] x86: make "dom0_nodes=" work with credit2

On Fri, 2022-04-29 at 14:16 +0200, Jan Beulich wrote:
> On 29.04.2022 12:52, Dario Faggioli wrote:
> > > Is that mainly
> > > to have a way to record preferences even when all preferred CPUs
> > > are
> > > offline, to be able to go back to the preferences once CPUs come
> > > back
> > > online?
> > > 
> > That's another example/use case, yes. We want to record the user's
> > preference, whatever the status of the system (and of other aspects
> > of
> > the configuration) is.
> > 
> > But I'm not really sure I've answered... Have I?
> 
> You did. 
>
Ok, great! :-)

> > 
> > If dom0_nodes is in "strict" mode,
> > we want to control hard affinity only. So we set soft to the
> > default,
> > which is "all". During operations, since hard is a subset of "all",
> > soft-affinity will be just ignored.
> 
> Right - until such point that all (original) Dom0 CPUs have gone
> offline. Hence my 2nd question.
> 
> > So I'm using "all" because soft-affinity is just "all", unless
> > someone
> > sets it differently.
> 
> How would "someone set it differently"? Aiui you can't control both
> affinities at the same time.
> 
Yeah, the argument here is basically the one I make below, which you
say you understand. I guess I could have put it a bit more upfront;
sorry about that. :-)

> > > 
> > > Hmm, you leave this alone. Wouldn't it be better to further
> > > generalize
> > > things, in case domain affinity was set already? I was referring
> > > to
> > > the mask calculated by sched_select_initial_cpu() also in this
> > > regard.
> > > And when I did suggest to re-use the result, I did mean this
> > > literally.
> > > 
> > Technically, I think we can do that. Although, it's probably
> > cumbersome
> > to do, without adding at least one cpumask on the stack, or
> > reshuffle
> > the locking between sched_select_initial_cpu() and
> > sched_init_vcpu(),
> > in a way that I (personally) don't find particularly pretty.
> 
> Locking? sched_select_initial_cpu() calculates into a per-CPU
> variable,
> which I sincerely hope cannot be corrupted by another CPU.
> 
No, not by another CPU, hopefully.

And that is probably fine during boot, when there should be no (other)
scheduling activity. However, during normal operation, a vCPU being
scheduled on CPU X (or, in general, having X in v->processor) could
already be using the scratch cpumask of X. So, if we used it without
locking, we would risk operating on the wrong mask.

That is why we require the scheduler lock to be held when playing with
the scratch cpumasks:

/*
 * Scratch space, for avoiding having too many cpumask_t on the stack.
 * Within each scheduler, when using the scratch mask of one pCPU:
 * - the pCPU must belong to the scheduler,
 * - the caller must own the per-pCPU scheduler lock (a.k.a. runqueue
 *   lock).
 */
DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
#define cpumask_scratch        (&this_cpu(cpumask_scratch))
#define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))

And sched_init_vcpu() (and hence sched_select_initial_cpu()) can be
called during normal operation too.

In fact, sched_select_initial_cpu() does pcpu_schedule_lock_irqsave()
before starting to use it.
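
To make the pattern concrete, here is a simplified sketch (not the
actual Xen code; helpers and fields such as node_to_cpumask() and
d->node_affinity are as I recall them from staging) of how the scratch
mask of the local pCPU is meant to be used, i.e., only with that
pCPU's runqueue lock held:

static unsigned int initial_cpu_sketch(const struct vcpu *v)
{
    const struct domain *d = v->domain;
    unsigned int cpu = smp_processor_id();
    unsigned long flags;
    spinlock_t *lock;
    nodeid_t node;
    unsigned int ret;

    /* Take the runqueue lock of 'cpu' before touching its scratch mask. */
    lock = pcpu_schedule_lock_irqsave(cpu, &flags);

    /* Build "CPUs of the domain's NUMA affinity" in the scratch mask. */
    cpumask_clear(cpumask_scratch_cpu(cpu));
    for_each_node_mask ( node, d->node_affinity )
        cpumask_or(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                   &node_to_cpumask(node));
    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                &cpu_online_map);

    /* If the intersection came out empty, fall back to all online CPUs. */
    if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
        cpumask_copy(cpumask_scratch_cpu(cpu), &cpu_online_map);

    ret = cpumask_first(cpumask_scratch_cpu(cpu));

    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);

    return ret;
}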

> 
> > And again, soft and hard affinity should be set to what the user
> > wants
> > and asks for. And if, for instance, he/she passes
> > dom0_nodes="1,strict", soft-affinity should just be all. If, e.g.,
> > we
> > set both hard and soft affinity to the CPUs of node 1, and if later
> > hard affinity is manually changed to "all", soft affinity will
> > remain
> > to node 1, even if it was never asked for it to be that way, and
> > the
> > user will need to change that explicitly as well. (Of course, it's
> > not
> > particularly clever to boot with dom0_nodes="1,strict" and then
> > change
> > dom0's vCPUs' hard affinity to node 0... but the user is free to do
> > that.)
> 
> I can certainly accept this as justification for using "all" further
> up.
> 
Good then.
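
For the record, what I have in mind looks roughly like the sketch
below (just an illustration, not the actual patch; sched_set_affinity()
and cpumask_all are as in current staging, and 'relaxed' stands for the
dom0_affinity_relaxed flag):

static void dom0_affinity_sketch(struct sched_unit *unit,
                                 const cpumask_t *dom0_cpus,
                                 bool relaxed)
{
    if ( relaxed )
        /* "relaxed": prefer the dom0 CPUs, but allow running anywhere. */
        sched_set_affinity(unit, &cpumask_all, dom0_cpus);
    else
        /* "strict": constrain hard affinity, leave soft affinity "all". */
        sched_set_affinity(unit, dom0_cpus, &cpumask_all);
}

That is, soft affinity only ever deviates from "all" when the user
explicitly asks for that (i.e., with "relaxed").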

Do you think some of this exchange should end up somewhere (comments?
changelogs?), to make it clearer to both our future selves and new
contributors why things are done this way?

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
