Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
> Hi!
> First of all, my filters somehow failed to highlight this for me, so
> sorry if I did not notice it earlier (and now, I need new filters
> anyway, as the email I'm using is different :-D).
> I'll have a look at the patch ASAP.
> On Mon, 2018-01-08 at 11:12 +0000, George Dunlap wrote:
> > On 01/08/2018 10:37 AM, Jan Beulich wrote:
> >
> > > I don't understand: Isn't the null scheduler not moving around
> > > vCPU-s at all? At least that's what the comment at the top of the
> > > file says, unless I'm mis-interpreting it. If so, how can "some CPU
> > > (...) pick this vCPU"?
> > 
> > There's no current way to prevent a user from adding more vcpus to a
> > pool than there are pcpus (if nothing else, by creating a new VM in a
> > given pool), or from taking pcpus from a pool in which #vcpus >=
> > #pcpus.
> > 
> Exactly. And something that checks for that is all but easy to
> introduce (let's just avoid even mentioning enforcing!).
> > The null scheduler deals with this by having a queue of "unassigned"
> > vcpus that are waiting for a free pcpu.  When a pcpu becomes available,
> > it will do the assignment.  When a pcpu that has a vcpu is assigned is
> > removed from the pool, that vcpu is assigned to a different pcpu if one
> > is available; if not, it is put on the list.
> > 
> Err... yes. BTW, either there are a couple of typos in the above
> paragraph, or it's me that can't read it well. Anyway, just to be
> clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might be
> the situation:
> CPU0 <-- d1v0
> CPU1 <-- d2v0
> CPU2 <-- d3v0
> CPU3 <-- d4v0
> Waitqueue: d5v0,d6v0
> Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
> the waitqueue and assigned to CPU1.

I think the above example is not representative of what happens inside
of the shim, since there's only one domain that runs on the shim, so
the picture is something like:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

Then if the guest brings up another vCPU (let's assume it's vCPU#3),
pCPU#3 will be brought up from the shim PoV, and the null scheduler will
assign the first vCPU on the waitqueue:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU3 <-- d1v2 (down)
NULL <-- d1v3 (up)

Hence d1v2, which is still down, will get assigned to CPU#3, and d1v3,
which is up, won't get assigned to any pCPU, and hence won't run.
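
To make the ordering problem above concrete, here is a toy model of the
two ways of picking from the waitqueue. This is only a sketch and not
the code in the null scheduler: the function names and the use of
plain strings for vCPUs are assumptions made for illustration.

```python
from collections import deque

def assign_first(waitqueue, online):
    """Hand the free pCPU to whatever vCPU is at the head of the waitqueue."""
    return waitqueue.popleft() if waitqueue else None

def assign_first_online(waitqueue, online):
    """Skip vCPUs that are down and pick the first one that is up."""
    for vcpu in list(waitqueue):
        if vcpu in online:
            waitqueue.remove(vcpu)
            return vcpu
    return None

# Scenario from above: d1v2 is still down, d1v3 has just been brought up.
online = {"d1v0", "d1v1", "d1v3"}
naive = assign_first(deque(["d1v2", "d1v3"]), online)
skipping = assign_first_online(deque(["d1v2", "d1v3"]), online)
print(naive, skipping)
```

With the head-of-queue policy the new pCPU goes to d1v2, which cannot
run, while the skipping policy hands it to d1v3.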

> > In the case of shim mode, this also seems to happen whenever curvcpus <
> > maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which to
> > schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule, of
> > which (maxvcpus-curvcpus) are marked 'down'.
> >
> Mmm, wait. In case of a domain which specifies both maxvcpus and
> curvcpus, how many vCPUs does the domain in which the shim runs have?

Regardless of the values of maxvcpus and curvcpus, PV guests are always
started with only the BSP online, and then the guest itself brings up
other vCPUs.

In the shim case vCPU hotplug is tied to pCPU hotplug, so every time
the guest hotplugs or unplugs a vCPU the shim does exactly the same
with its CPUs.

> > In this case, it also
> > seems that the null scheduler sometimes schedules a "down" vcpu when
> > there are "up" vcpus on the list; meaning that the "up" vcpus are never
> > scheduled.
> > 
> I'm not sure how an offline vCPU can end up there... but maybe I need
> to look at the code better, with the shim use case in mind.
> Anyway, I'm fine with checks that prevent offline vCPUs to be assigned
> to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs (so, the
> CPUs of L1 Xen). I'm less fine with rescheduling everyone at every
> wakeup.

So using the scenario from before:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

The guest decides to hotplug vCPU#2, and hence the shim first hotplugs
CPU#2. But at the point CPU2 is added to the pool of CPUs, vCPU2 is
still not up, hence we get the following:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

Then d1v2 is brought up, but since the null scheduler doesn't react to
wakeups the picture stays the same:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (up), d1v3 (down)

And d1v2 doesn't get scheduled.
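
The sequence above can be replayed with a small toy model. Again this
is just a sketch under stated assumptions: ToyNullPool, its methods and
the react_to_wakeup flag are hypothetical names made up for this
example, not anything that exists in Xen, and filling free pCPUs on
wakeup is only one possible way to react.

```python
class ToyNullPool:
    """Toy model: one vCPU per pCPU, the rest sit on a waitqueue."""

    def __init__(self, react_to_wakeup):
        self.react_to_wakeup = react_to_wakeup
        self.assigned = {}    # pCPU name -> vCPU name
        self.free_pcpus = []  # pCPUs in the pool with no vCPU
        self.waitqueue = []   # vCPUs without a pCPU
        self.online = set()   # vCPUs that are up

    def add_pcpu(self, pcpu):
        self.free_pcpus.append(pcpu)
        self._fill()

    def wake(self, vcpu):
        self.online.add(vcpu)
        if self.react_to_wakeup:
            self._fill()

    def _fill(self):
        # Give free pCPUs to waiting vCPUs that are up, in queue order.
        for vcpu in list(self.waitqueue):
            if not self.free_pcpus:
                break
            if vcpu in self.online:
                self.waitqueue.remove(vcpu)
                self.assigned[self.free_pcpus.pop(0)] = vcpu

def run(react_to_wakeup):
    pool = ToyNullPool(react_to_wakeup)
    pool.waitqueue = ["d1v2", "d1v3"]  # both still down
    pool.add_pcpu("CPU2")              # shim hotplugs CPU#2 first
    pool.wake("d1v2")                  # then d1v2 is brought up
    return pool.assigned.get("CPU2")

stuck = run(react_to_wakeup=False)
fixed = run(react_to_wakeup=True)
print(stuck, fixed)
```

Without any reaction to the wakeup CPU2 stays idle and d1v2 stays on
the waitqueue; when the wakeup looks for a free pCPU, d1v2 gets CPU2.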

Hope this makes sense :)

Thanks, Roger.
