Xen project Mailing List

Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Fri, 12 Jan 2018 10:45:49 +0000

Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, wei.liu2@xxxxxxxxxx, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Fri, 12 Jan 2018 10:46:06 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
> Hi!
>
> First of all, my filters somehow failed to highlight this for me, so
> sorry if I did not notice it earlier (and now, I need new filters
> anyway, as the email I'm using is different :-D).
>
> I'll have a look at the patch ASAP.
>
> On Mon, 2018-01-08 at 11:12 +0000, George Dunlap wrote:
> > On 01/08/2018 10:37 AM, Jan Beulich wrote:
> >
> > > I don't understand: Isn't the null scheduler not moving around
> > > vCPU-s at all? At least that's what the comment at the top of the
> > > file says, unless I'm mis-interpreting it. If so, how can "some CPU
> > > (...) pick this vCPU"?
> >
> > There's no current way to prevent a user from adding more vcpus to a
> > pool than there are pcpus (if nothing else, by creating a new VM in a
> > given pool), or from taking pcpus from a pool in which #vcpus >=
> > #pcpus.
> >
> Exactly. And something that checks for that is all but easy to
> introduce (let's just avoid even mentioning enforcing!).
>
> > The null scheduler deals with this by having a queue of "unassigned"
> > vcpus that are waiting for a free pcpu. When a pcpu becomes
> > available, it will do the assignment. When a pcpu that has a vcpu
> > is assigned is removed from the pool, that vcpu is assigned to a
> > different pcpu if one is available; if not, it is put on the list.
> >
> Err... yes. BTW, either there are a couple of typos in the above
> paragraph, or it's me that can't read it well. Anyway, just to be
> clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might be
> the situation:
>
> CPU0 <-- d1v0
> CPU1 <-- d2v0
> CPU2 <-- d3v0
> CPU3 <-- d4v0
>
> Waitqueue: d5v0,d6v0
>
> Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
> the waitqueue and assigned to CPU1.
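The one-vCPU-per-pCPU assignment plus FIFO waitqueue described in the quoted example can be modelled in a few lines. This is a minimal sketch for illustration only, not Xen's actual null scheduler code: `struct toy_vcpu`, `struct toy_pool`, `wq_push()`, `wq_pop_head()` and `toy_pcpu_freed()` are hypothetical names, and the pool size is fixed at the 4 pCPUs of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT Xen's real code: every pCPU has at most
 * one vCPU assigned, and vCPUs that do not fit wait in a FIFO queue. */
#define NR_PCPUS 4
#define WQ_MAX   8

struct toy_vcpu {
    int domid;
    int vcpuid;
};

struct toy_pool {
    struct toy_vcpu *cpu[NR_PCPUS];   /* NULL means the pCPU is idle */
    struct toy_vcpu *wq[WQ_MAX];      /* waitqueue, oldest entry first */
    size_t wq_len;
};

/* Append a vCPU that could not get a pCPU of its own. */
static void wq_push(struct toy_pool *p, struct toy_vcpu *v)
{
    assert(p->wq_len < WQ_MAX);
    p->wq[p->wq_len++] = v;
}

/* Remove and return the head of the waitqueue, or NULL if it is empty. */
static struct toy_vcpu *wq_pop_head(struct toy_pool *p)
{
    if (p->wq_len == 0)
        return NULL;
    struct toy_vcpu *v = p->wq[0];
    for (size_t i = 1; i < p->wq_len; i++)
        p->wq[i - 1] = p->wq[i];
    p->wq_len--;
    return v;
}

/* pCPU `cpu` lost its vCPU (e.g. the domain died): hand it to the vCPU
 * that has waited longest, or leave it idle if nobody is waiting. */
static void toy_pcpu_freed(struct toy_pool *p, unsigned int cpu)
{
    assert(cpu < NR_PCPUS);
    p->cpu[cpu] = wq_pop_head(p);
}
```

With CPU0..CPU3 holding d1v0..d4v0 and d5v0, d6v0 queued, calling `toy_pcpu_freed(&pool, 1)` after d2 goes away leaves d5v0 on CPU1 and only d6v0 on the waitqueue, matching the example. Note that this plain FIFO pick never looks at whether the queued vCPU is actually online.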
I think the above example is not representative of what happens inside
the shim: since there's only one domain running on the shim, the
picture is something like:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

Then, if the guest brings up another vCPU (let's assume it's vCPU#3),
pCPU#3 will be brought up from the shim's PoV, and the null scheduler
will assign the first vCPU on the waitqueue:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU3 <-- d1v2 (down)
NULL <-- d1v3 (up)

Hence d1v2, which is still down, gets assigned to CPU#3, while d1v3,
which is up, doesn't get assigned to any pCPU and hence won't run.

> > In the case of shim mode, this also seems to happen whenever curvcpus
> > < maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which
> > to schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule,
> > of which (maxvcpus-curvcpus) are marked 'down'.
> >
> Mmm, wait. In case of a domain which specifies both maxvcpus and
> curvcpus, how many vCPUs does the domain in which the shim run?

Regardless of the values of maxvcpus and curvcpus, PV guests are always
started with only the BSP online, and the guest itself then brings up
the other vCPUs. In the shim case vCPU hotplug is tied to pCPU hotplug,
so every time the guest hotplugs or unplugs a vCPU, the shim does
exactly the same with its CPUs.

> > In this case, it also
> > seems that the null scheduler sometimes schedules a "down" vcpu when
> > there are "up" vcpus on the list; meaning that the "up" vcpus are
> > never scheduled.
> >
> I'm not sure how an offline vCPU can end up there... but maybe I need
> to look at the code better, with the shim use case in mind.
>
> Anyway, I'm fine with checks that prevent offline vCPUs to be assigned
> to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs (so, the
> CPUs of L1 Xen). I'm less fine with rescheduling everyone at every
> wakeup.
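The difference between taking the head of the waitqueue and skipping vCPUs that are down, in the spirit of the patch subject, can be sketched as below. This is a hypothetical toy model, not the patch itself nor Xen's `sched_null.c`: `struct toy_waitqueue`, `wq_take()` and `toy_pick()` are invented names, and "down" is reduced to a single `up` flag per vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model (not Xen code): a FIFO waitqueue of vCPUs,
 * each either online ("up") or offline ("down"). */
#define WQ_MAX 8

struct toy_vcpu {
    int vcpuid;
    bool up;               /* false while the guest keeps the vCPU down */
};

struct toy_waitqueue {
    struct toy_vcpu *v[WQ_MAX];
    size_t len;
};

/* Remove entry i, keeping the remaining entries in FIFO order. */
static struct toy_vcpu *wq_take(struct toy_waitqueue *wq, size_t i)
{
    assert(i < wq->len);
    struct toy_vcpu *v = wq->v[i];
    for (size_t j = i + 1; j < wq->len; j++)
        wq->v[j - 1] = wq->v[j];
    wq->len--;
    return v;
}

/* Pick a vCPU for a newly available pCPU.
 * skip_down == false: plain FIFO, the head is taken even if it is down
 *                     (the behaviour described in the scenario above).
 * skip_down == true:  down vCPUs stay queued and the oldest up vCPU is
 *                     taken; returns NULL if no queued vCPU is up. */
static struct toy_vcpu *toy_pick(struct toy_waitqueue *wq, bool skip_down)
{
    for (size_t i = 0; i < wq->len; i++)
        if (!skip_down || wq->v[i]->up)
            return wq_take(wq, i);
    return NULL;
}
```

With the waitqueue `d1v2 (down), d1v3 (up)`, `toy_pick(&wq, false)` returns d1v2, reproducing the `CPU3 <-- d1v2 (down)` picture, whereas `toy_pick(&wq, true)` returns d1v3 and leaves d1v2 queued. Only the vCPU being picked is inspected, so this kind of check does not involve rescheduling anyone else.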
So, using the scenario from before:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

The guest decides to hotplug vCPU#2, so the shim first hotplugs CPU#2;
but at the point CPU2 is added to the pool of CPUs, vCPU2 is still not
up, hence we get the following:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU2 <-- NULL

waitqueue: d1v2 (down), d1v3 (down)

Then d1v2 is brought up, but since the null scheduler doesn't react to
wakeups, the picture stays the same:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU2 <-- NULL

waitqueue: d1v2 (up), d1v3 (down)

And d1v2 doesn't get scheduled.

Hope this makes sense :)

Thanks,
Roger.
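One way to avoid the stuck `CPU2 <-- NULL` state is a wake-up hook that looks only at the vCPU being woken: if it is sitting on the waitqueue and some pCPU in the pool is idle, assign it there. The sketch below is a hypothetical toy model under that assumption, not Xen code and not necessarily what the patch does; `struct toy_pool`, `wq_remove()` and `toy_vcpu_wake()` are invented names. It deliberately leaves every other vCPU alone, in line with the concern about rescheduling everyone at every wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not Xen code: each pCPU holds at most one
 * vCPU and the remaining vCPUs wait in a FIFO waitqueue. */
#define NR_PCPUS 4
#define WQ_MAX   8

struct toy_vcpu {
    int vcpuid;
    bool up;
};

struct toy_pool {
    bool present[NR_PCPUS];         /* pCPU currently part of the pool */
    struct toy_vcpu *cpu[NR_PCPUS]; /* NULL: present but idle */
    struct toy_vcpu *wq[WQ_MAX];
    size_t wq_len;
};

/* Remove v from the waitqueue if present; returns true if removed. */
static bool wq_remove(struct toy_pool *p, const struct toy_vcpu *v)
{
    for (size_t i = 0; i < p->wq_len; i++) {
        if (p->wq[i] != v)
            continue;
        for (size_t j = i + 1; j < p->wq_len; j++)
            p->wq[j - 1] = p->wq[j];
        p->wq_len--;
        return true;
    }
    return false;
}

/* Wake-up hook: mark v up and, if it is queued, move it to the first
 * idle pCPU in the pool. Returns that pCPU, or -1 if v stays where it
 * is (no idle pCPU, or v was not queued, which in this model means it
 * already owns a pCPU). */
static int toy_vcpu_wake(struct toy_pool *p, struct toy_vcpu *v)
{
    v->up = true;
    for (int c = 0; c < NR_PCPUS; c++) {
        if (!p->present[c] || p->cpu[c] != NULL)
            continue;
        if (!wq_remove(p, v))
            return -1;
        p->cpu[c] = v;
        return c;
    }
    return -1;
}
```

In the scenario above (CPU0 and CPU1 busy, CPU2 present but idle, CPU3 not yet in the pool), waking d1v2 with this hook assigns it to CPU2 and leaves only d1v3 on the waitqueue.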

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.