Xen project Mailing List

Re: [Xen-devel] planned csched improvements?

From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>

Date: Fri, 9 Oct 2009 16:59:25 +0100

Delivery-date: Fri, 09 Oct 2009 08:59:52 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On Fri, Oct 9, 2009 at 3:53 PM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
> After the original announcement of plans to do some work on csched there
> wasn't much activity, so I'd like to ask about some observations that I made
> with the current implementation and whether it would be expected that
> those planned changes would take care of them.

There has been activity, but nothing worth sharing yet. :-)  I'm
working on the new "fairness" algorithm (perhaps called credits,
perhaps not), which is a prerequisite for any further work such as
load-balancing, power consumption, and so on.  Unfortunately, for the
last several months I haven't been able to work on it for more than a
week at a time before being interrupted by other work-related tasks.
:-(

Re the items you bring up below:

I believe that my planned changes to load-balancing should address the
first.  First, I plan on making all cores which share an L2 cache
share a runqueue.  This will automatically share work among those
cores without needing any special load-balancing to be done.  Then, I
plan on actually calculating:
* The per-runqueue load over the last time period
* The amount each vcpu is contributing to that load.

Then load balancing won't be a matter of looking at the instantaneous
runqueue lengths (as it is currently), but of looking at the actual
amount of "busyness" the runqueue has had over a period of time.  Load
balancing will be just that: actually moving vcpus around to make the
loads more balanced.  Balancing operations will happen at fixed
intervals, rather than "whenever a runqueue is idle".

But those are just plans now; not a line of code has been written, and
schedulers especially are notorious for the Law of Unexpected
Consequences.

Re soft-lockups: That really shouldn't be possible with the current
scheduler; if it happens, it's a bug.  Have you pulled from
xen-unstable recently?  There was a bug introduced a few weeks ago
that would cause problems; Keir checked in a fix for that one last
week.  Otherwise, if you're sure it's not a long hypercall issue,
there must be a bug somewhere.

The new scheduler will be an almost complete re-write, so it will
probably erase this bug and introduce its own bugs.  However, I doubt
it will be ready by 3.5, so it's probably worth tracking down and
fixing if we can.

Hope that answers your question. :-)

 -George

> On a lightly loaded many-core non-hyperthreaded system (e.g. a single
> CPU bound process in one VM, and only some background load elsewhere),
> I see this CPU bound vCPU permanently switch between sockets, which is
> a result of csched_cpu_pick() eagerly moving vCPU-s to "more idle"
> sockets. It would seem that some minimal latency consideration might be
> useful to get added here, so that a very brief interruption by another
> vCPU doesn't result in unnecessary migration.
>
> As a consequence of that eager moving, in the vast majority of cases
> the vCPU in question then (within a very short period of time) either
> triggers a cascade of other vCPU migrations, or begins a series of
> ping-pongs between (usually two) pCPU-s - until things settle again for
> a while. Again, some minimal latency added here might help avoiding
> that.
>
> Finally, in the complete inverse scenario of severely overcommitted
> systems (more than two fully loaded vCPU-s per pCPU) I frequently
> see Linux' softlockup watchdog kick in, now and then even resulting
> in the VM hanging. I had always thought that starvation of a vCPU
> for several seconds shouldn't be an issue that early - am I wrong
> here?
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
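
The load-balancing plan in the reply above (per-runqueue load over a
period, per-vcpu contribution to that load, balancing at fixed
intervals) can be illustrated with a small sketch.  This is not Xen
code and not the planned implementation, of which the message says
nothing has been written.  All names here (struct runqueue, rq_load,
pick_vcpu_to_move, balance_once, MAX_VCPUS) are hypothetical, and
measuring load as microseconds of run time in the last period is an
assumption made only for illustration.  The shared per-L2 runqueue is
not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 8              /* hypothetical limit for this sketch */

struct vcpu_load {
    int id;
    unsigned run_us;             /* assumed metric: run time in the last period */
};

struct runqueue {
    struct vcpu_load vcpus[MAX_VCPUS];
    size_t nr;                   /* number of vcpus currently on this runqueue */
};

/* Per-runqueue load over the last period: sum of its vcpus' contributions. */
static unsigned rq_load(const struct runqueue *rq)
{
    unsigned sum = 0;
    for (size_t i = 0; i < rq->nr; i++)
        sum += rq->vcpus[i].run_us;
    return sum;
}

/*
 * Choose the vcpu on 'busy' whose migration to 'idle' leaves the smallest
 * load gap.  Moving contribution c changes the gap from g to |g - 2c|.
 * Returns an index into busy->vcpus, or -1 if no move narrows the gap.
 */
static int pick_vcpu_to_move(const struct runqueue *busy,
                             const struct runqueue *idle)
{
    unsigned lb = rq_load(busy), li = rq_load(idle);
    if (lb <= li)
        return -1;

    unsigned gap = lb - li, best_gap = gap;
    int best = -1;
    for (size_t i = 0; i < busy->nr; i++) {
        unsigned c2 = 2 * busy->vcpus[i].run_us;
        unsigned new_gap = c2 > gap ? c2 - gap : gap - c2;
        if (new_gap < best_gap) {
            best_gap = new_gap;
            best = (int)i;
        }
    }
    return best;
}

/*
 * One balancing pass, meant to be called at a fixed interval rather than
 * whenever a runqueue goes idle.  Returns 1 if a vcpu was moved, else 0.
 */
static int balance_once(struct runqueue *a, struct runqueue *b)
{
    assert(a != NULL && b != NULL && a != b);

    struct runqueue *busy = rq_load(a) >= rq_load(b) ? a : b;
    struct runqueue *idle = busy == a ? b : a;

    int i = pick_vcpu_to_move(busy, idle);
    if (i < 0 || idle->nr >= MAX_VCPUS)
        return 0;

    idle->vcpus[idle->nr++] = busy->vcpus[i];
    busy->vcpus[i] = busy->vcpus[--busy->nr];   /* fill the hole with the last entry */
    return 1;
}
```

Because a move is made only when it strictly narrows the measured gap,
a vcpu that is interrupted briefly does not by itself trigger a
migration, which bears on the ping-pong behaviour described in the
quoted report.  Whether the real scheduler would behave this way is
not something the thread establishes.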
