xen-devel

Re: [Xen-devel] planned csched improvements?

To: Jan Beulich <JBeulich@xxxxxxxxxx>
Subject: Re: [Xen-devel] planned csched improvements?
From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date: Fri, 9 Oct 2009 16:59:25 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4ACF6A8F02000078000190E2@xxxxxxxxxxxxxxxxxx>
References: <4ACF6A8F02000078000190E2@xxxxxxxxxxxxxxxxxx>
On Fri, Oct 9, 2009 at 3:53 PM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
> After the original announcement of plans to do some work on csched,
> there wasn't much activity, so I'd like to ask about some observations
> I made with the current implementation, and whether the planned
> changes would be expected to take care of them.

There has been activity, but nothing worth sharing yet. :-)  I'm
working on the new "fairness" algorithm (perhaps called credits,
perhaps not), which is a prerequisite for any further work such as
load-balancing, power consumption, and so on.  Unfortunately, for the
last several months I haven't been able to work on it for more than a
week at a time before being interrupted by other work-related
tasks. :-(

Re the items you bring up below: I believe that my planned changes to
load-balancing should address the first.  First, I plan on making all
cores which share an L2 cache share a runqueue.  This will
automatically spread work among those cores without needing any
special load-balancing to be done.  Then, I plan on actually calculating:
* The per-runqueue load over the last time period
* The amount each vcpu is contributing to that load.
Then load balancing won't be a matter of looking at the instantaneous
runqueue lengths (as it is currently), but at the actual amount of
"busyness" the runqueue has over a period of time.  Load balancing
will be just that: actually moving vcpus around to make the loads more
balanced.  Balancing operations will happen at fixed intervals, rather
than "whenever a runqueue is idle".

But those are just plans for now; not a line of code has been written,
and schedulers especially are notorious for the Law of Unintended
Consequences.

Re soft-lockups: That really shouldn't be possible with the current
scheduler; if it happens, it's a bug.  Have you pulled from
xen-unstable recently?  There was a bug introduced a few weeks ago
that would cause problems; Keir checked in a fix for that one last
week.  Otherwise, if you're sure it's not a long hypercall issue,
there must be a bug somewhere.
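
For reference, the guest-side mechanism being tripped is roughly the
following (a much-simplified sketch, not actual Linux code; now_ns()
and the threshold value are stand-ins):

/* Much-simplified sketch of Linux's softlockup watchdog (not the real
 * kernel code): a kernel thread on each cpu refreshes a timestamp, and
 * the timer interrupt complains if it hasn't moved for several seconds.
 * A vcpu starved of pcpu time for that long trips this even though the
 * guest kernel itself is perfectly healthy. */

#define NSEC_PER_SEC          1000000000ULL
#define SOFTLOCKUP_THRESH_NS  (10 * NSEC_PER_SEC)   /* illustrative */

extern unsigned long long now_ns(void);     /* stand-in clock source */
extern int printk(const char *fmt, ...);

static unsigned long long watchdog_touch_ts;  /* per-cpu in reality */

/* Runs as a kernel thread; its mere progress proves the cpu can
 * still schedule. */
static void watchdog_thread_tick(void)
{
    watchdog_touch_ts = now_ns();
}

/* Runs from the timer interrupt. */
static void softlockup_tick(void)
{
    unsigned long long stalled = now_ns() - watchdog_touch_ts;

    if (stalled > SOFTLOCKUP_THRESH_NS)
        printk("BUG: soft lockup - CPU stuck for %llus!\n",
               stalled / NSEC_PER_SEC);
}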

The new scheduler will be an almost complete rewrite, so it will
probably erase this bug and introduce its own.  However, I doubt it
will be ready by 3.5, so this one is probably worth tracking down and
fixing if we can.

Hope that answers your question. :-)

 -George


> On a lightly loaded many-core, non-hyperthreaded system (e.g. a single
> CPU-bound process in one VM, and only some background load elsewhere),
> I see this CPU-bound vCPU constantly switching between sockets, as a
> result of csched_cpu_pick() eagerly moving vCPU-s to "more idle"
> sockets.  It would seem that some minimal latency consideration might
> usefully be added here, so that a very brief interruption by another
> vCPU doesn't result in an unnecessary migration.
>
> As a consequence of that eager moving, in the vast majority of cases
> the vCPU in question then (within a very short period of time) either
> triggers a cascade of other vCPU migrations, or begins a series of
> ping-pongs between (usually two) pCPU-s, until things settle again for
> a while.  Again, some minimal latency added here might help avoid
> that.
>
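
(The kind of minimal-latency check Jan describes here might look
something like the sketch below; the wrapper, the field use, and the
threshold value are all invented for illustration, and nothing like
this is in the tree:)

/* Illustration only: resist migrating a vcpu that was running on its
 * current pcpu just a moment ago, so that a brief interruption by
 * another vcpu doesn't push it off-socket. */
#define MIGRATE_RESIST  MICROSECS(500)   /* hypothetical stickiness window */

static int csched_cpu_pick_sticky(struct vcpu *vc)
{
    int new_cpu = csched_cpu_pick(vc);   /* current "most idle" choice */

    /* vc->runstate.state_entry_time is when the vcpu last changed
     * runstate; if that was only microseconds ago, stay put. */
    if ( new_cpu != vc->processor &&
         NOW() - vc->runstate.state_entry_time < MIGRATE_RESIST )
        return vc->processor;

    return new_cpu;
}
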
> Finally, in the completely inverse scenario of severely overcommitted
> systems (more than two fully loaded vCPU-s per pCPU), I frequently
> see Linux's softlockup watchdog kick in, now and then even resulting
> in the VM hanging.  I had always thought that starvation of a vCPU
> for several seconds shouldn't be an issue that early; am I wrong
> here?
>
> Jan
