[Xen-devel] credit scheduler error rates as reported by HP and UCSD

To:	Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, Emmanuel Ackaouy <ackaouy@xxxxxxxxx>, lucy.cherkasova@xxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject:	[Xen-devel] credit scheduler error rates as reported by HP and UCSD
From:	"Mike D. Day" <ncmike@xxxxxxxxxx>
Date:	Thu, 12 Apr 2007 11:16:38 -0400
Delivery-date:	Thu, 12 Apr 2007 08:15:28 -0700
Organization:	IBM Linux Technology Center
Reply-to:	ncmike@xxxxxxxxxx
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	Mutt/1.5.13 (2006-08-11)

I've been looking at the credit scheduler in light of the paper "Resource Allocation Challenges in Virtual Machine Based IT Environments."

http://www.hpl.hp.com/techreports/2007/HPL-2007-25.pdf

I've got an observation and three questions.

My first observation is that the credit scheduler will select a vcpu
that has exceeded its credit when there is no other work to be done on
any of the other physical cpus in the system.

You can verify this by looking at the last couple of lines of the
function csched_load_balance in xen/common/sched_credit.c:

   /* Failed to find more important work elsewhere... */
   __runq_remove(snext);
   return snext;

where snext is the vcpu that is over its credit for the current time
slice.

So now a question: Is this the expected or desired behaviour of the
credit scheduler? I would assume so. Why idle a vcpu when there is no
contention for resources and that vcpu has work to do?

In light of the paper, with very low allocation targets for vcpus, it
is not surprising that the positive allocation errors can be quite
large. It is also not surprising that the errors (and error
distribution) decrease with larger allocation targets.

None of this explains the negative allocation errors, where vcpus
received less than their pcpu allotments. I speculate that a couple of
circumstances may contribute to negative allocation errors:

Very low weights: very low weights attached to domains will cause the
credit scheduler to attempt to pause vcpus almost every accounting
cycle. The vcpus may therefore not get as many opportunities to run as
they otherwise would. If ALERT's measurement method differs from, or
uses a different interval than, the credit scheduler's 10ms tick and
30ms accounting cycle, negative errors may appear from ALERT's point of
view.

I/O activity: if ALERT performs I/O, the test, even though it is
"cpu intensive", may cause the domU to block on dom0 frequently,
meaning it will idle more, especially if dom0 has a low credit
allocation.

Questions: How does ALERT measure actual cpu allocation? Using XenMon?
How does ALERT exercise the domain? The paper didn't mention the
actual system calls and hypercalls the domains make when running
ALERT.

thanks,

Mike

--
Mike D. Day
Virtualization Architect and Sr. Technical Staff Member, IBM LTC
Cell: 919 412-3900
ST: mdday@xxxxxxxxxx | AIM: ncmikeday | Yahoo IM: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
