   
 
 
 
   
 

xen-users

[Xen-users] Re: Xen Scheduler: Credit Scheduler ?

It's a bit frustrating that we're not making progress isolating
the problem here.

We still don't have any concrete evidence showing whether the
benchmark's user processes or the domain's VCPUs are runnable
when you notice the "stall".

It's also not clear what the simplest scenario is under which
the problem can be reproduced. I tried reading your explanation
about what happens with UP guests but I can't understand it.
Can you clarify exactly what you are doing and in what order?

I threw around some ideas to get more data points and help
debug this:
- run spinners on the guest that consume CPU
    "int main() { while (1); return 0; }" and run X copies (for X vcpus)
- take scheduler traces (man xentrace)

You could also monitor the benchmark from inside the guest using
a variety of means to check if its processes are blocked for any
reason and why.

You need to isolate the problem further. We just don't have
anything to go on right now to say whether this is a Xen problem
at all, much less whether it is a scheduler issue or something else.


Emmanuel.


On Wed, Nov 22, 2006 at 01:29:04PM -0500, Ott, Donna E wrote:
>  
> > You need to find out if the VCPUs are blocked in the kernel 
> > or runnable but not being scheduled.
> > 
> > The easiest way to do this is to run 2 spinner processes in 
> > the guest after it "stalls".
> 
> Well, I did find that if I wait a bit and then hit Ctrl-C and/or type
> into the "stalled" domain, it will start up again, but it will never
> get much CPU time relative to what it had. It will then complete the
> benchmark, but with errors, obviously.
> > 
> > That will tell you if it's the application that has stalled 
> > or if it's the guest OS that's runnable but not getting any CPU time.
> 
> Could you explain the details?
> 
> > 
> > Running 3 competing 2vcpu guests on a 2cpu host may cause 
> > some interesting problems because while the OS is written to 
> > assume that its physical CPUs all exist at the same time, the 
> > same is not necessarily true in a virtual environment.
> > Your guest OS or benchmark could be timing out due to
> > timeouts on spinlocks or something like that.
> I have now run them as 1-CPU guests as well. Once again, I think it is
> unlikely that my benchmark is timing out, etc. It is well known and
> well used, even by me, and has NEVER behaved this way on other OSes or
> virtualization software. (That said, anything is possible in
> software/hardware land!)
> 
> > 
> > The way to make progress on this is:
> > 1- verify that if your vcpus are runnable they run: do this
> >    by running spinners on top of your benchmark or once the
> >    benchmark stalls.
> Not sure what you mean by this. Once the benchmark stalls, it is still
> there, and typing into the domain will make it start to run again,
> sort of right where it had "paused".
> 
> 
> > 2- verify that the problem goes away with single CPU guests.
> It does NOT go away with single-CPU guests. Shockingly, it can even
> occur with a single guest and a large load: say, "xm create newguest"
> will stall out the "single guest".
> 
> It is particularly easy to see on the first run with three guests, or
> even two. Just create them, set up the benchmark, run it in each guest
> (by hand), and in moments a "stall" will occur. After the first run,
> it is harder to get it to happen, but the first time is fairly
> repeatable.
> 
> It does seem to be less prevalent with single guests, but it can
> STILL happen.
> 
> 
> > 3- collect scheduler traces on all CPUs.
> Ok, please explain how to do this. I am running out of time to debug
> this.
> I may soon have to leave this as it is and just go with the results I
> have (sadly as I am so impressed with it when it runs well.)
> 
> 
> > 
> > In general, the best way to deal with SMP guests which have 
> > less CPU resources than their number of VCPUs is to "fold"
> > the guest down using the CPU hotplug mechanism. There are 
> > other alternatives as well that we can look at. Before we do 
> > so, let's try to reduce this problem a bit so we can verify 
> > if this is or isn't a virtual SMP issue.
> 
> Sounds great to me- hope this latest data is helpful. I wish I had more
> time!
> Cheers
> Donna "thankful for what I found that worked well" Ott

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users