Re: [Xen-devel] Calculating real cpu usage of Xen domains correctly!

(Augh, sourceforge mail is driving me nuts; it lost an email exchange 
between Anthony and myself earlier & seems to have lost the included 
message below; who knows what other valuable messages were lost today? 
XenSource: get the new mail reflector going!)

Anthony, here are some ideas:

- Change the default scheduler in schedule.c to rrobin and/or atropos, to 
see if they exhibit the same problem.

- Create a new global variable "cputime_total" and keep track of the sum 
of all time intervals to make sure the value is sane:

    prev->cpu_time += now - prev->lastschd;
    cputime_total  += now - prev->lastschd;

- Lie about the value of "now - prev->lastschd": make it a constant 1ms 
per invocation of __enter_scheduler(), and use that to count the number of 
times each domain gets scheduled.

- Make "lastschd" a global variable, to test if the "prev" structure is 
getting overwritten somehow.

- Make the return type of get_s_time() and its child calls [in time.c] 
volatile, to make sure the return value of NOW() isn't getting 
unnecessarily cached.

- In lieu of printk()s on each scheduler entry, you could allocate a few 
pages of memory, use a signal to fill them up with timestamps & the 
results of the ops.do_schedule(now) call during your experiment, then 
printk() the pages out postmortem.
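
As a rough illustration of the "cputime_total" check and the in-memory 
trace idea above, here is a user-space sketch.  It is not Xen code: the 
names domain_sketch, trace_entry, account_switch(), accounting_is_sane() 
and trace_dump() are made up for this example, the int64_t typedef is only 
a stand-in for the hypervisor's time type, and a static array plus printf() 
stand in for allocated pages and printk().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: a signed 64-bit nanosecond count stands in for Xen's time type. */
typedef int64_t s_time_t;

/* Hypothetical, minimal stand-in for the per-domain scheduling fields. */
struct domain_sketch {
    int      id;
    s_time_t cpu_time;   /* accumulated run time */
    s_time_t lastschd;   /* time this domain was last switched in */
};

/* One record per scheduler entry, kept in memory instead of printing. */
struct trace_entry {
    s_time_t now;
    int      prev_id;
    int      next_id;
};

#define TRACE_CAPACITY 1024

static struct trace_entry trace_buf[TRACE_CAPACITY];
static unsigned int trace_count;
static s_time_t cputime_total;   /* sum of every interval ever charged */

/* Charge the outgoing domain for its interval and log the switch. */
static void account_switch(struct domain_sketch *prev,
                           struct domain_sketch *next, s_time_t now)
{
    s_time_t delta = now - prev->lastschd;

    assert(delta >= 0);          /* a negative interval means a bad timestamp */
    prev->cpu_time += delta;
    cputime_total  += delta;
    next->lastschd  = now;

    /* Stop recording when the buffer is full rather than overwrite. */
    if (trace_count < TRACE_CAPACITY) {
        trace_buf[trace_count].now     = now;
        trace_buf[trace_count].prev_id = prev->id;
        trace_buf[trace_count].next_id = next->id;
        trace_count++;
    }
}

/* Returns 1 if the per-domain totals add up to the global total. */
static int accounting_is_sane(const struct domain_sketch *doms, int n)
{
    s_time_t sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += doms[i].cpu_time;
    return sum == cputime_total;
}

/* Postmortem dump of everything recorded during the experiment. */
static void trace_dump(void)
{
    unsigned int i;

    for (i = 0; i < trace_count; i++)
        printf("%lld: dom%d -> dom%d\n", (long long)trace_buf[i].now,
               trace_buf[i].prev_id, trace_buf[i].next_id);
}
```

Call account_switch() once per simulated scheduler entry, then 
accounting_is_sane() and trace_dump() at the end.  If the per-domain sums 
ever disagree with cputime_total, or the assert fires, that would point at 
the timestamps or the "prev" bookkeeping rather than at the scheduler's 
choice of domain.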


Anthony Liguori wrote:

> John L Griffin wrote:
> >However, I'm concerned that we're missing something bigger.  This is my 
> >understanding of what the BLOCKED flag (and the surrounding code) 
> > 
> >
> You may be correct here.  The thing that leads me to believe that is the 
> following.  When I first start up domain-0, with no domain-U's running, 
> the numbers for domain-0 seem right.  Domain-0's usage jumps up to 100% 
> after I create a domain-U.  Once I've reached this point, it pretty much 
> stays that way.
> It makes me think something's triggering this behavior.
> >Which makes me wonder if something is seriously misbehaving to cause 
> >weird CPU usage totals you're seeing -- like a yield()ed or block()ed 
> > 
> >
> Do you have any ideas (or anyone else for that matter) on how to 
> approach this?  I'm afraid the impact of putting printk's in there would 
> be too great.  How does one typically debug scheduler issues?
> I'm willing to spend some cycles looking into this.
> Regards,
> Anthony Liguori

