
Re: [Xen-devel] issues with PLE and/or scheduler.



diff -r 381ab77db71a xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c    Mon Apr 18 10:10:02 2011 +0100
+++ b/xen/arch/x86/hvm/vpt.c    Thu Dec 22 05:54:54 2011 +0800
@@ -129,7 +129,7 @@
     if ( missed_ticks <= 0 )
         return;

-    missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
+    missed_ticks = missed_ticks / (s_time_t) pt->period;
     if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
         pt->do_not_freeze = !pt->pending_intr_nr;
     else

Can anyone explain the above "plus one" logic? Why does 
pt_process_missed_ticks assume that at least one tick has been missed?

In the guest kernel, the IO-APIC check_timer logic decides how to set up 
IRQ0, and it uses mdelay to wait for 10 ticks in total. If the kernel receives 
4+ ticks during the delay, it deems IRQ0 to be routed correctly through the 
IO-APIC.
Unfortunately, mdelay is implemented as a tight pause loop, so when PLE is 
enabled the loop triggers PLE vmexits. In the PLE vmexit handler the 
scheduler yields the CPU, but the yield operation triggers the guest's 
time save/restore logic, and eventually pt_process_missed_ticks gets called. 
Once pt_process_missed_ticks is called, pt->scheduled is advanced by one 
pt->period due to the above "plus one" logic.
By default ple_window is 4096, so every 4096 cycles of the guest's mdelay 
trigger one PLE vmexit, and each vmexit delays the vpt timer by one 
pt->period, so the vpt timer may never fire during the guest's delay. 
This is why jiffies does not increase during the 10-tick mdelay.
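
To make the arithmetic concrete, here is a minimal standalone model of the 
calculation, not the actual Xen code. The struct pt_model type, the 
process_missed_ticks name and the plus_one switch are hypothetical, and the 
"now - scheduled" step is an assumption about how missed_ticks is derived 
before the hunk above. It only shows how far the deadline moves when the 
function runs shortly after the deadline has passed.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;

/* Hypothetical stand-in for the two periodic_time fields used here. */
struct pt_model {
    s_time_t scheduled;   /* next tick deadline, in ns */
    s_time_t period;      /* tick period, in ns */
};

/*
 * Model of the missed-tick calculation. plus_one != 0 selects the
 * original formula, plus_one == 0 the patched one. Returns the number
 * of periods by which the deadline was pushed out.
 */
static s_time_t process_missed_ticks(struct pt_model *pt, s_time_t now,
                                     int plus_one)
{
    s_time_t missed;

    assert(pt->period > 0);

    /* Assumption: the overrun is measured from the current deadline. */
    missed = now - pt->scheduled;
    if (missed <= 0)
        return 0;

    missed = missed / pt->period + (plus_one ? 1 : 0);
    pt->scheduled += missed * pt->period;
    return missed;
}
```

With a 1 ms period and a call arriving 500 ns after the deadline, the 
original formula pushes the deadline out by a full period, while the patched 
one leaves it where it is. Whether leaving it untouched is the right fix is 
exactly the question above.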

Thanks!
Xiantao 





-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Shan, Haitao
Sent: Wednesday, December 21, 2011 9:28 AM
To: Konrad Rzeszutek Wilk; xen-devel@xxxxxxxxxxxxxxxxxxx; 
konrad.wilk@xxxxxxxxxx; George.Dunlap@xxxxxxxxxxxxx; keir@xxxxxxx; 
andrew.thomas@xxxxxxxxxx
Subject: Re: [Xen-devel] issues with PLE and/or scheduler.

We have reproduced your problem locally and are looking into this issue. It 
seems "PLE with timer mode 2" will trigger the issue. We can post our findings 
as soon as possible.

Shan Haitao

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel- 
> bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Konrad Rzeszutek Wilk
> Sent: Wednesday, December 21, 2011 4:42 AM
> To: xen-devel@xxxxxxxxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx; 
> George.Dunlap@xxxxxxxxxxxxx; keir@xxxxxxx; andrew.thomas@xxxxxxxxxx
> Subject: Re: [Xen-devel] issues with PLE and/or scheduler.
> 
> On Tue, Dec 20, 2011 at 04:41:07PM -0400, Konrad Rzeszutek Wilk wrote:
> > Hey folks,
> >
> > I am sending this on behalf of Andrew since our internal email 
> > system is dropping all xen-devel mailing lists :-(
> 
> <hits his head> And I forgot to CC andrew on it. Added here.
> >
> > Anyhow:
> >
> > This is with xen-4.1-testing cs 23201:1c89f7d29fbb and using the 
> > default "credit" scheduler.
> >
> > I've run into an interesting issue with HVM guests which make use of 
> > Pause Loop Exiting (ie. on westmere systems; and also on romley
> > systems):  after yielding the cpu, guests don't seem to receive 
> > timer interrupts correctly..
> >
> > Some background: for historical reasons (ie old templates) we boot 
> > OL/RHEL guests with the following settings:
> >
> > kernel parameters: clock=pit nohpet nopmtimer
> > vm.cfg: timer_mode = 2
> >
> > With PLE enabled, 2.6.32 guests will crash early on with:
> >  ..MP-BIOS bug: 8254 timer not connected to IO-APIC  # a few lines 
> > omitted..
> >  Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot 
> > with apic=debug
> >
> > While 2.6.18-238 (ie OL/RHEL5u6) will fail to find the timer, but 
> > continue and lock up in the serial line initialization.
> >
> >  ..MP-BIOS bug: 8254 timer not connected to IO-APIC  # continues 
> > until lock up here:
> >  Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing 
> > enabled
> >
> > Instrumenting the 2.6.32 code (ie timer_irq_works()) shows that 
> > jiffies isn't advancing (or only 1 or 2 ticks are being received, 
> > which is insufficient for "working"). This is on a "quiet" system 
> > with no other activity.
> > So, even though the guest has voluntarily yielded the cpu (through 
> > PLE), I would still expect it to receive every clock tick (even with
> > timer_mode=2) as there is no other work to do on the system.
> >
> > Disabling PLE allows both 2.6.18 and 2.6.32 guests to boot.. [As an 
> > aside, so does setting ple_gap to 41 (ie prior to 
> > 21355:727ccaaa6cce)
> > -- the perf counters show no exits happening, so this is equivalent 
> > to disabling PLE.]
> >
> > I'm hoping someone who knows the scheduler well will be able to 
> > quickly decide whether this is a bug or a feature...
> >
> > Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel



 

