
Re: [Xen-devel] issues with PLE and/or scheduler.

> > @@ -129,7 +129,7 @@
> >      if ( missed_ticks <= 0 )
> >          return;
> >
> > -    missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
> > +    missed_ticks = missed_ticks / (s_time_t) pt->period;
> >      if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
> >          pt->do_not_freeze = !pt->pending_intr_nr;
> >      else
> >
> > Can anyone explain the above "plus one" logic? Why assume at least one
> > tick is missed in pt_process_missed_ticks?
> 
> missed_ticks = now - pt->scheduled
> 
> pt->scheduled was the deadline for the next tick. Hence the number of
> missed ticks is the total number of periods that have passed since
> pt->scheduled, plus one. If we had not missed at least one tick, we would
> return early from the if-statement at the top of your patch fragment,
> above.
> What's the guest timer_mode? If there was at least one missed tick I would
> have expected a timer interrupt to get injected straight away. Except
> perhaps for timer_mode=2=no_missed_ticks_pending -- I don't understand
> that timer mode, and perhaps there could be a bad interaction with its
> specific interrupt-holdoff logic.

Agreed. In that if-statement, pt->do_not_freeze is set to 1 when
pt->pending_intr_nr == 0, so the timer is not stopped in the next round's
pt_save_timer. In that case we expect the timer to fire once it expires.
However, even after the expiration occurs, the timer may not actually fire,
because the timer handler runs later in a softirq. pt_process_missed_ticks
may then see one missed tick and delay the timer by one period, while
pt->pending_intr_nr is never increased, because the timer never fired. For
example, in a yield operation with a short runqueue, the scheduler may keep
selecting the previous vcpu as the next vcpu to run after yielding, so
between pt_save_timer and pt_restore_timer the timer has no chance to fire
even if it has expired. pt_restore_timer may nevertheless call
pt_process_missed_ticks to recalculate pt->scheduled, pushing it out by one
pt->period each time an expiration has occurred; the timer is thus
repeatedly delayed without pt->pending_intr_nr ever increasing.
I think the patch below can fix this issue: if the timer was not stopped
(in the save logic) and has not yet fired (by the time of the restore
logic), we should keep it running until it fires in the softirq, and we
should not delay it by recalculating its expiration time, because that can
cause the guest to lose timer interrupts in some cases.

diff -r 381ab77db71a xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c    Mon Apr 18 10:10:02 2011 +0100
+++ b/xen/arch/x86/hvm/vpt.c    Thu Dec 22 11:35:36 2011 +0800
@@ -185,7 +185,7 @@

     list_for_each_entry ( pt, head, list )
     {
-        if ( pt->pending_intr_nr == 0 )
+        if ( pt->pending_intr_nr == 0 && !pt->do_not_freeze )
         {
             pt_process_missed_ticks(pt);
             set_timer(&pt->timer, pt->scheduled);


> > We have reproduced your problem locally and are looking into this
> > issue. It seems "PLE with timer mode 2" triggers the issue. We will
> > post our findings as soon as possible.
> 
> > Shan Haitao
> >
> >> -----Original Message-----
> >> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-
> >> bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Konrad Rzeszutek Wilk
> >> Sent: Wednesday, December 21, 2011 4:42 AM
> >> To: xen-devel@xxxxxxxxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx;
> >> George.Dunlap@xxxxxxxxxxxxx; keir@xxxxxxx;
> andrew.thomas@xxxxxxxxxx
> >> Subject: Re: [Xen-devel] issues with PLE and/or scheduler.
> >>
> >> On Tue, Dec 20, 2011 at 04:41:07PM -0400, Konrad Rzeszutek Wilk wrote:
> >>> Hey folks,
> >>>
> >>> I am sending this on behalf of Andrew since our internal email
> >>> system is dropping all xen-devel mailing lists :-(
> >>
> >> <hits his head> And I forgot to CC andrew on it. Added here.
> >>>
> >>> Anyhow:
> >>>
> >>> This is with xen-4.1-testing cs 23201:1c89f7d29fbb and using the
> >>> default "credit" scheduler.
> >>>
> >>> I've run into an interesting issue with HVM guests which make use of
> >>> Pause Loop Exiting (ie. on westmere systems; and also on romley
> >>> systems):  after yielding the cpu, guests don't seem to receive
> >>> timer interrupts correctly..
> >>>
> >>> Some background: for historical reasons (ie old templates) we boot
> >>> OL/RHEL guests with the following settings:
> >>>
> >>> kernel parameters: clock=pit nohpet nopmtimer
> >>> vm.cfg: timer_mode = 2
> >>>
> >>> With PLE enabled, 2.6.32 guests will crash early on with:
> >>>  ..MP-BIOS bug: 8254 timer not connected to IO-APIC  # a few lines
> >>> omitted..
> >>>  Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot
> >>> with apic=debug
> >>>
> >>> While 2.6.18-238 (ie OL/RHEL5u6) will fail to find the timer, but
> >>> continue and lock up in the serial line initialization.
> >>>
> >>>  ..MP-BIOS bug: 8254 timer not connected to IO-APIC  # continues
> >>> until lock up here:
> >>>  Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing
> >>> enabled
> >>>
> >>> Instrumenting the 2.6.32 code (ie timer_irq_works()) shows that
> >>> jiffies isn't advancing (or only 1 or 2 ticks are being received,
> >>> which is insufficient for "working"). This is on a "quiet" system
> >>> with no other activity.
> >>> So, even though the guest has voluntarily yielded the cpu (through
> >>> PLE), I would still expect it to receive every clock tick (even with
> >>> timer_mode=2) as there is no other work to do on the system.
> >>>
> >>> Disabling PLE allows both 2.6.18 and 2.6.32 guests to boot.. [As an
> >>> aside, so does setting ple_gap to 41 (ie prior to
> >>> 21355:727ccaaa6cce)
> >>> -- the perf counters show no exits happening, so this is equivalent
> >>> to disabling PLE.]
> >>>
> >>> I'm hoping someone who knows the scheduler well will be able to
> >>> quickly decide whether this is a bug or a feature...
> >>>
> >>> Andrew
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> 




 

