
Re: [Xen-devel] issues with PLE and/or scheduler.



On 21/12/2011 10:34, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:

> diff -r 381ab77db71a xen/arch/x86/hvm/vpt.c
> --- a/xen/arch/x86/hvm/vpt.c    Mon Apr 18 10:10:02 2011 +0100
> +++ b/xen/arch/x86/hvm/vpt.c    Thu Dec 22 05:54:54 2011 +0800
> @@ -129,7 +129,7 @@
>      if ( missed_ticks <= 0 )
>          return;
> 
> -    missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
> +    missed_ticks = missed_ticks / (s_time_t) pt->period;
>      if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
>          pt->do_not_freeze = !pt->pending_intr_nr;
>      else
> 
> Can anyone explain the above "plus one" logic?  Why assume at least one tick
> is missed in pt_process_missed_ticks?

missed_ticks = now - pt->scheduled

pt->scheduled was the deadline for the next tick. Hence the number of missed
ticks is the total number of periods that have passed since pt->scheduled,
plus one. If we had not missed at least one tick, we would have returned
early from the if-statement at the top of your patch fragment above.
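
To make the arithmetic concrete, here is a standalone illustration (my own
example, not code from the tree; the numbers are made up):

    /* Illustration only (not Xen code): why the "+ 1" is correct when
     * pt->scheduled holds the deadline of the *next* tick. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t period    = 10000000;              /* 10ms tick in ns (HZ=100)    */
        int64_t scheduled = 500000000;             /* deadline of the next tick   */
        int64_t now       = scheduled + 25000000;  /* we wake up 2.5 periods late */
        int64_t delta     = now - scheduled;

        if ( delta <= 0 )
            return 0;                              /* nothing missed: early exit  */

        /* Two further deadlines passed after pt->scheduled, and pt->scheduled
         * itself was missed as well, so three ticks were missed in total. */
        printf("missed ticks = %lld\n", (long long)(delta / period + 1)); /* 3 */
        return 0;
    }

Dropping the "+ 1", as in your patch, would report 2 here and silently forget
the tick whose deadline was pt->scheduled itself.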

What's the guest timer_mode? If there was at least one missed tick I would
have expected a timer interrupt to get injected straight away. Except
perhaps for timer_mode=2=no_missed_ticks_pending -- I don't understand that
timer mode, and perhaps there could be a bad interaction with its specific
interrupt-holdoff logic.

 -- Keir

> In the guest kernel, the ioapic's check_timer logic is used to determine how
> to set up IRQ0, and it uses mdelay to wait for 10 ticks in total.  If the
> kernel receives 4+ ticks during the delay, it deems IRQ0 to be routed
> correctly through the ioapic.
> Unfortunately, mdelay is implemented as a tight pause loop, and when PLE is
> enabled that tight pause loop triggers PLE vmexits.  In the PLE vmexit
> handler the scheduler yields the CPU, but the yield triggers the guest's
> time save/restore logic, and eventually pt_process_missed_ticks gets called.
> Each time pt_process_missed_ticks is called, pt->scheduled is advanced by an
> extra pt->period due to the above "plus one" logic.
> By default ple_window is 4096, so every 4096 cycles of the guest's mdelay
> triggers one PLE vmexit, and each vmexit pushes the vpt timer back by one
> pt->period, so the vpt timer may never fire during the guest's delay.
> This is why jiffies does not advance during the 10-tick mdelay.
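
For reference, the guest-side check being described looks roughly like this;
it is a paraphrase of the Linux io_apic timer_irq_works() from memory, so
treat the exact details as approximate:

    static int timer_irq_works(void)
    {
        unsigned long t1 = jiffies;

        local_irq_enable();
        mdelay((10 * 1000) / HZ);   /* tight PAUSE loop: ~10 ticks of busy
                                     * waiting, which is what triggers the
                                     * PLE vmexits when PLE is enabled      */
        local_irq_disable();

        /* Require more than 4 ticks to have arrived, otherwise IRQ0 is
         * deemed not to be routed correctly through the IO-APIC. */
        return time_after(jiffies, t1 + 4);
    }

With every PLE exit pushing pt->scheduled a full pt->period into the future,
that 4-tick threshold is never reached even though the guest spends ten
ticks' worth of time in the loop.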
> 
> Thanks!
> Xiantao 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Shan, Haitao
> Sent: Wednesday, December 21, 2011 9:28 AM
> To: Konrad Rzeszutek Wilk; xen-devel@xxxxxxxxxxxxxxxxxxx;
> konrad.wilk@xxxxxxxxxx; George.Dunlap@xxxxxxxxxxxxx; keir@xxxxxxx;
> andrew.thomas@xxxxxxxxxx
> Subject: Re: [Xen-devel] issues with PLE and/or scheduler.
> 
> We have reproduced your problem locally and are looking into this issue.  It
> seems "PLE with timer mode 2" triggers the issue.  We will post our findings
> as soon as possible.
> 
> Shan Haitao
> 
>> -----Original Message-----
>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-
>> bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Konrad Rzeszutek Wilk
>> Sent: Wednesday, December 21, 2011 4:42 AM
>> To: xen-devel@xxxxxxxxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx;
>> George.Dunlap@xxxxxxxxxxxxx; keir@xxxxxxx; andrew.thomas@xxxxxxxxxx
>> Subject: Re: [Xen-devel] issues with PLE and/or scheduler.
>> 
>> On Tue, Dec 20, 2011 at 04:41:07PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Hey folks,
>>> 
>>> I am sending this on behalf of Andrew since our internal email
>>> system is dropping all xen-devel mailing list mail :-(
>> 
>> <hits his head> And I forgot to CC andrew on it. Added here.
>>> 
>>> Anyhow:
>>> 
>>> This is with xen-4.1-testing cs 23201:1c89f7d29fbb and using the
>>> default "credit" scheduler.
>>> 
>>> I've run into an interesting issue with HVM guests which make use of
>>> Pause Loop Exiting (ie on Westmere systems, and also on Romley
>>> systems): after yielding the CPU, guests don't seem to receive
>>> timer interrupts correctly.
>>> 
>>> Some background: for historical reasons (ie old templates) we boot
>>> OL/RHEL guests with the following settings:
>>> 
>>> kernel parameters: clock=pit nohpet nopmtimer
>>> vm.cfg: timer_mode = 2
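
For anyone trying to reproduce this, that corresponds to something like the
following (an illustrative fragment, not a complete configuration):

    # vm.cfg (HVM guest) -- only the relevant line is shown
    timer_mode = 2          # no_missed_ticks_pending

    # The clock=pit nohpet nopmtimer parameters go on the guest's own kernel
    # command line (i.e. in the guest's grub config), not in vm.cfg.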
>>> 
>>> With PLE enabled, 2.6.32 guests will crash early on with:
>>>   ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>>>   # a few lines omitted..
>>>   Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug
>>> 
>>> 2.6.18-238 (ie OL/RHEL5u6) guests, meanwhile, fail to find the timer but
>>> continue, then lock up in the serial line initialization:
>>> 
>>>   ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>>>   # continues until it locks up here:
>>>   Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
>>> 
>>> Instrumenting the 2.6.32 code (ie timer_irq_works()) shows that
>>> jiffies isn't advancing (or only 1 or 2 ticks are being received,
>>> which is insufficient for "working"). This is on a "quiet" system
>>> with no other activity.
>>> So, even though the guest has voluntarily yielded the cpu (through
>>> PLE), I would still expect it to receive every clock tick (even with
>>> timer_mode=2) as there is no other work to do on the system.
>>> 
>>> Disabling PLE allows both 2.6.18 and 2.6.32 guests to boot.  [As an
>>> aside, so does setting ple_gap to 41 (ie its value prior to
>>> 21355:727ccaaa6cce) -- the perf counters show no exits happening, so
>>> this is equivalent to disabling PLE.]
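
For anyone unfamiliar with PLE, the hardware behaviour is conceptually
something like the sketch below (my own paraphrase of the SDM description,
not Xen or hardware code; the values are just the ones discussed in this
thread), which also suggests why such a small ple_gap results in no exits:

    #include <stdint.h>

    static uint64_t ple_gap = 41, ple_window = 4096;   /* TSC cycles */
    static uint64_t loop_start_tsc, last_pause_tsc;

    extern void vmexit_pause_loop(void);               /* placeholder */

    /* Conceptually invoked by the CPU on every PAUSE the guest executes. */
    void on_guest_pause(uint64_t tsc_now)
    {
        if ( tsc_now - last_pause_tsc > ple_gap )
            loop_start_tsc = tsc_now;    /* gap too big: starts a new loop  */
        last_pause_tsc = tsc_now;

        if ( tsc_now - loop_start_tsc > ple_window )
            vmexit_pause_loop();         /* spun "too long": exit to host   */
    }

With a 41-cycle gap, successive PAUSEs in the guest's delay loop apparently
never count as part of the same spin, so the window threshold is never
reached and no exits occur -- which matches the perf counters mentioned above.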
>>> 
>>> I'm hoping someone who knows the scheduler well will be able to
>>> quickly decide whether this is a bug or a feature...
>>> 
>>> Andrew
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

