RE: [Xen-devel] cpuidle causing Dom0 soft lockups

To: Jan Beulich <JBeulich@xxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Subject: RE: [Xen-devel] cpuidle causing Dom0 soft lockups
From: "Yu, Ke" <ke.yu@xxxxxxxxx>
Date: Wed, 3 Feb 2010 15:32:48 +0800
Accept-language: en-US
Acceptlanguage: en-US
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 02 Feb 2010 23:35:09 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4B58402E020000780002B3FE@xxxxxxxxxxxxxxxxxx> <C77DE51B.6F89%keir.fraser@xxxxxxxxxxxxx> <4B67E85E020000780002D1A0@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acqj3Q0s1oo/kfQzT7a3+g7dJJDr3gAQv8qwACCny6A=
Thread-topic: [Xen-devel] cpuidle causing Dom0 soft lockups
Hi Jan,

Could you please try the attached patch? It tries to avoid entering a deep
C state when a vCPU has its local IRQs disabled and is polling the event
channel. When tested on my 64-CPU box, this issue is gone with the patch.
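
For reference, here is the idea behind the patch as a minimal standalone
sketch (illustrative only, not the attached cpuidle-hint-v2.patch; the names
urgent_vcpu_count, mark_vcpu_urgent() and choose_cstate() are made up):

    /* Per-pCPU count of vCPUs that blocked in the poll hypercall with
     * local IRQs disabled.  Such vCPUs are latency sensitive: their
     * wakeup should not have to pay a deep C state exit. */
    static int urgent_vcpu_count;           /* per-pCPU in real code */

    /* Called when a vCPU blocks in SCHEDOP_poll with IRQs disabled. */
    static void mark_vcpu_urgent(void)  { urgent_vcpu_count++; }
    /* Called when that vCPU is woken via its event channel. */
    static void clear_vcpu_urgent(void) { urgent_vcpu_count--; }

    /* C state selection: cap at C1 while an urgent vCPU is parked on
     * this pCPU, otherwise allow the deepest state the driver would
     * normally pick. */
    static int choose_cstate(int deepest_cstate)
    {
        return (urgent_vcpu_count > 0) ? 1 /* C1 */ : deepest_cstate;
    }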

Best Regards
Ke

>-----Original Message-----
>From: Yu, Ke
>Sent: Wednesday, February 03, 2010 1:07 AM
>To: Jan Beulich; Keir Fraser
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: [Xen-devel] cpuidle causing Dom0 soft lockups
>
>>-----Original Message-----
>>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>>Sent: Tuesday, February 02, 2010 3:55 PM
>>To: Keir Fraser; Yu, Ke
>>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>>Subject: Re: [Xen-devel] cpuidle causing Dom0 soft lockups
>>
>>>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 21.01.10 12:03 >>>
>>>On 21/01/2010 10:53, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>>> I can see your point. But how can you consider shipping with something
>>>> apparently severely broken? As said before, the fact that this manifests
>>>> itself by hanging a many-vCPU Dom0 very likely implies that there are
>>>> (so far unnoticed) problems with smaller Dom0-s. If I had a machine at
>>>> hand that supports C3, I'd try to do some measurements with smaller
>>>> domains...
>>>
>>>Well it's a fallback I guess. If we can't make progress on solving it then I
>>>suppose I agree.
>>
>>Just fyi, we now also have seen an issue on a 24-CPU system that went
>>away with cpuidle=0 (and static analysis of the hang hinted in that
>>direction). All I can judge so far is that this likely has something to do
>>with our kernel's intensive use of the poll hypercall (i.e. we see vCPU-s
>>not waking up from the call despite there being pending unmasked or
>>polled for events).
>>
>>Jan
>
>Hi Jan,
>
>We have just identified the cause of this issue and are trying to find an
>appropriate way to fix it.
>
>This issue is the result of the following sequence:
>1. Every dom0 vCPU has a 250HZ timer (i.e. a 4ms period). The vCPU's
>timer_interrupt handler acquires a global ticket spinlock, xtime_lock. When
>xtime_lock is held by another vCPU, the vCPU polls its event channel and
>blocks, so the pCPU the vCPU runs on becomes idle. Later, when the lock
>holder releases xtime_lock, it notifies the event channel to wake up the
>vCPU; the pCPU then wakes up from its idle state and schedules the vCPU to
>run. (A simplified sketch of this lock slow path is given below.)
>
>From the above, the latency of the vCPU timer interrupt consists of the
>following items ("latency" here means the time from beginning to acquire
>the lock until the lock is finally acquired):
>T1 - CPU execution time (e.g. timer-interrupt lock-holding time, event
>channel notification time)
>T2 - CPU idle wake-up time, i.e. the time for the CPU to wake up from a
>deep C state (e.g. C3) back to C0, usually on the order of tens to
>hundreds of microseconds
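>
>For illustration, here is a much simplified sketch of the dom0
>ticket-spinlock slow path that produces this blocking (the real kernel
>code differs; the lock layout and the this_vcpu_lock_port() /
>poll_for_event() helpers are assumptions):
>
>    /* Instead of spinning on xtime_lock, the vCPU blocks in the
>     * SCHEDOP_poll hypercall on a per-vCPU event channel, so the pCPU
>     * underneath goes idle.  The lock holder kicks that channel on
>     * release; the wakeup then pays the notification cost (part of T1)
>     * plus, if the pCPU entered a deep C state, the exit latency T2. */
>    static void ticket_spin_lock_slow(struct ticket_lock *lock, int my_ticket)
>    {
>        evtchn_port_t port = this_vcpu_lock_port();   /* assumed helper */
>
>        while (lock->now_serving != my_ticket)
>            poll_for_event(port);   /* blocks in SCHEDOP_poll */
>    }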
>
>2. Now consider the case of a large number of CPUs, e.g. 64 pCPUs and 64
>vCPUs in dom0, and assume the lock-holding sequence is vCPU0 ->
>vCPU1 -> vCPU2 -> ... -> vCPU63.
>Then vCPU63 spends 64*(T1 + T2) to acquire xtime_lock. If T1+T2 is 100us,
>the total latency is ~6.4ms.
>Since the timer is 250HZ, i.e. a 4ms period, by the time the event channel
>notification is issued and the pCPU schedules vCPU63, the hypervisor finds
>the timer is overdue and sends another TIMER_VIRQ to vCPU63 (see
>schedule()->vcpu_periodic_timer_work() for details; a paraphrased sketch
>follows below). In this case vCPU63 is always busy handling timer
>interrupts and never gets to update the watchdog, which causes the soft
>lockup.
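>
>To make the overdue case concrete, the periodic-timer check on the
>hypervisor side does roughly the following (a paraphrase, not the exact
>Xen source):
>
>    /* Run from schedule() for the vCPU being scheduled in.  If the
>     * vCPU's periodic timer (4ms for a 250HZ dom0) already expired
>     * while the vCPU was blocked on xtime_lock, deliver VIRQ_TIMER at
>     * once.  With 64*(T1+T2) > 4ms the timer is always overdue, so the
>     * vCPU re-enters timer_interrupt as soon as it returns from the
>     * previous one. */
>    static void vcpu_periodic_timer_work(struct vcpu *v)
>    {
>        s_time_t now = NOW();
>        s_time_t next = v->periodic_last_event + v->periodic_period;
>
>        if ( v->periodic_period == 0 )
>            return;
>        if ( now >= next )
>        {
>            send_timer_event(v);            /* another VIRQ_TIMER */
>            v->periodic_last_event = now;
>            next = now + v->periodic_period;
>        }
>        set_timer(&v->periodic_timer, next);  /* re-arm (simplified) */
>    }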
>
>So from the above sequence we can see:
>- the cpuidle driver adds extra latency (T2), making this issue easier to
>trigger;
>- a large number of CPUs multiplies the latency;
>- the ticket spinlock imposes a fixed lock-acquisition order, so the
>latency is repeatedly 64*(T1+T2), again making the issue easier to trigger.
>The fundamental cause is that the vCPU timer interrupt handler does not
>scale, because of the global xtime_lock.
>
>From the cpuidle point of view, one thing we are trying to do is change
>the cpuidle driver to not enter deep C states when a vCPU has its local
>irq disabled and is polling the event channel. In that case the T2
>latency is eliminated.
>
>Anyway, cpuidle is only one side of it: we can anticipate that if the CPU
>count is large enough that NR_CPU * T1 > 4ms, this issue will occur again.
>So another way is to make dom0 scale by not using xtime_lock, although
>that is pretty hard currently. Yet another option is to limit the number
>of dom0 vCPUs to some reasonable level.
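>
>(As a rough illustration with assumed numbers: if T1 is about 40us per
>vCPU, then once the vCPU count exceeds 4ms / 40us = 100, the xtime_lock
>hand-off alone overruns the 4ms timer period even with T2 eliminated.)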
>
>Regards
>Ke

Attachment: cpuidle-hint-v2.patch
Description: cpuidle-hint-v2.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel