WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

Re: [Xen-devel] cpuidle causing Dom0 soft lockups

>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 23.02.10 11:57 >>>
>On 23/02/2010 10:37, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>
>>> Right. According to the code, there should be no way to hit this BUG_ON.
>>> If it happens, that reveals either a bug in the code or the need to
>>> add code to migrate the urgent vcpu count. Do you have more
>>> information on how this BUG_ON happens?
>> 
>> Obviously there are vCPU-s that get inserted on a run queue with
>> is_urgent set (which according to my reading of Keir's description
>> shouldn't happen). In particular, this
>
>Is it possible for a polling VCPU to become runnable without it being
>cleared from poll_mask? I suspect maybe that is the problem, and that needs
>dealing with, or the proper handling needs to be added to sched_credit.c.

I don't think that's the case, at least not exclusively. Using

--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -201,6 +201,7 @@ __runq_insert(unsigned int cpu, struct c
 
     BUG_ON( __vcpu_on_runq(svc) );
     BUG_ON( cpu != svc->vcpu->processor );
+WARN_ON(svc->vcpu->is_urgent);//temp
 
     list_for_each( iter, runq )
     {
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -139,6 +139,7 @@ static inline void vcpu_runstate_change(
     ASSERT(spin_is_locked(&per_cpu(schedule_data,v->processor).schedule_lock));
 
     vcpu_urgent_count_update(v);
+WARN_ON(v->is_urgent && new_state <= RUNSTATE_runnable);//temp
 
     trace_runstate_change(v, new_state);
 
I get pairs of warnings (i.e. both warnings in a pair are for the same vCPU):

(XEN) Xen WARN at schedule.c:142
(XEN) Xen call trace:
(XEN)    [<ffff82c48011c8d5>] schedule+0x375/0x510
(XEN)    [<ffff82c48011deb8>] __do_softirq+0x58/0x80
(XEN)    [<ffff82c4801e61e6>] process_softirqs+0x6/0x10

(XEN) Xen WARN at sched_credit.c:204
(XEN) Xen call trace:
(XEN)    [<ffff82c4801186b9>] csched_vcpu_wake+0x169/0x1a0
(XEN)    [<ffff82c4801497f2>] update_runstate_area+0x102/0x110
(XEN)    [<ffff82c48011cdcf>] vcpu_wake+0x13f/0x390
(XEN)    [<ffff82c48014b1a0>] context_switch+0x760/0xed0
(XEN)    [<ffff82c48014913d>] vcpu_kick+0x1d/0x80
(XEN)    [<ffff82c480107feb>] evtchn_set_pending+0xab/0x1b0
(XEN)    [<ffff82c4801083a9>] evtchn_send+0x129/0x150
(XEN)    [<ffff82c480108950>] do_event_channel_op+0x4c0/0xf50
(XEN)    [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
(XEN)    [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
(XEN)    [<ffff82c48011fd44>] timer_softirq_action+0x1a4/0x360
(XEN)    [<ffff82c4801e6169>] syscall_enter+0xa9/0xae

In schedule() this is always "prev" transitioning to RUNSTATE_runnable
(i.e. _VPF_blocked not set), yet the second call trace shows that
_VPF_blocked must have been set at that point (otherwise
vcpu_unblock(), tail-called from vcpu_kick(), would not have called
vcpu_wake()). If the order wasn't always the one shown, or if the two
traces got intermixed, this could hint at a race - but they always
appear in that order, which so far I cannot make sense of.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel