WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

Re: [Xen-devel] cpuidle causing Dom0 soft lockups

>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 23.02.10 11:57 >>>
>On 23/02/2010 10:37, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>
>>> Right. According to the code, there should be no way to hit this BUG_ON.
>>> If it happens, that reveals either a bug in the code or the need to
>>> add code to migrate the urgent vcpu count. Do you have more
>>> information on how this BUG_ON happens?
>> 
>> Obviously there are vCPU-s that get inserted on a run queue with
>> is_urgent set (which according to my reading of Keir's description
>> shouldn't happen). In particular, this
>
>Is it possible for a polling VCPU to become runnable without it being
>cleared from poll_mask? I suspect maybe that is the problem, and that needs
>dealing with, or the proper handling needs to be added to sched_credit.c.

I don't think that's the case, at least not exclusively. Using

--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -201,6 +201,7 @@ __runq_insert(unsigned int cpu, struct c
 
     BUG_ON( __vcpu_on_runq(svc) );
     BUG_ON( cpu != svc->vcpu->processor );
+WARN_ON(svc->vcpu->is_urgent);//temp
 
     list_for_each( iter, runq )
     {
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -139,6 +139,7 @@ static inline void vcpu_runstate_change(
     ASSERT(spin_is_locked(&per_cpu(schedule_data,v->processor).schedule_lock));
 
     vcpu_urgent_count_update(v);
+WARN_ON(v->is_urgent && new_state <= RUNSTATE_runnable);//temp
 
     trace_runstate_change(v, new_state);
 
I get pairs of warnings (i.e. both warnings in a pair are for the same vCPU):

(XEN) Xen WARN at schedule.c:142
(XEN) Xen call trace:
(XEN)    [<ffff82c48011c8d5>] schedule+0x375/0x510
(XEN)    [<ffff82c48011deb8>] __do_softirq+0x58/0x80
(XEN)    [<ffff82c4801e61e6>] process_softirqs+0x6/0x10

(XEN) Xen WARN at sched_credit.c:204
(XEN) Xen call trace:
(XEN)    [<ffff82c4801186b9>] csched_vcpu_wake+0x169/0x1a0
(XEN)    [<ffff82c4801497f2>] update_runstate_area+0x102/0x110
(XEN)    [<ffff82c48011cdcf>] vcpu_wake+0x13f/0x390
(XEN)    [<ffff82c48014b1a0>] context_switch+0x760/0xed0
(XEN)    [<ffff82c48014913d>] vcpu_kick+0x1d/0x80
(XEN)    [<ffff82c480107feb>] evtchn_set_pending+0xab/0x1b0
(XEN)    [<ffff82c4801083a9>] evtchn_send+0x129/0x150
(XEN)    [<ffff82c480108950>] do_event_channel_op+0x4c0/0xf50
(XEN)    [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
(XEN)    [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
(XEN)    [<ffff82c48011fd44>] timer_softirq_action+0x1a4/0x360
(XEN)    [<ffff82c4801e6169>] syscall_enter+0xa9/0xae

In schedule() this is always "prev" transitioning to RUNSTATE_runnable
(i.e. _VPF_blocked not set), yet the second call trace shows that
_VPF_blocked must have been set at that point (otherwise
vcpu_unblock(), tail-called from vcpu_kick(), would not have called
vcpu_wake()). If the order wasn't always the one shown, or if the two
traces got intermixed, this could hint at a race - but they always
appear in that order, which so far I cannot make sense of.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel