
Re: [Xen-devel] cpuidle causing Dom0 soft lockups



>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 23.02.10 11:57 >>>
>On 23/02/2010 10:37, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>
>>> Right. According to the code, there should be no way to hit this
>>> BUG_ON. If it happens, that reveals either a bug in the code or the
>>> need to add code to migrate the urgent vCPU count. Do you have more
>>> information on how this BUG_ON happens?
>> 
>> Obviously there are vCPUs that get inserted on a run queue with
>> is_urgent set (which, according to my reading of Keir's description,
>> shouldn't happen). In particular, this
>
>Is it possible for a polling VCPU to become runnable without it being
>cleared from poll_mask? I suspect maybe that is the problem, and that needs
>dealing with, or the proper handling needs to be added to sched_credit.c.

I don't think that's the case, at least not exclusively. Using

--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -201,6 +201,7 @@ __runq_insert(unsigned int cpu, struct c
 
     BUG_ON( __vcpu_on_runq(svc) );
     BUG_ON( cpu != svc->vcpu->processor );
+WARN_ON(svc->vcpu->is_urgent);//temp
 
     list_for_each( iter, runq )
     {
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -139,6 +139,7 @@ static inline void vcpu_runstate_change(
     ASSERT(spin_is_locked(&per_cpu(schedule_data,v->processor).schedule_lock));
 
     vcpu_urgent_count_update(v);
+WARN_ON(v->is_urgent && new_state <= RUNSTATE_runnable);//temp
 
     trace_runstate_change(v, new_state);
 
I get pairs of warnings (i.e. both warnings for the same vCPU):

(XEN) Xen WARN at schedule.c:142
(XEN) Xen call trace:
(XEN)    [<ffff82c48011c8d5>] schedule+0x375/0x510
(XEN)    [<ffff82c48011deb8>] __do_softirq+0x58/0x80
(XEN)    [<ffff82c4801e61e6>] process_softirqs+0x6/0x10

(XEN) Xen WARN at sched_credit.c:204
(XEN) Xen call trace:
(XEN)    [<ffff82c4801186b9>] csched_vcpu_wake+0x169/0x1a0
(XEN)    [<ffff82c4801497f2>] update_runstate_area+0x102/0x110
(XEN)    [<ffff82c48011cdcf>] vcpu_wake+0x13f/0x390
(XEN)    [<ffff82c48014b1a0>] context_switch+0x760/0xed0
(XEN)    [<ffff82c48014913d>] vcpu_kick+0x1d/0x80
(XEN)    [<ffff82c480107feb>] evtchn_set_pending+0xab/0x1b0
(XEN)    [<ffff82c4801083a9>] evtchn_send+0x129/0x150
(XEN)    [<ffff82c480108950>] do_event_channel_op+0x4c0/0xf50
(XEN)    [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
(XEN)    [<ffff82c4801461b5>] reprogram_timer+0x55/0x90
(XEN)    [<ffff82c48011fd44>] timer_softirq_action+0x1a4/0x360
(XEN)    [<ffff82c4801e6169>] syscall_enter+0xa9/0xae

In schedule() this is always "prev" transitioning to RUNSTATE_runnable
(i.e. _VPF_blocked not set), yet the second call trace shows that
_VPF_blocked must have been set at that point (otherwise
vcpu_unblock(), tail-called from vcpu_kick(), would not have called
vcpu_wake()). If the order weren't always the one shown, or if the two
traces got intermixed, this could hint at a race - but they always
appear in that order, which so far I cannot make sense of.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel