Xen project Mailing List

Re: [Xen-devel] PV multiconsole bug during resume.

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

From: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

Date: Fri, 25 May 2012 10:31:25 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Delivery-date: Fri, 25 May 2012 09:32:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Fri, 25 May 2012, Ian Campbell wrote:
> On Thu, 2012-05-24 at 20:37 +0100, Konrad Rzeszutek Wilk wrote:
> > So .. we used to have in the event.c a spin_lock to protect the
> > irq_mapping_update_lock, but with git commit
> > 773659483685d652970583384a0294948e57f8b3
> > "xen/irq: Alter the locking to use a mutex instead of a spinlock."
> > I changed it to a mutex b/c we kept on getting WARNs.
> >
> > But now I get this when I resume a PVHVM guest:
> >
> > Grant tables using version 2 layout.
> > BUG: sleeping function called from invalid context at
> > /home/konrad/ssd/linux/kernel/mutex.c:85
> > in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: migration/0
> > Pid: 6, comm: migration/0 Tainted: G O
> > 3.4.0upstream-00113-g598ff45-dirty #1
> > Call Trace:
> > [<ffffffff8109830a>] __might_sleep+0xda/0x100
> > [<ffffffff815a47f7>] mutex_lock+0x27/0x50
> > [<ffffffff81311ea6>] rebind_evtchn_irq+0x36/0x90
> > [<ffffffff81341bfc>] xen_console_resume+0x5c/0x60
> > [<ffffffff81313fea>] xen_suspend+0x8a/0xb0
> > [<ffffffff810d42f3>] stop_machine_cpu_stop+0xa3/0xf0
> > [<ffffffff810d4250>] ? stop_one_cpu_nowait+0x50/0x50
> > [<ffffffff810d3f81>] cpu_stopper_thread+0xf1/0x1c0
> > [<ffffffff815a5be6>] ? __schedule+0x3c6/0x760
> > [<ffffffff815a6bb9>] ? _raw_spin_unlock_irqrestore+0x19/0x30
> > [<ffffffff810d3e90>] ? res_counter_charge+0x150/0x150
> > [<ffffffff8108e636>] kthread+0x96/0xa0
> > [<ffffffff815aeb24>] kernel_thread_helper+0x4/0x10
> > [<ffffffff815a7138>] ? retint_restore_args+0x5/0x6
> > [<ffffffff815aeb20>] ? gs_change+0x13/0x13
> > PM: noirq restore of devices complete after 0.163 msecs
> >
> >
> > Any ideas?
>
> xen_console_resume is called from stop_machine context, which has irqs
> disabled etc and so cannot sleep.
>
> One option might be to hoist the call of xen_console_resume out of the
> stop machine section, e.g. to the same place as xs_resume (which is the
> only other caller of rebind_evtchn_irq).
>
> I'm a bit worried about not getting debug output during resume though,
> or worse poking the evtchn before it is actually setup again.
>
> The alternative is to consider whether irq_mapping_update_lock is even
> needed in rebind_evtchn_irq. I have a feeling that the lock is a bit
> pessimistic in the general case (i.e. it covers more than it needs to).
> Lots of stuff which it might once have covered is actually locked
> elsewhere these days -- i.e. picking an irq number is handled in the
> core -- and the old array no longer exists (we use desc->handler_data
> instead). Once you have picked an irq and an evtchn I'm not sure that
> the lock when updating the info is even useful any more, so long as the
> irq/evtchn in question is masked. Anyhow, might be worth having a good
> look at the use of that lock -- if we can't get rid of it entirely
> perhaps its scope can be greatly reduced?

Considering that xen_console_resume is called by stop_machine, there is
certainly no need to protect rebind_evtchn_irq with a mutex at that
point. However, rebind_evtchn_irq is also called by xs_resume
(xb_init_comms), outside of stop_machine, so we might actually need a
mutex there.

Maybe we need a rebind_evtchn_irqsafe function that doesn't take the
mutex and can be called from irq context.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
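The rebind_evtchn_irqsafe idea suggested in the reply can be sketched as a locked/unlocked pair sharing one core routine. What follows is a userspace model only, not kernel code and not a tested patch: the pthread mutex stands in for irq_mapping_update_lock, and the irq_to_evtchn table, the in_atomic_context flag, the do_rebind_evtchn_irq helper, the int return values and the error codes are all assumptions made for illustration. Only the names rebind_evtchn_irq, rebind_evtchn_irqsafe and irq_mapping_update_lock come from the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_IRQS 16          /* hypothetical table size for this model */

enum {
    REBIND_OK = 0,
    REBIND_BAD_IRQ = -1,    /* hypothetical error code */
    REBIND_WOULD_SLEEP = -2 /* models "sleeping function called from
                               invalid context" */
};

/* Userspace stand-in for the kernel's irq_mapping_update_lock mutex. */
static pthread_mutex_t irq_mapping_update_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the per-irq info; the kernel keeps this elsewhere. */
static int irq_to_evtchn[NR_IRQS];

/* Models "irqs disabled, must not sleep", as under stop_machine. */
static bool in_atomic_context;

/* Core update shared by both entry points; takes no lock itself. */
static int do_rebind_evtchn_irq(int evtchn, int irq)
{
    if (irq < 0 || irq >= NR_IRQS)
        return REBIND_BAD_IRQ;
    irq_to_evtchn[irq] = evtchn;
    return REBIND_OK;
}

/* Sleeping variant: for callers that may block, such as the xs_resume
 * path, which runs outside stop_machine. */
int rebind_evtchn_irq(int evtchn, int irq)
{
    int ret;

    /* The kernel would warn via __might_sleep here; the model refuses. */
    if (in_atomic_context)
        return REBIND_WOULD_SLEEP;

    pthread_mutex_lock(&irq_mapping_update_lock);
    ret = do_rebind_evtchn_irq(evtchn, irq);
    pthread_mutex_unlock(&irq_mapping_update_lock);
    return ret;
}

/* Non-sleeping variant: skips the mutex, on the assumption that the
 * caller already excludes concurrent updaters (all other CPUs stopped). */
int rebind_evtchn_irqsafe(int evtchn, int irq)
{
    return do_rebind_evtchn_irq(evtchn, irq);
}
```

In this model xen_console_resume would call rebind_evtchn_irqsafe while the xs_resume path keeps calling rebind_evtchn_irq. Whether skipping the lock is actually safe depends on nothing else touching the mapping at that point, which is the open question raised in the quoted text.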
