Xen project Mailing List

Re: [Xen-devel] PV multiconsole bug during resume.

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

Date: Fri, 25 May 2012 09:56:54 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>

Delivery-date: Fri, 25 May 2012 08:58:09 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, 2012-05-24 at 20:37 +0100, Konrad Rzeszutek Wilk wrote: > So .. we used to have in the event.c a spin_lock to protect the > irq_mapping_update_lock, but with git commit > 773659483685d652970583384a0294948e57f8b3 > "xen/irq: Alter the locking to use a mutex instead of a spinlock." > I changed it to a mutex b/c we keept on getting WARNs. > > But now I get this when I resume a PVHVM guest: > > Grant tables using version 2 layout. > BUG: sleeping function called from invalid context at > /home/konrad/ssd/linux/kernel/mutex.c:85 > in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: migration/0 > Pid: 6, comm: migration/0 Tainted: G O > 3.4.0upstream-00113-g598ff45-dirty #1 > Call Trace: > [<ffffffff8109830a>] __might_sleep+0xda/0x100 > [<ffffffff815a47f7>] mutex_lock+0x27/0x50 > [<ffffffff81311ea6>] rebind_evtchn_irq+0x36/0x90 > [<ffffffff81341bfc>] xen_console_resume+0x5c/0x60 > [<ffffffff81313fea>] xen_suspend+0x8a/0xb0 > [<ffffffff810d42f3>] stop_machine_cpu_stop+0xa3/0xf0 > [<ffffffff810d4250>] ? stop_one_cpu_nowait+0x50/0x50 > [<ffffffff810d3f81>] cpu_stopper_thread+0xf1/0x1c0 > [<ffffffff815a5be6>] ? __schedule+0x3c6/0x760 > [<ffffffff815a6bb9>] ? _raw_spin_unlock_irqrestore+0x19/0x30 > [<ffffffff810d3e90>] ? res_counter_charge+0x150/0x150 > [<ffffffff8108e636>] kthread+0x96/0xa0 > [<ffffffff815aeb24>] kernel_thread_helper+0x4/0x10 > [<ffffffff815a7138>] ? retint_restore_args+0x5/0x6 > [<ffffffff815aeb20>] ? gs_change+0x13/0x13 > PM: noirq restore of devices complete after 0.163 msecs > > > Any ideas? xen_console_resume is called from stop_machine context, which has irqs disabled etc and so cannot sleep. One option might be to hoist the call of xen_console_resume out of the stop machine section, e.g. to the same place as xs_resume (which is the only over caller of rebind_evtchn_irq). I'm a bit worried about not getting debug output during resume though, or worse poking the evtchn before it is actually setup again. The alternative is to consider whether irq_mapping_update_lock is even needed in rebind_evtchn_irq. I have a feeling that the lock is a bit pessimistic in the general case (i.e. it covers more than it needs to). Lots of stuff which it might once has covered is actually locked elsewhere these days -- i.e. picking an irq number is handled in the core -- and the old array no longer exists (we use desc->handler_data instead). Once you have picked an irq and an evtchn I'm not sure that the lock when updating the info is even useful any more, so long as the irq/evtchn in question is masked. Anyhow, might be worth having a good look at the use of that lock -- if we can't get rid of it entirely perhaps its scope can be greatly reduced? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.