[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] PV multiconsole bug during resume.



On Fri, 25 May 2012, Ian Campbell wrote:
> On Thu, 2012-05-24 at 20:37 +0100, Konrad Rzeszutek Wilk wrote:
> > So .. we used to have in the event.c a spin_lock to protect the
> > irq_mapping_update_lock, but with git commit  
> > 773659483685d652970583384a0294948e57f8b3
> > "xen/irq: Alter the locking to use a mutex instead of a spinlock."
> > I changed it to a mutex b/c we keept on getting WARNs.
> > 
> > But now I get this when I resume a PVHVM guest:
> > 
> > Grant tables using version 2 layout.
> > BUG: sleeping function called from invalid context at 
> > /home/konrad/ssd/linux/kernel/mutex.c:85
> > in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: migration/0
> > Pid: 6, comm: migration/0 Tainted: G           O 
> > 3.4.0upstream-00113-g598ff45-dirty #1
> > Call Trace:
> >  [<ffffffff8109830a>] __might_sleep+0xda/0x100
> >  [<ffffffff815a47f7>] mutex_lock+0x27/0x50
> >  [<ffffffff81311ea6>] rebind_evtchn_irq+0x36/0x90
> >  [<ffffffff81341bfc>] xen_console_resume+0x5c/0x60
> >  [<ffffffff81313fea>] xen_suspend+0x8a/0xb0
> >  [<ffffffff810d42f3>] stop_machine_cpu_stop+0xa3/0xf0
> >  [<ffffffff810d4250>] ? stop_one_cpu_nowait+0x50/0x50
> >  [<ffffffff810d3f81>] cpu_stopper_thread+0xf1/0x1c0
> >  [<ffffffff815a5be6>] ? __schedule+0x3c6/0x760
> >  [<ffffffff815a6bb9>] ? _raw_spin_unlock_irqrestore+0x19/0x30
> >  [<ffffffff810d3e90>] ? res_counter_charge+0x150/0x150
> >  [<ffffffff8108e636>] kthread+0x96/0xa0
> >  [<ffffffff815aeb24>] kernel_thread_helper+0x4/0x10
> >  [<ffffffff815a7138>] ? retint_restore_args+0x5/0x6
> >  [<ffffffff815aeb20>] ? gs_change+0x13/0x13
> > PM: noirq restore of devices complete after 0.163 msecs
> > 
> > 
> > Any ideas?
> 
> xen_console_resume is called from stop_machine context, which has irqs
> disabled etc and so cannot sleep.
> 
> One option might be to hoist the call of xen_console_resume out of the
> stop machine section, e.g. to the same place as xs_resume (which is the
> only over caller of rebind_evtchn_irq).
> 
> I'm a bit worried about not getting debug output during resume though,
> or worse poking the evtchn before it is actually setup again.
> 
> The alternative is to consider whether irq_mapping_update_lock is even
> needed in rebind_evtchn_irq. I have a feeling that the lock is a bit
> pessimistic in the general case (i.e. it covers more than it needs to).
> Lots of stuff which it might once has covered is actually locked
> elsewhere these days -- i.e. picking an irq number is handled in the
> core -- and the old array no longer exists (we use desc->handler_data
> instead). Once you have picked an irq and an evtchn I'm not sure that
> the lock when updating the info is even useful any more, so long as the
> irq/evtchn in question is masked. Anyhow, might be worth having a good
> look at the use of that lock -- if we can't get rid of it entirely
> perhaps its scope can be greatly reduced?

Considering that xen_console_resume is called by stop_machine, there is
certainly no need to protect rebind_evtchn_irq with a mutex at that
point.
However rebind_evtchn_irq is also called by xs_resume (xb_init_comms),
outside of stop_machine, so we might actually need a mutex there.

Maybe we need a rebind_evtchn_irqsafe function that doesn't take the
mutex and can be called from irq context.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.