
Re: [Xen-devel] Bug report and patch about IRQ freezing after gic_restore_state



On Mon, 20 May 2013, Julien Grall wrote:
> On 05/20/2013 01:41 AM, Jaeyong Yoo wrote:
> 
> Hello,
> 
> > I'm running Xen on an Arndale board, and if I run both iperf and a du
> > command in Dom0, one of the IRQs (either SATA or network) suddenly stops
> > occurring.
> > After some investigation, I found out that during a context switch in Xen,
> > IRQs already in the LRs (about to be delivered to domains) can be lost and
> > never occur again.
> > Here is the function call sequence in which the problem occurs:
> > (in context switching)
> >   - schedule_tail
> >       - ctxt_switch_from
> >       - local_irq_enable
> >         // after this point, an IRQ can occur and be written directly
> >         // to an LR
> >       - ctxt_switch_to
> >           - ... (some more functions)
> >             // before the above IRQ is delivered to the domain (and the
> >             // maintenance IRQ has not fired), gic_restore_state can be called
> >           - gic_restore_state /* when restoring the GIC state, the above IRQ
> >                                * (already written to an LR) is overwritten
> >                                * with the previously saved values and,
> >                                * somehow, the corresponding IRQ never
> >                                * occurs again */
> > 
> > I made the following patch (i.e., enable local IRQs after gic_restore_state)
> > to prevent the above problem.
> 
> Thanks for the patch, I had been looking into a similar error on the Arndale
> board for a couple of days.

Indeed, thanks for the analysis of the bug and the patch!

It is a particularly difficult bug to track down because it can only
happen if an irq arrives after ctxt_switch_from and before
ctxt_switch_to, and the irq is for the next vcpu to be scheduled on the
pcpu (otherwise the v == current check at the beginning of
gic_set_guest_irq would catch that).
Rather than extending the check in gic_set_guest_irq, I think it is wise
to run ctxt_switch_to with interrupts disabled.
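
To make the window concrete, here is a minimal sketch of the ordering as
described above (illustrative only, not the exact Xen code; the function
names are taken from the report and the patch below):

    static void schedule_tail(struct vcpu *prev)
    {
        ctxt_switch_from(prev);

        local_irq_enable();      /* window opens: an IRQ for the incoming
                                  * vcpu can be written straight into an LR
                                  * from here on */

        /* ... */

        ctxt_switch_to(current); /* gic_restore_state(), called from here,
                                  * rewrites the LRs with the saved values,
                                  * so the freshly injected IRQ is lost */
    }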


> > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@xxxxxxxxxxx> 
> > --- 
> >  xen/arch/arm/domain.c |    4 ++-- 
> >  xen/arch/arm/gic.c    |    4 ++-- 
> >  2 files changed, 4 insertions(+), 4 deletions(-) 
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c 
> > index f71b582..2c3b132 100644 
> > --- a/xen/arch/arm/domain.c 
> > +++ b/xen/arch/arm/domain.c 
> > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n) 
> >      /* VGIC */ 
> >      gic_restore_state(n); 
> > +    local_irq_enable(); 
> > +
> 
> Could you move the local_irq_enable right after ctxt_switch_to?

Right, good idea.
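
With that change the switch would look roughly like this (a sketch combining
the patch below with the suggestion, not the final code):

    static void schedule_tail(struct vcpu *prev)
    {
        ctxt_switch_from(prev);

        /* interrupts stay masked across the whole restore ... */
        ctxt_switch_to(current);

        /* ... and are only re-enabled once the LRs already hold the new
         * vcpu's state, so nothing can be injected and then clobbered */
        local_irq_enable();

        /* ... */
    }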


> >      /* XXX VFP */ 
> >      /* XXX MPU */ 
> > @@ -215,8 +217,6 @@ static void schedule_tail(struct vcpu *prev) 
> >  { 
> >      ctxt_switch_from(prev); 
> > -    local_irq_enable(); 
> > - 
> >      /* TODO 
> >         update_runstate_area(current); 
> >      */ 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c 
> > index d4f0a43..8186ad8 100644 
> > --- a/xen/arch/arm/gic.c 
> > +++ b/xen/arch/arm/gic.c 
> > @@ -81,11 +81,11 @@ void gic_restore_state(struct vcpu *v) 
> >      if ( is_idle_vcpu(v) ) 
> >          return; 
> > -    spin_lock_irq(&gic.lock); 
> > +    spin_lock(&gic.lock); 
> >      this_cpu(lr_mask) = v->arch.lr_mask; 
> >      for ( i=0; i<nr_lrs; i++) 
> >          GICH[GICH_LR + i] = v->arch.gic_lr[i]; 
> > -    spin_unlock_irq(&gic.lock); 
> > +    spin_unlock(&gic.lock); 
> 
> As IRQs are disabled and the GICH registers can only be modified by
> the current physical CPU, I think you can remove the spin_{,un}lock and
> replace it with a dsb.

Yes, we can remove the spin_lock, but I don't think we need a dsb
there; see the isb() two lines below.
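
For reference, gic_restore_state() with the lock simply dropped would look
something like this (a sketch based on the hunk above; it assumes the rest
of the function, including the isb() mentioned here, stays as it is):

    void gic_restore_state(struct vcpu *v)
    {
        int i;

        if ( is_idle_vcpu(v) )
            return;

        /* IRQs are masked and the GICH list registers belong to the current
         * physical CPU, so no lock is needed */
        this_cpu(lr_mask) = v->arch.lr_mask;
        for ( i = 0; i < nr_lrs; i++ )
            GICH[GICH_LR + i] = v->arch.gic_lr[i];

        /* ... remainder unchanged, ending with the existing isb() */
    }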

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

