
Re: [Xen-devel] Xen 4.5 random freeze question



On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> Hi Stefano,
>> >>
>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> > >      else
>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> > >
>> >> > >  }
>> >> >
>> >> > Yes, exactly
>> >>
>> >> I tried; the hang still occurs with this change.
>> >
>> > We need to figure out why during the hang you still have all the LRs
>> > busy even if you are getting maintenance interrupts that should cause
>> > them to be cleared.
>> >
>>
>> I see that I have free LRs during the maintenance interrupt:
>>
>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> (XEN) GICH_LRs (vcpu 0) mask=0
>> (XEN)    HW_LR[0]=9a015856
>> (XEN)    HW_LR[1]=0
>> (XEN)    HW_LR[2]=0
>> (XEN)    HW_LR[3]=0
>> (XEN) Inflight irq=86 lr=0
>> (XEN) Inflight irq=2 lr=255
>> (XEN) Pending irq=2
>>
>> But I see that once the hang occurs, maintenance interrupts are generated
>> continuously. The platform keeps printing the same log until reboot.
>
> Exactly the same log? As in the one above you just pasted?
> That is very very suspicious.

Yes, exactly the same log. It looks like this means that the LRs are
flushed correctly.
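
For reference, the HW_LR[0] value from the log can be decoded by hand. Below
is a minimal sketch, assuming the GICv2 GICH_LR bit layout as I read it in the
architecture specification (VirtualID [9:0], PhysicalID [19:10], Priority
[27:23], State [29:28], HW [31]). The struct and helper names are hypothetical
and are not taken from the Xen sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields of one GICv2 list register. */
struct lr_fields {
    unsigned int virq;      /* bits [9:0]   virtual interrupt ID */
    unsigned int pirq;      /* bits [19:10] physical interrupt ID (when hw == 1) */
    unsigned int priority;  /* bits [27:23] 5-bit priority field */
    unsigned int state;     /* bits [29:28] 0 invalid, 1 pending, 2 active, 3 both */
    unsigned int hw;        /* bit  [31]    1 = backed by a hardware interrupt */
};

/* Split a raw list register value into its fields (assumed GICv2 layout). */
static struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;

    f.virq     = lr & 0x3ffu;
    f.pirq     = (lr >> 10) & 0x3ffu;
    f.priority = (lr >> 23) & 0x1fu;
    f.state    = (lr >> 28) & 0x3u;
    f.hw       = (lr >> 31) & 0x1u;
    return f;
}

/* Check the sample from the log: HW_LR[0]=9a015856 should be irq 86, pending. */
static void check_sample_from_log(void)
{
    struct lr_fields f = decode_lr(0x9a015856u);

    assert(f.virq == 86 && f.pirq == 86);
    assert(f.state == 1);   /* pending, so this LR is in use */
    assert(f.hw == 1);
}
```

Under that layout, LR0 holds a pending hardware interrupt 86, which matches
the "Inflight irq=86 lr=0" line, and LR1 to LR3 are invalid (free).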

>
> I am thinking that we are not handling GICH_HCR_UIE correctly and
> something we do in Xen, maybe writing to an LR register, might trigger a
> new maintenance interrupt immediately causing an infinite loop.
>

Yes, this is what I'm thinking too. Taking into account all the collected
debug info, it looks like a maintenance interrupt occurs once the LRs are
overloaded with SGIs. It is then not handled properly and fires again and
again, so the platform hangs inside its handler.

> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> hypervisor entry.
>

Now trying.

>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 4d2a92d..6ae8dc4 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>      if ( is_idle_vcpu(v) )
>          return;
>
> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -821,12 +823,8 @@ void gic_inject(void)
>
>      gic_restore_pending_irqs(current);
>
> -
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> -    else
> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
>  }
>
>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
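
To make the suspected loop concrete, here is a toy model of it. It is only a
sketch, and it rests on two assumptions from my reading of the GICv2
specification: the underflow maintenance interrupt is level-sensitive, and it
stays asserted while GICH_HCR.UIE is set and at most one list register holds a
valid entry. All TOY_/toy_ names are hypothetical; nothing here is Xen code or
touches real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_LRS         4
#define TOY_GICH_HCR_EN    (1u << 0)   /* virtual interface enable */
#define TOY_GICH_HCR_UIE   (1u << 1)   /* underflow interrupt enable */
#define TOY_LR_STATE_MASK  (3u << 28)  /* non-zero state means the LR is valid */

/* Hypothetical software stand-in for the per-CPU GICH register state. */
struct toy_gich {
    uint32_t hcr;
    uint32_t lr[TOY_NR_LRS];
};

/* Count list registers that currently hold a pending or active entry. */
static unsigned int toy_valid_lrs(const struct toy_gich *g)
{
    unsigned int i, n = 0;

    for (i = 0; i < TOY_NR_LRS; i++)
        if (g->lr[i] & TOY_LR_STATE_MASK)
            n++;
    return n;
}

/* Assumed underflow condition: UIE set and zero or one valid LR. */
static bool toy_maintenance_irq_asserted(const struct toy_gich *g)
{
    return (g->hcr & TOY_GICH_HCR_EN) &&
           (g->hcr & TOY_GICH_HCR_UIE) &&
           toy_valid_lrs(g) <= 1;
}

/*
 * Count how many times the handler is entered before the line drops,
 * giving up after 'cap' entries. If clear_uie_on_entry is true, the
 * handler clears UIE first, which is the idea behind the patch above.
 */
static unsigned int toy_count_entries(struct toy_gich *g,
                                      bool clear_uie_on_entry,
                                      unsigned int cap)
{
    unsigned int n = 0;

    assert(cap > 0);
    while (toy_maintenance_irq_asserted(g) && n < cap) {
        n++;
        if (clear_uie_on_entry)
            g->hcr &= ~TOY_GICH_HCR_UIE;
    }
    return n;
}
```

With the register values from the log (one valid LR, UIE left set), the model
never leaves the loop and only stops at the cap. With UIE cleared on entry the
handler runs once. If the real hardware behaves like this model, it would
explain why the same log repeats until reboot.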



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com
