|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen 4.5 random freeze question
Hi Stefano,
On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> I think that's OK: it looks like that on your board for some reasons
> when UIE is set you get irq 1023 (spurious interrupt) instead of your
> normal maintenance interrupt.
OK, but I think this should be investigated too. What do you think ?
>
> But everything should work anyway without issues.
>
> This is the same patch as before but on top of the lastest xen-unstable
> tree. Please confirm if it works.
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 70d10d6..df140b9 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> if ( is_idle_vcpu(v) )
> return;
>
> + gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> +
> spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
> while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -527,8 +529,6 @@ void gic_inject(void)
>
> if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() )
> gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> - else
> - gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> }
>
I confirm - it works fine. Will this be a final fix ?
Regards,
Andrii
> static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
>
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> I got this strange log:
>>
>> (XEN) received maintenance interrupt irq=1023
>>
>> And platform does not hang due to this:
>> + hcr = GICH[GICH_HCR];
>> + if ( hcr & GICH_HCR_UIE )
>> + {
>> + GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> + uie_on = 1;
>> + }
>>
>> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>> >> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
>> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> >> >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >>> Hi Stefano,
>> >> >> >>>
>> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >> >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >>> >> Hi Stefano,
>> >> >> >>> >>
>> >> >> >>> >> > > if ( !list_empty(¤t->arch.vgic.lr_pending) &&
>> >> >> >>> >> > > lr_all_full() )
>> >> >> >>> >> > > - GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> >>> >> > > + GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> >> >>> >> > > else
>> >> >> >>> >> > > - GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >>> >> > > + GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> >> >>> >> > >
>> >> >> >>> >> > > }
>> >> >> >>> >> >
>> >> >> >>> >> > Yes, exactly
>> >> >> >>> >>
>> >> >> >>> >> I tried, hang still occurs with this change
>> >> >> >>> >
>> >> >> >>> > We need to figure out why during the hang you still have all the
>> >> >> >>> > LRs
>> >> >> >>> > busy even if you are getting maintenance interrupts that should
>> >> >> >>> > cause
>> >> >> >>> > them to be cleared.
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>> I see that I have free LRs during maintenance interrupt
>> >> >> >>>
>> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >> >> >>> (XEN) HW_LR[0]=9a015856
>> >> >> >>> (XEN) HW_LR[1]=0
>> >> >> >>> (XEN) HW_LR[2]=0
>> >> >> >>> (XEN) HW_LR[3]=0
>> >> >> >>> (XEN) Inflight irq=86 lr=0
>> >> >> >>> (XEN) Inflight irq=2 lr=255
>> >> >> >>> (XEN) Pending irq=2
>> >> >> >>>
>> >> >> >>> But I see that after I got hang - maintenance interrupts are
>> >> >> >>> generated
>> >> >> >>> continuously. Platform continues printing the same log till reboot.
>> >> >> >>
>> >> >> >> Exactly the same log? As in the one above you just pasted?
>> >> >> >> That is very very suspicious.
>> >> >> >
>> >> >> > Yes exactly the same log. And looks like it means that LRs are
>> >> >> > flushed
>> >> >> > correctly.
>> >> >> >
>> >> >> >>
>> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> >> >> >> something we do in Xen, maybe writing to an LR register, might
>> >> >> >> trigger a
>> >> >> >> new maintenance interrupt immediately causing an infinite loop.
>> >> >> >>
>> >> >> >
>> >> >> > Yes, this is what I'm thinking about. Taking in account all collected
>> >> >> > debug info it looks like once LRs are overloaded with SGIs -
>> >> >> > maintenance interrupt occurs.
>> >> >> > And then it is not handled properly, and occurs again and again - so
>> >> >> > platform hangs inside its handler.
>> >> >> >
>> >> >> >> Could you please try this patch? It disable GICH_HCR_UIE
>> >> >> >> immediately on
>> >> >> >> hypervisor entry.
>> >> >> >>
>> >> >> >
>> >> >> > Now trying.
>> >> >> >
>> >> >> >>
>> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >> >> index 4d2a92d..6ae8dc4 100644
>> >> >> >> --- a/xen/arch/arm/gic.c
>> >> >> >> +++ b/xen/arch/arm/gic.c
>> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >> >> >> if ( is_idle_vcpu(v) )
>> >> >> >> return;
>> >> >> >>
>> >> >> >> + GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >> +
>> >> >> >> spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >> >> >>
>> >> >> >> while ((i = find_next_bit((const unsigned long *)
>> >> >> >> &this_cpu(lr_mask),
>> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>> >> >> >>
>> >> >> >> gic_restore_pending_irqs(current);
>> >> >> >>
>> >> >> >> -
>> >> >> >> if ( !list_empty(¤t->arch.vgic.lr_pending) &&
>> >> >> >> lr_all_full() )
>> >> >> >> GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> >> - else
>> >> >> >> - GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >> -
>> >> >> >> }
>> >> >> >>
>> >> >> >> static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum
>> >> >> >> gic_sgi sgi)
>> >> >> >
>> >> >>
>> >> >> Heh - I don't see hangs with this patch :) But also I see that
>> >> >> maintenance interrupt doesn't occur (and no hang as result)
>> >> >> Stefano - is this expected?
>> >> >
>> >> > No maintenance interrupts at all? That's strange. You should be
>> >> > receiving them when LRs are full and you still have interrupts pending
>> >> > to be added to them.
>> >> >
>> >> > You could add another printk here to see if you should be receiving
>> >> > them:
>> >> >
>> >> > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() )
>> >> > + {
>> >> > + gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>> >> > GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> > - else
>> >> > - GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> > -
>> >> > + }
>> >> > }
>> >> >
>> >>
>> >> Requested properly:
>> >>
>> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >>
>> >> But does not occur
>> >
>> > OK, let's see what's going on then by printing the irq number of the
>> > maintenance interrupt:
>> >
>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> > index 4d2a92d..fed3167 100644
>> > --- a/xen/arch/arm/gic.c
>> > +++ b/xen/arch/arm/gic.c
>> > @@ -55,6 +55,7 @@ static struct {
>> > static DEFINE_PER_CPU(uint64_t, lr_mask);
>> >
>> > static uint8_t nr_lrs;
>> > +static bool uie_on;
>> > #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
>> >
>> > /* The GIC mapping of CPU interfaces does not necessarily match the
>> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
>> > {
>> > int i = 0;
>> > unsigned long flags;
>> > + unsigned long hcr;
>> >
>> > /* The idle domain has no LRs to be cleared. Since gic_restore_state
>> > * doesn't write any LR registers for the idle domain they could be
>> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
>> > if ( is_idle_vcpu(v) )
>> > return;
>> >
>> > + hcr = GICH[GICH_HCR];
>> > + if ( hcr & GICH_HCR_UIE )
>> > + {
>> > + GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> > + uie_on = 1;
>> > + }
>> > +
>> > spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >
>> > while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int
>> > is_fiq)
>> > intack = GICC[GICC_IAR];
>> > irq = intack & GICC_IA_IRQ;
>> >
>> > + if ( uie_on )
>> > + {
>> > + uie_on = 0;
>> > + printk("received maintenance interrupt irq=%d\n", irq);
>> > + }
>> > if ( likely(irq >= 16 && irq < 1021) )
>> > {
>> > local_irq_enable();
>>
>>
>>
>> --
>>
>> Andrii Tseglytskyi | Embedded Dev
>> GlobalLogic
>> www.globallogic.com
>>
--
Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |