
Re: [Xen-devel] Xen 4.5 random freeze question



Hi Stefano,

Thank you for your support.

You are right: with the latest change you proposed, I get continuous
prints during the platform hang:

(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0

Looks like the issue needs deeper debugging.

Regards,
Andrii

On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> Hello Andrii,
> we are getting closer :-)
>
> It would help if you post the output with GIC_DEBUG defined but without
> the other change that "fixes" the issue.
>
> I think the problem is probably due to software irqs.
> You are getting too many
>
> gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>
> messages. That means you are losing virtual SGIs (guest VCPU to guest
> VCPU). It would be best to investigate why, especially if you get many
> more of the same messages without the MAINTENANCE_IRQ change I
> suggested.
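The "still lr_pending" case can be pictured with a toy model. It assumes, as the message above implies, that an injection arriving while the same virtual irq is still queued for a list register is merged into the queued instance rather than recorded separately. `struct toy_virq`, `toy_inject()` and `toy_deliver()` are hypothetical names for this sketch, not Xen code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of one virtual irq on one vCPU.
 * This is NOT Xen's struct pending_irq: it only records whether the irq
 * is already queued (waiting for a free list register) and how many
 * injections were folded into that single queued instance. */
struct toy_virq {
    unsigned int irq;        /* e.g. 2, the SGI seen in the logs */
    bool queued;             /* already on the (toy) lr_pending queue */
    unsigned int coalesced;  /* injections absorbed while queued */
};

/* Inject the irq. If it is still queued, the new request adds no state:
 * it is merged into the queued one, so the guest sees one interrupt
 * for several sender-side injections. */
static void toy_inject(struct toy_virq *p)
{
    if ( p->queued )
    {
        p->coalesced++;
        printf("trying to inject irq=%u, when it is still lr_pending\n",
               p->irq);
        return;
    }
    p->queued = true;
}

/* The guest takes the interrupt: the queued instance is consumed. */
static void toy_deliver(struct toy_virq *p)
{
    p->queued = false;
}
```

In this model, three injections before one delivery leave a single queued interrupt and two merged ones, which would match the pattern of repeated gic.c:617 lines.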
>
> This patch might also help us understand the problem better:
>
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index b7516c0..5eaeca2 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
>      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
>      {
>          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> -        if ( i >= nr_lrs ) return;
> +        if ( i >= nr_lrs )
> +        {
> +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> +                    p->irq, v->domain->domain_id, v->vcpu_id);
> +            continue;
> +        }
>
>          spin_lock_irqsave(&gic.lock, flags);
>          gic_set_lr(i, p, GICH_LR_PENDING);
>
>
>
>
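A minimal sketch of the loop the debug patch touches, assuming a 64-bit `lr_mask` in which bit i marks list register i as busy, and a fixed count of 4 LRs (real hardware reports the number in GICH_VTR). `toy_first_zero()` and `toy_restore_pending()` are hypothetical stand-ins for `find_first_zero_bit()` and `gic_restore_pending_irqs()`. In this model, once the mask is full it stays full for the rest of the loop, so replacing `return` with `continue` changes only the diagnostics: every irq that cannot get an LR is reported, and all of them stay queued.

```c
#include <stdint.h>
#include <stdio.h>

#define TOY_NR_LRS 4U  /* assumption: 4 LRs; Xen reads the real count from GICH_VTR */

/* Index of the first clear bit below nbits, or nbits if all are set
 * (same contract as find_first_zero_bit). */
static unsigned int toy_first_zero(uint64_t mask, unsigned int nbits)
{
    for ( unsigned int i = 0; i < nbits; i++ )
        if ( !(mask & (UINT64_C(1) << i)) )
            return i;
    return nbits;
}

/* Move queued irqs into free LRs. stop_when_full != 0 models the original
 * "return"; 0 models the debug patch's "continue", which reports each irq
 * that cannot be placed. Returns how many irqs remain queued. */
static unsigned int toy_restore_pending(uint64_t *lr_mask,
                                        const unsigned int *irqs,
                                        unsigned int n, int stop_when_full)
{
    unsigned int left = 0;

    for ( unsigned int k = 0; k < n; k++ )
    {
        unsigned int i = toy_first_zero(*lr_mask, TOY_NR_LRS);

        if ( i >= TOY_NR_LRS )
        {
            if ( stop_when_full )
                return n - k;
            printf("LRs full, not injecting irq=%u\n", irqs[k]);
            left++;
            continue;
        }
        *lr_mask |= UINT64_C(1) << i;  /* LR i now holds irqs[k] */
    }
    return left;
}
```

If the LRs never drain (for example because their occupants are never completed), every pass through this loop prints again, which is consistent with the continuous output reported at the top of the thread.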
> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> No hangs with this change.
>> Complete log is the following:
>>
>> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
>> DRA752 ES1.0
>> <ethaddr> not set. Validating first E-fuse MAC
>> cpsw
>> - UART enabled -
>> - CPU 00000000 booting -
>> - Xen starting in Hyp mode -
>> - Zero BSS -
>> - Setting up control registers -
>> - Turning on paging -
>> - Ready -
>> (XEN) Checking for initrd in /chosen
>> (XEN) RAM: 0000000080000000 - 000000009fffffff
>> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
>> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
>> (XEN)
>> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
>> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
>> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
>> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
>> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
>> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
>> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
>> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
>> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
>> (XEN)
> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
>> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
>> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
>> (XEN) Dom heap: 344064 pages
>> (XEN) Domain heap initialised
>> (XEN) Looking for UART console serial0
>>  Xen 4.5-unstable
> >> (XEN) Xen version 4.5-unstable (atseglytskyi@) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 20130328 (prerelease)) debu4
>> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
>> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
>> (XEN) 32-bit Execution:
>> (XEN)   Processor Features: 00001131:00011011
>> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
>> (XEN)     Extensions: GenericTimer Security
>> (XEN)   Debug Features: 02010555
>> (XEN)   Auxiliary Features: 00000000
>> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
>> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
>> (XEN) Platform: TI DRA7
>> (XEN) /psci method must be smc, but is: "hvc"
>> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
>> (XEN) Set AuxCoreBoot0 to 0x20
>> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
>> (XEN) Using generic timer at 6144 KHz
>> (XEN) GIC initialization:
>> (XEN)         gic_dist_addr=0000000048211000
>> (XEN)         gic_cpu_addr=0000000048212000
>> (XEN)         gic_hyp_addr=0000000048214000
>> (XEN)         gic_vcpu_addr=0000000048216000
>> (XEN)         gic_maintenance_irq=25
>> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
>> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>> (XEN) I/O virtualisation disabled
>> (XEN) Allocated console ring of 16 KiB.
>> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
>> (XEN) Bringing up CPU1
>> - CPU 00000001 booting -
>> - Xen starting in Hyp mode -
>> - Setting up control registers -
>> - Turning on paging -
>> - Ready -
>> (XEN) CPU 1 booted.
>> (XEN) Brought up 2 CPUs
>> (XEN) *** LOADING DOMAIN 0 ***
>> (XEN) Loading kernel from boot module 2
>> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
>> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
>> (XEN) Std. Loglevel: All
>> (XEN) Guest Loglevel: All
> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
>> (XEN) Freed 272kB init memory.
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
>> [    0.000000] /cpus/cpu@0 missing clock-frequency property
>> [    0.000000] /cpus/cpu@1 missing clock-frequency property
>> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
>> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
>> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
>> /ocp/i2c@48072000/camera_ov10635
>> [    0.437500] ldo3: operation not allowed
>> [    0.437500] omapdss HDMI error: can't set the voltage regulator
>> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
>> [    0.468750] ov1063x 1-0030: No deserializer node found
>> [    0.468750] ov1063x 1-0030: No serializer node found
>> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
>> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
>> [    0.578125] ahci ahci.0.auto: can't get clock
>> [    0.898437] ldc_module_init
>> [    1.304687] Missing dual_emac_res_vlan in DT.
>> [    1.304687] Using 1 as Reserved VLAN for 0 slave
>> [    1.312500] Missing dual_emac_res_vlan in DT.
>> [    1.320312] Using 2 as Reserved VLAN for 1 slave
>> [    1.382812] Freeing init memory: 236K
>> sh: write error: No such device
>> Cannot identify '/dev/camera0': 2, No such file or directory
>> Parsing config from /xen/images/DomUAndroid.cfg
>> XSM Disabled: seclabel not supported
>> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 53: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 71: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 173: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 174: Function not implemented
>> Turning on vfb in domain 1
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
> >> Parsing config from /xen/images/DomUQNX.cfg
> >> XSM Disabled: seclabel not supported
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
>> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader found: Invalid kernel
> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image failed: No such file or directory
> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot (re-)build domain: -3
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>> Turning on 'vsnd' in domain '1' (dev_id: '0')
>> Turning on vkbd in domain 1
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
> >>
> >> Please press Enter to activate this console.
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>>
>> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
>> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
>> > OK got it. Give me a few mins
>> >
>> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
>> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
>> >> for non-hardware irqs (desc == NULL) and keep avoiding
>> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
>> >>
>> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
>> >> other potential bugs introduced later.
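The flag policy described above (HW plus physical ID for hardware irqs, maintenance interrupt only when `desc == NULL`) can be sketched as a function that builds a GICv2 list-register value. The bit positions follow the GICH_LR layout in the GICv2 architecture specification; the macro and function names are local to this sketch and only mirror Xen's `GICH_V2_LR_*` definitions. The priority argument is assumed to be the already-reduced 5-bit LR field, and the SGI source-CPU field is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LR<n>: VirtualID [9:0]; PhysicalID [19:10] when HW=1
 * (when HW=0, bit 19 requests a maintenance interrupt on EOI);
 * Priority [27:23]; State [29:28]; HW [31]. Local names, not Xen's. */
#define LR_VIRTUAL_MASK     0x3ffU
#define LR_PHYSICAL_SHIFT   10
#define LR_PHYSICAL_MASK    0x3ffU
#define LR_MAINTENANCE_IRQ  (1U << 19)
#define LR_PRIORITY_SHIFT   23
#define LR_PRIORITY_MASK    0x1fU
#define LR_PENDING          (1U << 28)
#define LR_HW               (1U << 31)

/* Hypothetical helper implementing the proposed policy:
 * - has_desc (hardware irq): set HW and the physical irq number, no
 *   maintenance interrupt, so the guest's EOI deactivates the physical irq;
 * - !has_desc (software irq: SGI, virtual timer, evtchn): no HW bit,
 *   request a maintenance interrupt on EOI instead. */
static uint32_t toy_lr_value(unsigned int virq, unsigned int prio5,
                             bool has_desc, unsigned int pirq)
{
    uint32_t lr = LR_PENDING
                | ((prio5 & LR_PRIORITY_MASK) << LR_PRIORITY_SHIFT)
                | (virq & LR_VIRTUAL_MASK);

    if ( has_desc )
        lr |= LR_HW | ((pirq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
    else
        lr |= LR_MAINTENANCE_IRQ;

    return lr;
}
```

Because bit 19 belongs to PhysicalID when HW is set, the two cases cannot share one encoding; the `if`/`else` keeps them mutually exclusive.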
>> >>
>> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> What if I try the following code on top of the current master branch:
>> >>>
>> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >>> index 31fb81a..6764ab7 100644
>> >>> --- a/xen/arch/arm/gic-v2.c
>> >>> +++ b/xen/arch/arm/gic-v2.c
>> >>> @@ -36,6 +36,8 @@
>> >>>  #include <asm/io.h>
>> >>>  #include <asm/gic.h>
>> >>>
>> >>> +#define GIC_DEBUG 1
>> >>> +
>> >>>  /*
>> >>>   * LR register definitions are GIC v2 specific.
>> >>>   * Moved these definitions from header file to here
>> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >>> index bcaded9..c03d6a6 100644
>> >>> --- a/xen/arch/arm/gic.c
>> >>> +++ b/xen/arch/arm/gic.c
>> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>> >>>
> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
>> >>>
>> >>> -#undef GIC_DEBUG
>> >>> +#define GIC_DEBUG 1
>> >>>
>> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
>> >>>
> >> >>> It is equivalent to what you are proposing: my code contains
> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will be
> >> >>> executed inside the gicv2_update_lr() function:
> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >>>
>> >>> regards,
>> >>> Andrii
>> >>>
>> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
>> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set, and then
> >> >>> >> everything works fine.
> >> >>> >> The following 2 patches fix xen/master for my platform.
>> >>> >>
> >> >>> >> Stefano, could you please take a look at these changes?
>> >>> >>
>> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
>> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
>> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
>> >>> >>
>> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
>> >>> >>
>> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
>> >>> >>
>> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >>> >> index 31fb81a..093ecdb 100644
>> >>> >> --- a/xen/arch/arm/gic-v2.c
>> >>> >> +++ b/xen/arch/arm/gic-v2.c
>> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct
>> >>> >> pending_irq *p,
>> >>> >>                                               << 
>> >>> >> GICH_V2_LR_PRIORITY_SHIFT) |
>> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) <<
>> >>> >> GICH_V2_LR_VIRTUAL_SHIFT));
>> >>> >>
>> >>> >> -    if ( p->desc != NULL )
>> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >>> >>      {
>> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >>> >> -        else
>> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq &
>> >>> >> GICH_V2_LR_PHYSICAL_MASK )
>> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
>> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >>> >> +    }
>> >>> >> +    else if ( p->desc != NULL )
>> >>> >> +    {
>> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & 
>> >>> >> GICH_V2_LR_PHYSICAL_MASK )
>> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>> >>> >>      }
>> >>> >>
>> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
>> >>> >
> >> >>> > Actually, in case p->desc == NULL (the irq is not a hardware irq; it
> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >> >>> > the maintenance interrupt if the bug was really due to GICH_LR_HW not
> >> >>> > working correctly on OMAP5. This change might only be better at
> >> >>> > "hiding" the real issue.
>> >>> >
>> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
>> >>> > maintenance interrupts doesn't work for software interrupts.
>> >>> > The commit that should make them work correctly after the
>> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121
>> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> >> >>> > see that it is going to set a software irq as PENDING if it is already
> >> >>> > ACTIVE.
>> >>> > Maybe that doesn't work correctly on OMAP5.
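A toy state machine over the GICv2 list-register State field (bits [29:28]: 00 invalid, 01 pending, 10 active, 11 pending and active) illustrates the behaviour described above: re-raising a software irq that is still ACTIVE makes it pending and active, so the guest takes it again after its EOI instead of losing it. `toy_raise()`, `toy_ack()` and `toy_eoi()` are hypothetical helpers; priorities, the maintenance interrupt and Xen's real LR update paths are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define LR_PENDING (1U << 28)  /* GICH_LR State bit 28 */
#define LR_ACTIVE  (1U << 29)  /* GICH_LR State bit 29 */

/* Hypervisor side: (re)raise the irq held in this LR. Setting PENDING
 * while ACTIVE gives pending-and-active rather than dropping the
 * second instance. */
static uint32_t toy_raise(uint32_t lr)
{
    return lr | LR_PENDING;
}

/* Guest side: acknowledge. In this model an instance can be taken only
 * when it is pending and the previous one is no longer active. */
static bool toy_ack(uint32_t *lr)
{
    if ( !(*lr & LR_PENDING) || (*lr & LR_ACTIVE) )
        return false;
    *lr = (*lr & ~LR_PENDING) | LR_ACTIVE;
    return true;
}

/* Guest side: EOI deactivates the current instance. */
static void toy_eoi(uint32_t *lr)
{
    *lr &= ~LR_ACTIVE;
}
```

If a particular GIC implementation did not handle the pending-and-active state as this model assumes, the second SGI would never be delivered, which is the kind of OMAP5-specific failure the message asks to rule out.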
>> >>> >
>> >>> > Could you try this patch on top of
>> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
>> >>> > if the problem is specifically with software irqs.
>> >>> >
>> >>> >
>> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >>> > index b7516c0..d8a17c9 100644
>> >>> > --- a/xen/arch/arm/gic.c
>> >>> > +++ b/xen/arch/arm/gic.c
>> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
>> >>> >  /* Maximum cpu interface per GIC */
>> >>> >  #define NR_GIC_CPU_IF 8
>> >>> >
>> >>> > -#undef GIC_DEBUG
>> >>> > +#define GIC_DEBUG 1
>> >>> >
>> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
>> >>> >
>> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct 
>> >>> > pending_irq *p,
>> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>> >>> >      if ( p->desc != NULL )
>> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << 
>> >>> > GICH_LR_PHYSICAL_SHIFT);
>> >>> > +    else
>> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
>> >>> >
>> >>> >      GICH[GICH_LR + lr] = lr_val;
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>>
>> >>> Andrii Tseglytskyi | Embedded Dev
>> >>> GlobalLogic
>> >>> www.globallogic.com
>> >>>
>> >
>> >
>> >
>> > --
>> >
>> > Andrii Tseglytskyi | Embedded Dev
>> > GlobalLogic
>> > www.globallogic.com
>>
>>
>>
>> --
>>
>> Andrii Tseglytskyi | Embedded Dev
>> GlobalLogic
>> www.globallogic.com
>>




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

