Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
>
> Thank you for your support.
>
> You are right - with the latest change you proposed I get continuous
> prints during the platform hang:
>
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>
> Looks like the issue needs deeper debugging.
Cool! You could simply print which irqs are in all the LRs when they are
full, for example you could call gic_dump_info. That would tell us what
is taking up all the LR space we have.
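To illustrate the idea of dumping the LRs only when none is free, here is a minimal standalone sketch. It is not Xen code: struct lr_slot, first_zero_bit() and dump_lrs_if_full() are hypothetical names made up for this example, and the mask simply plays the role of the per-cpu lr_mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the contents of one list register. */
struct lr_slot {
    unsigned int virq;   /* virtual irq currently held by this LR */
};

/* Index of the first clear bit in mask below nr, or nr if all are set. */
static unsigned int first_zero_bit(uint64_t mask, unsigned int nr)
{
    unsigned int i;

    assert(nr <= 64);
    for ( i = 0; i < nr; i++ )
        if ( !(mask & (UINT64_C(1) << i)) )
            return i;
    return nr;
}

/*
 * If every LR is in use, print which virq each one holds and return 1;
 * otherwise print nothing and return 0.
 */
static int dump_lrs_if_full(uint64_t mask, const struct lr_slot *lrs,
                            unsigned int nr)
{
    unsigned int i;

    if ( first_zero_bit(mask, nr) < nr )
        return 0;

    for ( i = 0; i < nr; i++ )
        printf("LR%u: virq=%u\n", i, lrs[i].virq);
    return 1;
}
```

In the real hypervisor the equivalent information would come from the existing dump helper rather than from a table like lrs[]; the sketch only shows the "dump when full" condition.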
How many LRs are available on omap5 anyway?
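For reference, on a GICv2 the number of implemented list registers is encoded in GICH_VTR: the ListRegs field (bits [5:0]) holds the count minus one. A minimal sketch of the decoding, assuming the raw register value has already been read; nr_lrs_from_vtr() and VTR_LISTREGS_MASK are names invented for this example:

```c
#include <stdint.h>

/* GICv2 GICH_VTR.ListRegs: bits [5:0], number of list registers minus one. */
#define VTR_LISTREGS_MASK 0x3fu

/* Decode the number of implemented list registers from a raw GICH_VTR value. */
static unsigned int nr_lrs_from_vtr(uint32_t vtr)
{
    return (vtr & VTR_LISTREGS_MASK) + 1;
}
```

Printing this value once at boot would answer the question for the platform at hand.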
I doubt you have so much interrupt traffic to actually fill all the LRs,
so I am thinking that a few LRs might not be cleared properly (that
should happen on hypervisor entry, gic_update_one_lr should take care of
it).
> Regards,
> Andrii
>
> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > Hello Andrii,
> > we are getting closer :-)
> >
> > It would help if you post the output with GIC_DEBUG defined but without
> > the other change that "fixes" the issue.
> >
> > I think the problem is probably due to software irqs.
> > You are getting too many
> >
> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >
> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> > VCPU). It would be best to investigate why, especially if you get many
> > more of the same messages without the MAINTENANCE_IRQ change I
> > suggested.
> >
> > This patch might also help in understanding the problem better:
> >
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index b7516c0..5eaeca2 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> > list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> > {
> > i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > - if ( i >= nr_lrs ) return;
> > + if ( i >= nr_lrs )
> > + {
> > + gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> > + p->irq, v->domain->domain_id, v->vcpu_id);
> > + continue;
> > + }
> >
> > spin_lock_irqsave(&gic.lock, flags);
> > gic_set_lr(i, p, GICH_LR_PENDING);
> >
> >
> >
> >
> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> Hi Stefano,
> >>
> >> No hangs with this change.
> >> Complete log is the following:
> >>
> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> >> DRA752 ES1.0
> >> <ethaddr> not set. Validating first E-fuse MAC
> >> cpsw
> >> - UART enabled -
> >> - CPU 00000000 booting -
> >> - Xen starting in Hyp mode -
> >> - Zero BSS -
> >> - Setting up control registers -
> >> - Turning on paging -
> >> - Ready -
> >> (XEN) Checking for initrd in /chosen
> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> >> (XEN)
> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> >> (XEN) RESVD[0]: 00000000ba300000 - 00000000bfd00000
> >> (XEN) RESVD[1]: 0000000095800000 - 0000000095900000
> >> (XEN) RESVD[2]: 0000000098a00000 - 0000000098b00000
> >> (XEN) RESVD[3]: 0000000095f00000 - 0000000098a00000
> >> (XEN) RESVD[4]: 0000000095900000 - 0000000095f00000
> >> (XEN)
> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> >> (XEN) Dom heap: 344064 pages
> >> (XEN) Domain heap initialised
> >> (XEN) Looking for UART console serial0
> >> Xen 4.5-unstable
> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
> >> (arm-linux-gnueabihf-gcc (crosstool-NG
> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
> >> 20130328 (prerelease)) debu4
> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> >> (XEN) 32-bit Execution:
> >> (XEN) Processor Features: 00001131:00011011
> >> (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> >> (XEN) Extensions: GenericTimer Security
> >> (XEN) Debug Features: 02010555
> >> (XEN) Auxiliary Features: 00000000
> >> (XEN) Memory Model Features: 10201105 20000000 01240000 02102211
> >> (XEN) ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> >> (XEN) Platform: TI DRA7
> >> (XEN) /psci method must be smc, but is: "hvc"
> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> >> (XEN) Set AuxCoreBoot0 to 0x20
> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> >> (XEN) Using generic timer at 6144 KHz
> >> (XEN) GIC initialization:
> >> (XEN) gic_dist_addr=0000000048211000
> >> (XEN) gic_cpu_addr=0000000048212000
> >> (XEN) gic_hyp_addr=0000000048214000
> >> (XEN) gic_vcpu_addr=0000000048216000
> >> (XEN) gic_maintenance_irq=25
> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> >> (XEN) I/O virtualisation disabled
> >> (XEN) Allocated console ring of 16 KiB.
> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> >> (XEN) Bringing up CPU1
> >> - CPU 00000001 booting -
> >> - Xen starting in Hyp mode -
> >> - Setting up control registers -
> >> - Turning on paging -
> >> - Ready -
> >> (XEN) CPU 1 booted.
> >> (XEN) Brought up 2 CPUs
> >> (XEN) *** LOADING DOMAIN 0 ***
> >> (XEN) Loading kernel from boot module 2
> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> >> (XEN) Loading zImage from 00000000c0000040 to
> >> 00000000cfc00000-00000000cff50c48
> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> >> (XEN) Std. Loglevel: All
> >> (XEN) Guest Loglevel: All
> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> >> input to Xen)
> >> (XEN) Freed 272kB init memory.
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> already pending in LR0
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> already pending in LR0
> >> [ 0.000000] /cpus/cpu@0 missing clock-frequency property
> >> [ 0.000000] /cpus/cpu@1 missing clock-frequency property
> >> [ 0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> >> [ 0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> >> [ 0.273437] i2c i2c-1: of_i2c: invalid reg on
> >> /ocp/i2c@48072000/camera_ov10635
> >> [ 0.437500] ldo3: operation not allowed
> >> [ 0.437500] omapdss HDMI error: can't set the voltage regulator
> >> [ 0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> >> [ 0.468750] ov1063x 1-0030: No deserializer node found
> >> [ 0.468750] ov1063x 1-0030: No serializer node found
> >> [ 0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> >> [ 0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> >> [ 0.578125] ahci ahci.0.auto: can't get clock
> >> [ 0.898437] ldc_module_init
> >> [ 1.304687] Missing dual_emac_res_vlan in DT.
> >> [ 1.304687] Using 1 as Reserved VLAN for 0 slave
> >> [ 1.312500] Missing dual_emac_res_vlan in DT.
> >> [ 1.320312] Using 2 as Reserved VLAN for 1 slave
> >> [ 1.382812] Freeing init memory: 236K
> >> sh: write error: No such device
> >> Cannot identify '/dev/camera0': 2, No such file or directory
> >> Parsing config from /xen/images/DomUAndroid.cfg
> >> XSM Disabled: seclabel not supported
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 53: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 71: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 173: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 174: Function not implemented
> >> Turning on vfb in domain 1
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> Parsing config from /xen/images/DomUQNX.cfg
> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
> >> inject irq=2 into d0v0, when it is still lr_pending
> >>
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> [ 4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> >> [ 4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link
> >> -22
> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
> >> found: Invalid kernel
> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
> >> failed: No such file or directory
> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
> >> (re-)build domain: -3
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
> >> Turning on vkbd in domain 1
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >>
> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
> >> trying to inject irq=2 into d0v0, when it is still lr_pending
> >>
> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> >> > OK got it. Give me a few mins
> >> >
> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> >> >>
> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> >> >> other potential bugs introduced later.
> >> >>
> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> What if I try the following code on top of the current master branch:
> >> >>>
> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >>> index 31fb81a..6764ab7 100644
> >> >>> --- a/xen/arch/arm/gic-v2.c
> >> >>> +++ b/xen/arch/arm/gic-v2.c
> >> >>> @@ -36,6 +36,8 @@
> >> >>> #include <asm/io.h>
> >> >>> #include <asm/gic.h>
> >> >>>
> >> >>> +#define GIC_DEBUG 1
> >> >>> +
> >> >>> /*
> >> >>> * LR register definitions are GIC v2 specific.
> >> >>> * Moved these definitions from header file to here
> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >>> index bcaded9..c03d6a6 100644
> >> >>> --- a/xen/arch/arm/gic.c
> >> >>> +++ b/xen/arch/arm/gic.c
> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >> >>>
> >> >>> #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
> >> >>>
> >> >>> -#undef GIC_DEBUG
> >> >>> +#define GIC_DEBUG 1
> >> >>>
> >> >>> static void gic_update_one_lr(struct vcpu *v, int i);
> >> >>>
> >> >>> It is equivalent to what you are proposing - my code contains
> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will be
> >> >>> executed inside the gicv2_update_lr() function:
> >> >>> lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>>
> >> >>> regards,
> >> >>> Andrii
> >> >>>
> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set, and then
> >> >>> >> everything works fine.
> >> >>> >> The following 2 patches fix xen/master for my platform.
> >> >>> >>
> >> >>> >> Stefano, could you please take a look at these changes?
> >> >>> >>
> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
> >> >>> >> Date: Tue Nov 18 14:20:42 2014 +0200
> >> >>> >>
> >> >>> >> xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> >> >>> >>
> >> >>> >> Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >> >>> >> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
> >> >>> >>
> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >>> >> index 31fb81a..093ecdb 100644
> >> >>> >> --- a/xen/arch/arm/gic-v2.c
> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
> >> >>> >> << GICH_V2_LR_PRIORITY_SHIFT) |
> >> >>> >> ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
> >> >>> >>
> >> >>> >> - if ( p->desc != NULL )
> >> >>> >> + if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >>> >> {
> >> >>> >> - if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >>> >> - lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>> >> - else
> >> >>> >> - lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> >>> >> - << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >>> >> + lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>> >> + }
> >> >>> >> + else if ( p->desc != NULL )
> >> >>> >> + {
> >> >>> >> + lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> >>> >> + << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >>> >> }
> >> >>> >>
> >> >>> >> writel_gich(lr_reg, GICH_LR + lr * 4);
> >> >>> >
> >> >>> > Actually in case p->desc == NULL (the irq is not a hardware irq, it
> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW
> >> >>> > not working correctly on OMAP5. This change might only be better at
> >> >>> > "hiding" the real issue.
> >> >>> >
> >> >>> > Maybe the problem is exactly the opposite: the new scheme for
> >> >>> > avoiding
> >> >>> > maintenance interrupts doesn't work for software interrupts.
> >> >>> > The commit that should make them work correctly after the
> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
> >> >>> > If you look at the changes to gic_update_one_lr in that commit,
> >> >>> > you'll see that it is going to set a software irq as PENDING if it is
> >> >>> > already ACTIVE.
> >> >>> > Maybe that doesn't work correctly on OMAP5.
> >> >>> >
> >> >>> > Could you try this patch on top of
> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us
> >> >>> > understand
> >> >>> > if the problem is specifically with software irqs.
> >> >>> >
> >> >>> >
> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >>> > index b7516c0..d8a17c9 100644
> >> >>> > --- a/xen/arch/arm/gic.c
> >> >>> > +++ b/xen/arch/arm/gic.c
> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> >> >>> > /* Maximum cpu interface per GIC */
> >> >>> > #define NR_GIC_CPU_IF 8
> >> >>> >
> >> >>> > -#undef GIC_DEBUG
> >> >>> > +#define GIC_DEBUG 1
> >> >>> >
> >> >>> > static void gic_update_one_lr(struct vcpu *v, int i);
> >> >>> >
> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> >> >>> > ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> >> >>> > if ( p->desc != NULL )
> >> >>> > lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> >> >>> > + else
> >> >>> > + lr_val |= GICH_LR_MAINTENANCE_IRQ;
> >> >>> >
> >> >>> > GICH[GICH_LR + lr] = lr_val;
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>>
> >> >>> Andrii Tseglytskyi | Embedded Dev
> >> >>> GlobalLogic
> >> >>> www.globallogic.com
> >> >>>
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Andrii Tseglytskyi | Embedded Dev
> >> > GlobalLogic
> >> > www.globallogic.com
> >>
> >>
> >>
> >> --
> >>
> >> Andrii Tseglytskyi | Embedded Dev
> >> GlobalLogic
> >> www.globallogic.com
> >>
>
>
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel