[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: > Hi Stefano, > > Thank you for your support. > > You are right - with latest change you've proposed I got a continuous > prints during platform hang: > > (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > > Looks line issue needs further deeper debugging. Cool! You could simply print what irqs are in all LRs when they are full, for example you could call gic_dump_info. That would tell us what is taking all the LRs space we have. How many LRs are available on omap5 anyway? I doubt you have so much interrupt traffic to actually fill all the LRs, so I am thinking that a few LRs might not be cleared properly (that should happen on hypervisor entry, gic_update_one_lr should take care of it). > Regards, > Andrii > > On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini > <stefano.stabellini@xxxxxxxxxxxxx> wrote: > > Hello Andrii, > > we are getting closer :-) > > > > It would help if you post the output with GIC_DEBUG defined but without > > the other change that "fixes" the issue. > > > > I think the problem is probably due to software irqs. > > You are getting too many > > > > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending > > > > messages. That means you are loosing virtual SGIs (guest VCPU to guest > > VCPU). It would be best to investigate why, especially if you get many > > more of the same messages without the MAINTENANCE_IRQ change I > > suggested. > > > > This patch might also help understading the problem more: > > > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index b7516c0..5eaeca2 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) > > list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue ) > > { > > i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs); > > - if ( i >= nr_lrs ) return; > > + if ( i >= nr_lrs ) > > + { > > + gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into > > d%dv%d\n", > > + p->irq, v->domain->domain_id, v->vcpu_id); > > + continue; > > + } > > > > spin_lock_irqsave(&gic.lock, flags); > > gic_set_lr(i, p, GICH_LR_PENDING); > > > > > > > > > > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > >> Hi Stefano, > >> > >> No hangs with this change. > >> Complete log is the following: > >> > >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) > >> DRA752 ES1.0 > >> <ethaddr> not set. Validating first E-fuse MAC > >> cpsw > >> - UART enabled - > >> - CPU 00000000 booting - > >> - Xen starting in Hyp mode - > >> - Zero BSS - > >> - Setting up control registers - > >> - Turning on paging - > >> - Ready - > >> (XEN) Checking for initrd in /chosen > >> (XEN) RAM: 0000000080000000 - 000000009fffffff > >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff > >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff > >> (XEN) > >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa > >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000 > >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000 > >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000 > >> (XEN) RESVD[0]: 00000000ba300000 - 00000000bfd00000 > >> (XEN) RESVD[1]: 0000000095800000 - 0000000095900000 > >> (XEN) RESVD[2]: 0000000098a00000 - 0000000098b00000 > >> (XEN) RESVD[3]: 0000000095f00000 - 0000000098a00000 > >> (XEN) RESVD[4]: 0000000095900000 - 0000000095f00000 > >> (XEN) > >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 > >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 > >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000 > >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages) > >> (XEN) Dom heap: 344064 pages > >> (XEN) Domain heap initialised > >> (XEN) Looking for UART console serial0 > >> Xen 4.5-unstable > >> (XEN) Xen version 4.5-unstable (atseglytskyi@) > >> (arm-linux-gnueabihf-gcc (crosstool-NG > >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 > >> 20130328 (prerelease)) debu4 > >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty > >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2 > >> (XEN) 32-bit Execution: > >> (XEN) Processor Features: 00001131:00011011 > >> (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle > >> (XEN) Extensions: GenericTimer Security > >> (XEN) Debug Features: 02010555 > >> (XEN) Auxiliary Features: 00000000 > >> (XEN) Memory Model Features: 10201105 20000000 01240000 02102211 > >> (XEN) ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000 > >> (XEN) Platform: TI DRA7 > >> (XEN) /psci method must be smc, but is: "hvc" > >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c) > >> (XEN) Set AuxCoreBoot0 to 0x20 > >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 > >> (XEN) Using generic timer at 6144 KHz > >> (XEN) GIC initialization: > >> (XEN) gic_dist_addr=0000000048211000 > >> (XEN) gic_cpu_addr=0000000048212000 > >> (XEN) gic_hyp_addr=0000000048214000 > >> (XEN) gic_vcpu_addr=0000000048216000 > >> (XEN) gic_maintenance_irq=25 > >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b). > >> (XEN) Using scheduler: SMP Credit Scheduler (credit) > >> (XEN) I/O virtualisation disabled > >> (XEN) Allocated console ring of 16 KiB. > >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 > >> (XEN) Bringing up CPU1 > >> - CPU 00000001 booting - > >> - Xen starting in Hyp mode - > >> - Setting up control registers - > >> - Turning on paging - > >> - Ready - > >> (XEN) CPU 1 booted. > >> (XEN) Brought up 2 CPUs > >> (XEN) *** LOADING DOMAIN 0 *** > >> (XEN) Loading kernel from boot module 2 > >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0) > >> (XEN) Loading zImage from 00000000c0000040 to > >> 00000000cfc00000-00000000cff50c48 > >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8 > >> (XEN) Std. Loglevel: All > >> (XEN) Guest Loglevel: All > >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch > >> input to Xen) > >> (XEN) Freed 272kB init memory. > >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > >> already pending in LR0 > >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > >> already pending in LR0 > >> [ 0.000000] /cpus/cpu@0 missing clock-frequency property > >> [ 0.000000] /cpus/cpu@1 missing clock-frequency property > >> [ 0.140625] omap-gpmc omap-gpmc: failed to reserve memory > >> [ 0.187500] omap_l3_noc ocp.3: couldn't find resource 2 > >> [ 0.273437] i2c i2c-1: of_i2c: invalid reg on > >> /ocp/i2c@48072000/camera_ov10635 > >> [ 0.437500] ldo3: operation not allowed > >> [ 0.437500] omapdss HDMI error: can't set the voltage regulator > >> [ 0.468750] tfc_s9700 display0: tfc_s9700_probe probe > >> [ 0.468750] ov1063x 1-0030: No deserializer node found > >> [ 0.468750] ov1063x 1-0030: No serializer node found > >> [ 0.468750] ov1063x 1-0030: Failed writing register 0x0103! > >> [ 0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30 > >> [ 0.578125] ahci ahci.0.auto: can't get clock > >> [ 0.898437] ldc_module_init > >> [ 1.304687] Missing dual_emac_res_vlan in DT. > >> [ 1.304687] Using 1 as Reserved VLAN for 0 slave > >> [ 1.312500] Missing dual_emac_res_vlan in DT. > >> [ 1.320312] Using 2 as Reserved VLAN for 1 slave > >> [ 1.382812] Freeing init memory: 236K > >> sh: write error: No such device > >> Cannot identify '/dev/camera0': 2, No such file or directory > >> Parsing config from /xen/images/DomUAndroid.cfg > >> XSM Disabled: seclabel not supported > >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> dom1 access to irq 53: Function not implemented > >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> dom1 access to irq 71: Function not implemented > >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> dom1 access to irq 173: Function not implemented > >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> dom1 access to irq 174: Function not implemented > >> Turning on vfb in domain 1 > >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> still lr_pending > >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> still lr_pending > >> Parsing config from /xen/images/DomUQNX.cfg > >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to > >> inject irq=2 into d0v0, when it is still lr_pending > >> > >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> still lr_pending > >> [ 4.304687] dra7-evm-sound sound.17: cpu dai node is invalid > >> [ 4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link > >> -22 > >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader > >> found: Invalid kernel > >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image > >> failed: No such file or directory > >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot > >> (re-)build domain: -3 > >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > >> still lr_pending > >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> still lr_pending > >> Turning on 'vsnd' in domain '1' (dev_id: '0') > >> Turning on vkbd in domain 1 > >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > >> still lr_pending > >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > >> still lr_pending > >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> still lr_pending > >> > >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1 > >> trying to inject irq=2 into d0v0, when it is still lr_pending > >> > >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi > >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote: > >> > OK got it. Give me a few mins > >> > > >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini > >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote: > >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only > >> >> for non-hardware irqs (desc == NULL) and keep avoiding > >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs. > >> >> > >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid > >> >> other potential bugs introduced later. > >> >> > >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > >> >>> What if I try on top of current master branch the following code: > >> >>> > >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > >> >>> index 31fb81a..6764ab7 100644 > >> >>> --- a/xen/arch/arm/gic-v2.c > >> >>> +++ b/xen/arch/arm/gic-v2.c > >> >>> @@ -36,6 +36,8 @@ > >> >>> #include <asm/io.h> > >> >>> #include <asm/gic.h> > >> >>> > >> >>> +#define GIC_DEBUG 1 > >> >>> + > >> >>> /* > >> >>> * LR register definitions are GIC v2 specific. > >> >>> * Moved these definitions from header file to here > >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > >> >>> index bcaded9..c03d6a6 100644 > >> >>> --- a/xen/arch/arm/gic.c > >> >>> +++ b/xen/arch/arm/gic.c > >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); > >> >>> > >> >>> #define lr_all_full() (this_cpu(lr_mask) == ((1 << > >> >>> gic_hw_ops->info->nr_lrs) - 1)) > >> >>> > >> >>> -#undef GIC_DEBUG > >> >>> +#define GIC_DEBUG 1 > >> >>> > >> >>> static void gic_update_one_lr(struct vcpu *v, int i); > >> >>> > >> >>> It is equivalent to what you proposing - my code contains > >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will > >> >>> be executed: > >> >>> lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() > >> >>> function > >> >>> > >> >>> regards, > >> >>> Andrii > >> >>> > >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini > >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote: > >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and > >> >>> >> everything works fine > >> >>> >> The following 2 patches fixes xen/master for my platform. > >> >>> >> > >> >>> >> Stefano, could you please take a look to these changes? > >> >>> >> > >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca > >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx> > >> >>> >> Date: Tue Nov 18 14:20:42 2014 +0200 > >> >>> >> > >> >>> >> xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag > >> >>> >> > >> >>> >> Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434 > >> >>> >> Signed-off-by: Andrii Tseglytskyi > >> >>> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> > >> >>> >> > >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > >> >>> >> index 31fb81a..093ecdb 100644 > >> >>> >> --- a/xen/arch/arm/gic-v2.c > >> >>> >> +++ b/xen/arch/arm/gic-v2.c > >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const > >> >>> >> struct > >> >>> >> pending_irq *p, > >> >>> >> << > >> >>> >> GICH_V2_LR_PRIORITY_SHIFT) | > >> >>> >> ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << > >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT)); > >> >>> >> > >> >>> >> - if ( p->desc != NULL ) > >> >>> >> + if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > >> >>> >> { > >> >>> >> - if ( > >> >>> >> platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > >> >>> >> - lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > >> >>> >> - else > >> >>> >> - lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & > >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) > >> >>> >> - << GICH_V2_LR_PHYSICAL_SHIFT); > >> >>> >> + lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > >> >>> >> + } > >> >>> >> + else if ( p->desc != NULL ) > >> >>> >> + { > >> >>> >> + lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & > >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) > >> >>> >> + << GICH_V2_LR_PHYSICAL_SHIFT); > >> >>> >> } > >> >>> >> > >> >>> >> writel_gich(lr_reg, GICH_LR + lr * 4); > >> >>> > > >> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it > >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need > >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW > >> >>> > not > >> >>> > working correctly on OMAP5. This changes might only be better at > >> >>> > "hiding" the real issue. > >> >>> > > >> >>> > Maybe the problem is exactly the opposite: the new scheme for > >> >>> > avoiding > >> >>> > maintenance interrupts doesn't work for software interrupts. > >> >>> > The commit that should make them work correctly after the > >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121 > >> >>> > If you look at the changes to gic_update_one_lr in that commit, > >> >>> > you'll > >> >>> > see that is going to set a software irq as PENDING if it is already > >> >>> > ACTIVE. > >> >>> > Maybe that doesn't work correctly on OMAP5. > >> >>> > > >> >>> > Could you try this patch on top of > >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us > >> >>> > understand > >> >>> > if the problem is specifically with software irqs. > >> >>> > > >> >>> > > >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > >> >>> > index b7516c0..d8a17c9 100644 > >> >>> > --- a/xen/arch/arm/gic.c > >> >>> > +++ b/xen/arch/arm/gic.c > >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id); > >> >>> > /* Maximum cpu interface per GIC */ > >> >>> > #define NR_GIC_CPU_IF 8 > >> >>> > > >> >>> > -#undef GIC_DEBUG > >> >>> > +#define GIC_DEBUG 1 > >> >>> > > >> >>> > static void gic_update_one_lr(struct vcpu *v, int i); > >> >>> > > >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct > >> >>> > pending_irq *p, > >> >>> > ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT); > >> >>> > if ( p->desc != NULL ) > >> >>> > lr_val |= GICH_LR_HW | (p->desc->irq << > >> >>> > GICH_LR_PHYSICAL_SHIFT); > >> >>> > + else > >> >>> > + lr_val |= GICH_LR_MAINTENANCE_IRQ; > >> >>> > > >> >>> > GICH[GICH_LR + lr] = lr_val; > >> >>> > > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> > >> >>> Andrii Tseglytskyi | Embedded Dev > >> >>> GlobalLogic > >> >>> www.globallogic.com > >> >>> > >> > > >> > > >> > > >> > -- > >> > > >> > Andrii Tseglytskyi | Embedded Dev > >> > GlobalLogic > >> > www.globallogic.com > >> > >> > >> > >> -- > >> > >> Andrii Tseglytskyi | Embedded Dev > >> GlobalLogic > >> www.globallogic.com > >> > > > > -- > > Andrii Tseglytskyi | Embedded Dev > GlobalLogic > www.globallogic.com > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |