[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen 4.5 random freeze question
Hi Stefano, Thank you for your support. You are right - with latest change you've proposed I got a continuous prints during platform hang: (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 Looks line issue needs further deeper debugging. Regards, Andrii On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote: > Hello Andrii, > we are getting closer :-) > > It would help if you post the output with GIC_DEBUG defined but without > the other change that "fixes" the issue. > > I think the problem is probably due to software irqs. > You are getting too many > > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending > > messages. That means you are loosing virtual SGIs (guest VCPU to guest > VCPU). It would be best to investigate why, especially if you get many > more of the same messages without the MAINTENANCE_IRQ change I > suggested. > > This patch might also help understading the problem more: > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index b7516c0..5eaeca2 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) > list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue ) > { > i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs); > - if ( i >= nr_lrs ) return; > + if ( i >= nr_lrs ) > + { > + gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into > d%dv%d\n", > + p->irq, v->domain->domain_id, v->vcpu_id); > + continue; > + } > > spin_lock_irqsave(&gic.lock, flags); > gic_set_lr(i, p, GICH_LR_PENDING); > > > > > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: >> Hi Stefano, >> >> No hangs with this change. >> Complete log is the following: >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) >> DRA752 ES1.0 >> <ethaddr> not set. Validating first E-fuse MAC >> cpsw >> - UART enabled - >> - CPU 00000000 booting - >> - Xen starting in Hyp mode - >> - Zero BSS - >> - Setting up control registers - >> - Turning on paging - >> - Ready - >> (XEN) Checking for initrd in /chosen >> (XEN) RAM: 0000000080000000 - 000000009fffffff >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff >> (XEN) >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000 >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000 >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000 >> (XEN) RESVD[0]: 00000000ba300000 - 00000000bfd00000 >> (XEN) RESVD[1]: 0000000095800000 - 0000000095900000 >> (XEN) RESVD[2]: 0000000098a00000 - 0000000098b00000 >> (XEN) RESVD[3]: 0000000095f00000 - 0000000098a00000 >> (XEN) RESVD[4]: 0000000095900000 - 0000000095f00000 >> (XEN) >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000 >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages) >> (XEN) Dom heap: 344064 pages >> (XEN) Domain heap initialised >> (XEN) Looking for UART console serial0 >> Xen 4.5-unstable >> (XEN) Xen version 4.5-unstable (atseglytskyi@) >> (arm-linux-gnueabihf-gcc (crosstool-NG >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 >> 20130328 (prerelease)) debu4 >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2 >> (XEN) 32-bit Execution: >> (XEN) Processor Features: 00001131:00011011 >> (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle >> (XEN) Extensions: GenericTimer Security >> (XEN) Debug Features: 02010555 >> (XEN) Auxiliary Features: 00000000 >> (XEN) Memory Model Features: 10201105 20000000 01240000 02102211 >> (XEN) ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000 >> (XEN) Platform: TI DRA7 >> (XEN) /psci method must be smc, but is: "hvc" >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c) >> (XEN) Set AuxCoreBoot0 to 0x20 >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 >> (XEN) Using generic timer at 6144 KHz >> (XEN) GIC initialization: >> (XEN) gic_dist_addr=0000000048211000 >> (XEN) gic_cpu_addr=0000000048212000 >> (XEN) gic_hyp_addr=0000000048214000 >> (XEN) gic_vcpu_addr=0000000048216000 >> (XEN) gic_maintenance_irq=25 >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b). >> (XEN) Using scheduler: SMP Credit Scheduler (credit) >> (XEN) I/O virtualisation disabled >> (XEN) Allocated console ring of 16 KiB. >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 >> (XEN) Bringing up CPU1 >> - CPU 00000001 booting - >> - Xen starting in Hyp mode - >> - Setting up control registers - >> - Turning on paging - >> - Ready - >> (XEN) CPU 1 booted. >> (XEN) Brought up 2 CPUs >> (XEN) *** LOADING DOMAIN 0 *** >> (XEN) Loading kernel from boot module 2 >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0) >> (XEN) Loading zImage from 00000000c0000040 to >> 00000000cfc00000-00000000cff50c48 >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8 >> (XEN) Std. Loglevel: All >> (XEN) Guest Loglevel: All >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch >> input to Xen) >> (XEN) Freed 272kB init memory. >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is >> already pending in LR0 >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is >> already pending in LR0 >> [ 0.000000] /cpus/cpu@0 missing clock-frequency property >> [ 0.000000] /cpus/cpu@1 missing clock-frequency property >> [ 0.140625] omap-gpmc omap-gpmc: failed to reserve memory >> [ 0.187500] omap_l3_noc ocp.3: couldn't find resource 2 >> [ 0.273437] i2c i2c-1: of_i2c: invalid reg on >> /ocp/i2c@48072000/camera_ov10635 >> [ 0.437500] ldo3: operation not allowed >> [ 0.437500] omapdss HDMI error: can't set the voltage regulator >> [ 0.468750] tfc_s9700 display0: tfc_s9700_probe probe >> [ 0.468750] ov1063x 1-0030: No deserializer node found >> [ 0.468750] ov1063x 1-0030: No serializer node found >> [ 0.468750] ov1063x 1-0030: Failed writing register 0x0103! >> [ 0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30 >> [ 0.578125] ahci ahci.0.auto: can't get clock >> [ 0.898437] ldc_module_init >> [ 1.304687] Missing dual_emac_res_vlan in DT. >> [ 1.304687] Using 1 as Reserved VLAN for 0 slave >> [ 1.312500] Missing dual_emac_res_vlan in DT. >> [ 1.320312] Using 2 as Reserved VLAN for 1 slave >> [ 1.382812] Freeing init memory: 236K >> sh: write error: No such device >> Cannot identify '/dev/camera0': 2, No such file or directory >> Parsing config from /xen/images/DomUAndroid.cfg >> XSM Disabled: seclabel not supported >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> dom1 access to irq 53: Function not implemented >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> dom1 access to irq 71: Function not implemented >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> dom1 access to irq 173: Function not implemented >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> dom1 access to irq 174: Function not implemented >> Turning on vfb in domain 1 >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> still lr_pending >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> still lr_pending >> Parsing config from /xen/images/DomUQNX.cfg >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to >> inject irq=2 into d0v0, when it is still lr_pending >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> still lr_pending >> [ 4.304687] dra7-evm-sound sound.17: cpu dai node is invalid >> [ 4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22 >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader >> found: Invalid kernel >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image >> failed: No such file or directory >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot >> (re-)build domain: -3 >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is >> still lr_pending >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> still lr_pending >> Turning on 'vsnd' in domain '1' (dev_id: '0') >> Turning on vkbd in domain 1 >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is >> still lr_pending >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is >> still lr_pending >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> still lr_pending >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1 >> trying to inject irq=2 into d0v0, when it is still lr_pending >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote: >> > OK got it. Give me a few mins >> > >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only >> >> for non-hardware irqs (desc == NULL) and keep avoiding >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs. >> >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid >> >> other potential bugs introduced later. >> >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: >> >>> What if I try on top of current master branch the following code: >> >>> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c >> >>> index 31fb81a..6764ab7 100644 >> >>> --- a/xen/arch/arm/gic-v2.c >> >>> +++ b/xen/arch/arm/gic-v2.c >> >>> @@ -36,6 +36,8 @@ >> >>> #include <asm/io.h> >> >>> #include <asm/gic.h> >> >>> >> >>> +#define GIC_DEBUG 1 >> >>> + >> >>> /* >> >>> * LR register definitions are GIC v2 specific. >> >>> * Moved these definitions from header file to here >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> >>> index bcaded9..c03d6a6 100644 >> >>> --- a/xen/arch/arm/gic.c >> >>> +++ b/xen/arch/arm/gic.c >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); >> >>> >> >>> #define lr_all_full() (this_cpu(lr_mask) == ((1 << >> >>> gic_hw_ops->info->nr_lrs) - 1)) >> >>> >> >>> -#undef GIC_DEBUG >> >>> +#define GIC_DEBUG 1 >> >>> >> >>> static void gic_update_one_lr(struct vcpu *v, int i); >> >>> >> >>> It is equivalent to what you proposing - my code contains >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will >> >>> be executed: >> >>> lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() function >> >>> >> >>> regards, >> >>> Andrii >> >>> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and >> >>> >> everything works fine >> >>> >> The following 2 patches fixes xen/master for my platform. >> >>> >> >> >>> >> Stefano, could you please take a look to these changes? >> >>> >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx> >> >>> >> Date: Tue Nov 18 14:20:42 2014 +0200 >> >>> >> >> >>> >> xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag >> >>> >> >> >>> >> Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434 >> >>> >> Signed-off-by: Andrii Tseglytskyi >> >>> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> >> >>> >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c >> >>> >> index 31fb81a..093ecdb 100644 >> >>> >> --- a/xen/arch/arm/gic-v2.c >> >>> >> +++ b/xen/arch/arm/gic-v2.c >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct >> >>> >> pending_irq *p, >> >>> >> << >> >>> >> GICH_V2_LR_PRIORITY_SHIFT) | >> >>> >> ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT)); >> >>> >> >> >>> >> - if ( p->desc != NULL ) >> >>> >> + if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) >> >>> >> { >> >>> >> - if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) >> >>> >> - lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; >> >>> >> - else >> >>> >> - lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) >> >>> >> - << GICH_V2_LR_PHYSICAL_SHIFT); >> >>> >> + lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; >> >>> >> + } >> >>> >> + else if ( p->desc != NULL ) >> >>> >> + { >> >>> >> + lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) >> >>> >> + << GICH_V2_LR_PHYSICAL_SHIFT); >> >>> >> } >> >>> >> >> >>> >> writel_gich(lr_reg, GICH_LR + lr * 4); >> >>> > >> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not >> >>> > working correctly on OMAP5. This changes might only be better at >> >>> > "hiding" the real issue. >> >>> > >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding >> >>> > maintenance interrupts doesn't work for software interrupts. >> >>> > The commit that should make them work correctly after the >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121 >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll >> >>> > see that is going to set a software irq as PENDING if it is already >> >>> > ACTIVE. >> >>> > Maybe that doesn't work correctly on OMAP5. >> >>> > >> >>> > Could you try this patch on top of >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us understand >> >>> > if the problem is specifically with software irqs. >> >>> > >> >>> > >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> >>> > index b7516c0..d8a17c9 100644 >> >>> > --- a/xen/arch/arm/gic.c >> >>> > +++ b/xen/arch/arm/gic.c >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id); >> >>> > /* Maximum cpu interface per GIC */ >> >>> > #define NR_GIC_CPU_IF 8 >> >>> > >> >>> > -#undef GIC_DEBUG >> >>> > +#define GIC_DEBUG 1 >> >>> > >> >>> > static void gic_update_one_lr(struct vcpu *v, int i); >> >>> > >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct >> >>> > pending_irq *p, >> >>> > ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT); >> >>> > if ( p->desc != NULL ) >> >>> > lr_val |= GICH_LR_HW | (p->desc->irq << >> >>> > GICH_LR_PHYSICAL_SHIFT); >> >>> > + else >> >>> > + lr_val |= GICH_LR_MAINTENANCE_IRQ; >> >>> > >> >>> > GICH[GICH_LR + lr] = lr_val; >> >>> > >> >>> >> >>> >> >>> >> >>> -- >> >>> >> >>> Andrii Tseglytskyi | Embedded Dev >> >>> GlobalLogic >> >>> www.globallogic.com >> >>> >> > >> > >> > >> > -- >> > >> > Andrii Tseglytskyi | Embedded Dev >> > GlobalLogic >> > www.globallogic.com >> >> >> >> -- >> >> Andrii Tseglytskyi | Embedded Dev >> GlobalLogic >> www.globallogic.com >> -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |