[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote: > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> Hi Stefano, >> >> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini >> <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> > I think that's OK: it looks like that on your board for some reasons >> > when UIE is set you get irq 1023 (spurious interrupt) instead of your >> > normal maintenance interrupt. >> >> OK, but I think this should be investigated too. What do you think ? > > I think it is harmless: my guess is that if we clear UIE before reading > GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance > interrupt. But it doesn't really matter to us. OK. I think catching this will be a good exercise for someone )) But out of scope for this issue. > >> > >> > But everything should work anyway without issues. >> > >> > This is the same patch as before but on top of the lastest xen-unstable >> > tree. Please confirm if it works. >> > >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> > index 70d10d6..df140b9 100644 >> > --- a/xen/arch/arm/gic.c >> > +++ b/xen/arch/arm/gic.c >> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) >> > if ( is_idle_vcpu(v) ) >> > return; >> > >> > + gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); >> > + >> > spin_lock_irqsave(&v->arch.vgic.lock, flags); >> > >> > while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), >> > @@ -527,8 +529,6 @@ void gic_inject(void) >> > >> > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) >> > gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1); >> > - else >> > - gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); >> > } >> > >> >> I confirm - it works fine. Will this be a final fix ? > > Yep :-) > Many thanks for your help on this! Thank you Stefano. This issue was really critical for us :) Regards, Andrii > > >> Regards, >> Andrii >> >> > static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) >> > >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> >> I got this strange log: >> >> >> >> (XEN) received maintenance interrupt irq=1023 >> >> >> >> And platform does not hang due to this: >> >> + hcr = GICH[GICH_HCR]; >> >> + if ( hcr & GICH_HCR_UIE ) >> >> + { >> >> + GICH[GICH_HCR] &= ~GICH_HCR_UIE; >> >> + uie_on = 1; >> >> + } >> >> >> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini >> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini >> >> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi >> >> >> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote: >> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini >> >> >> >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> >> >> >> >>> Hi Stefano, >> >> >> >> >>> >> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini >> >> >> >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote: >> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> >> >> >> >>> >> Hi Stefano, >> >> >> >> >>> >> >> >> >> >> >>> >> > > if ( !list_empty(¤t->arch.vgic.lr_pending) && >> >> >> >> >>> >> > > lr_all_full() ) >> >> >> >> >>> >> > > - GICH[GICH_HCR] |= GICH_HCR_UIE; >> >> >> >> >>> >> > > + GICH[GICH_HCR] |= GICH_HCR_NPIE; >> >> >> >> >>> >> > > else >> >> >> >> >>> >> > > - GICH[GICH_HCR] &= ~GICH_HCR_UIE; >> >> >> >> >>> >> > > + GICH[GICH_HCR] &= ~GICH_HCR_NPIE; >> >> >> >> >>> >> > > >> >> >> >> >>> >> > > } >> >> >> >> >>> >> > >> >> >> >> >>> >> > Yes, exactly >> >> >> >> >>> >> >> >> >> >> >>> >> I tried, hang still occurs with this change >> >> >> >> >>> > >> >> >> >> >>> > We need to figure out why during the hang you still have all >> >> >> >> >>> > the LRs >> >> >> >> >>> > busy even if you are getting maintenance interrupts that >> >> >> >> >>> > should cause >> >> >> >> >>> > them to be cleared. >> >> >> >> >>> > >> >> >> >> >>> >> >> >> >> >>> I see that I have free LRs during maintenance interrupt >> >> >> >> >>> >> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt >> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0 >> >> >> >> >>> (XEN) HW_LR[0]=9a015856 >> >> >> >> >>> (XEN) HW_LR[1]=0 >> >> >> >> >>> (XEN) HW_LR[2]=0 >> >> >> >> >>> (XEN) HW_LR[3]=0 >> >> >> >> >>> (XEN) Inflight irq=86 lr=0 >> >> >> >> >>> (XEN) Inflight irq=2 lr=255 >> >> >> >> >>> (XEN) Pending irq=2 >> >> >> >> >>> >> >> >> >> >>> But I see that after I got hang - maintenance interrupts are >> >> >> >> >>> generated >> >> >> >> >>> continuously. Platform continues printing the same log till >> >> >> >> >>> reboot. >> >> >> >> >> >> >> >> >> >> Exactly the same log? As in the one above you just pasted? >> >> >> >> >> That is very very suspicious. >> >> >> >> > >> >> >> >> > Yes exactly the same log. And looks like it means that LRs are >> >> >> >> > flushed >> >> >> >> > correctly. >> >> >> >> > >> >> >> >> >> >> >> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and >> >> >> >> >> something we do in Xen, maybe writing to an LR register, might >> >> >> >> >> trigger a >> >> >> >> >> new maintenance interrupt immediately causing an infinite loop. >> >> >> >> >> >> >> >> >> > >> >> >> >> > Yes, this is what I'm thinking about. Taking in account all >> >> >> >> > collected >> >> >> >> > debug info it looks like once LRs are overloaded with SGIs - >> >> >> >> > maintenance interrupt occurs. >> >> >> >> > And then it is not handled properly, and occurs again and again - >> >> >> >> > so >> >> >> >> > platform hangs inside its handler. >> >> >> >> > >> >> >> >> >> Could you please try this patch? It disable GICH_HCR_UIE >> >> >> >> >> immediately on >> >> >> >> >> hypervisor entry. >> >> >> >> >> >> >> >> >> > >> >> >> >> > Now trying. >> >> >> >> > >> >> >> >> >> >> >> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> >> >> >> >> index 4d2a92d..6ae8dc4 100644 >> >> >> >> >> --- a/xen/arch/arm/gic.c >> >> >> >> >> +++ b/xen/arch/arm/gic.c >> >> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) >> >> >> >> >> if ( is_idle_vcpu(v) ) >> >> >> >> >> return; >> >> >> >> >> >> >> >> >> >> + GICH[GICH_HCR] &= ~GICH_HCR_UIE; >> >> >> >> >> + >> >> >> >> >> spin_lock_irqsave(&v->arch.vgic.lock, flags); >> >> >> >> >> >> >> >> >> >> while ((i = find_next_bit((const unsigned long *) >> >> >> >> >> &this_cpu(lr_mask), >> >> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void) >> >> >> >> >> >> >> >> >> >> gic_restore_pending_irqs(current); >> >> >> >> >> >> >> >> >> >> - >> >> >> >> >> if ( !list_empty(¤t->arch.vgic.lr_pending) && >> >> >> >> >> lr_all_full() ) >> >> >> >> >> GICH[GICH_HCR] |= GICH_HCR_UIE; >> >> >> >> >> - else >> >> >> >> >> - GICH[GICH_HCR] &= ~GICH_HCR_UIE; >> >> >> >> >> - >> >> >> >> >> } >> >> >> >> >> >> >> >> >> >> static void do_sgi(struct cpu_user_regs *regs, int othercpu, >> >> >> >> >> enum gic_sgi sgi) >> >> >> >> > >> >> >> >> >> >> >> >> Heh - I don't see hangs with this patch :) But also I see that >> >> >> >> maintenance interrupt doesn't occur (and no hang as result) >> >> >> >> Stefano - is this expected? >> >> >> > >> >> >> > No maintenance interrupts at all? That's strange. You should be >> >> >> > receiving them when LRs are full and you still have interrupts >> >> >> > pending >> >> >> > to be added to them. >> >> >> > >> >> >> > You could add another printk here to see if you should be receiving >> >> >> > them: >> >> >> > >> >> >> > if ( !list_empty(¤t->arch.vgic.lr_pending) && >> >> >> > lr_all_full() ) >> >> >> > + { >> >> >> > + gdprintk(XENLOG_DEBUG, "requesting maintenance >> >> >> > interrupt\n"); >> >> >> > GICH[GICH_HCR] |= GICH_HCR_UIE; >> >> >> > - else >> >> >> > - GICH[GICH_HCR] &= ~GICH_HCR_UIE; >> >> >> > - >> >> >> > + } >> >> >> > } >> >> >> > >> >> >> >> >> >> Requested properly: >> >> >> >> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt >> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt >> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt >> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt >> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt >> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt >> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt >> >> >> >> >> >> But does not occur >> >> > >> >> > OK, let's see what's going on then by printing the irq number of the >> >> > maintenance interrupt: >> >> > >> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> >> > index 4d2a92d..fed3167 100644 >> >> > --- a/xen/arch/arm/gic.c >> >> > +++ b/xen/arch/arm/gic.c >> >> > @@ -55,6 +55,7 @@ static struct { >> >> > static DEFINE_PER_CPU(uint64_t, lr_mask); >> >> > >> >> > static uint8_t nr_lrs; >> >> > +static bool uie_on; >> >> > #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1)) >> >> > >> >> > /* The GIC mapping of CPU interfaces does not necessarily match the >> >> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v) >> >> > { >> >> > int i = 0; >> >> > unsigned long flags; >> >> > + unsigned long hcr; >> >> > >> >> > /* The idle domain has no LRs to be cleared. Since >> >> > gic_restore_state >> >> > * doesn't write any LR registers for the idle domain they could be >> >> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v) >> >> > if ( is_idle_vcpu(v) ) >> >> > return; >> >> > >> >> > + hcr = GICH[GICH_HCR]; >> >> > + if ( hcr & GICH_HCR_UIE ) >> >> > + { >> >> > + GICH[GICH_HCR] &= ~GICH_HCR_UIE; >> >> > + uie_on = 1; >> >> > + } >> >> > + >> >> > spin_lock_irqsave(&v->arch.vgic.lock, flags); >> >> > >> >> > while ((i = find_next_bit((const unsigned long *) >> >> > &this_cpu(lr_mask), >> >> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int >> >> > is_fiq) >> >> > intack = GICC[GICC_IAR]; >> >> > irq = intack & GICC_IA_IRQ; >> >> > >> >> > + if ( uie_on ) >> >> > + { >> >> > + uie_on = 0; >> >> > + printk("received maintenance interrupt irq=%d\n", irq); >> >> > + } >> >> > if ( likely(irq >= 16 && irq < 1021) ) >> >> > { >> >> > local_irq_enable(); >> >> >> >> >> >> >> >> -- >> >> >> >> Andrii Tseglytskyi | Embedded Dev >> >> GlobalLogic >> >> www.globallogic.com >> >> >> >> >> >> -- >> >> Andrii Tseglytskyi | Embedded Dev >> GlobalLogic >> www.globallogic.com >> -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |