
Re: [Xen-devel] Xen 4.5 random freeze question



That's right, the maintenance interrupt handler is not called, but it
doesn't do anything anyway, so we are fine. The important thing is that an
interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
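
For reference, a minimal sketch of what this looks like; the handler name,
signature and the entry-path comment below are assumptions based on the Xen
ARM gic.c of this era, not quoted from this thread:

    /* The maintenance interrupt handler is intentionally (almost) empty:
     * its only job is to force an exit into the hypervisor. */
    static void maintenance_interrupt(int irq, void *dev_id,
                                      struct cpu_user_regs *regs)
    {
        /* Nothing to do here: the LRs are cleaned up on hypervisor entry. */
    }

    /* On each entry into the hypervisor the trap path is assumed to call
     * gic_clear_lrs(current), which walks the in-use LRs, frees the ones
     * the guest has already EOIed, and makes room to inject pending vIRQs. */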

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> The only ambiguity left is that the maintenance interrupt handler is not
> called. It was requested for a specific IRQ number, retrieved from the
> device tree. But when we trigger GICH_HCR_UIE, we get a maintenance
> interrupt with the spurious interrupt number 1023.
> 
> Regards,
> Andrii
> 
> On Wed, Nov 19, 2014 at 7:47 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> > On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> Hi Stefano,
> >>>
> >>> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> > I think that's OK: it looks like on your board, for some reason,
> >>> > when UIE is set you get irq 1023 (the spurious interrupt) instead of
> >>> > your normal maintenance interrupt.
> >>>
> >>> OK, but I think this should be investigated too. What do you think?
> >>
> >> I think it is harmless: my guess is that if we clear UIE before reading
> >> GICC_IAR, GICC_IAR returns the spurious interrupt instead of the
> >> maintenance interrupt. But it doesn't really matter to us.
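> >>
> >> A minimal sketch of that read path, for illustration only (the 1023
> >> spurious ID is architectural; the names follow the debug patch further
> >> down this thread):
> >>
> >>     uint32_t intack = GICC[GICC_IAR];        /* ack highest priority pending interrupt */
> >>     unsigned int irq = intack & GICC_IA_IRQ; /* extract the interrupt ID */
> >>
> >>     if ( irq == 1023 )
> >>     {
> >>         /* Spurious ID: nothing was pending by the time we acked it,
> >>          * e.g. because UIE had already been cleared on entry. */
> >>         return;
> >>     }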
> >
> > OK. I think catching this will be a good exercise for someone )) But
> > it's out of scope for this issue.
> >
> >>
> >>> >
> >>> > But everything should work anyway without issues.
> >>> >
> >>> > This is the same patch as before, but on top of the latest xen-unstable
> >>> > tree. Please confirm if it works.
> >>> >
> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> > index 70d10d6..df140b9 100644
> >>> > --- a/xen/arch/arm/gic.c
> >>> > +++ b/xen/arch/arm/gic.c
> >>> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >      if ( is_idle_vcpu(v) )
> >>> >          return;
> >>> >
> >>> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >>> > +
> >>> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >
> >>> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >>> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >>> >
> >>> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> >          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> >>> > -    else
> >>> > -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >>> >  }
> >>> >
> >>>
> >>> I confirm - it works fine. Will this be the final fix?
> >>
> >> Yep :-)
> >> Many thanks for your help on this!
> >
> > Thank you Stefano. This issue was really critical for us :)
> >
> > Regards,
> > Andrii
> >
> >>
> >>
> >>> Regards,
> >>> Andrii
> >>>
> >>> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> >>> >
> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> I got this strange log:
> >>> >>
> >>> >> (XEN) received maintenance interrupt irq=1023
> >>> >>
> >>> >> And the platform does not hang, due to this:
> >>> >> +    hcr = GICH[GICH_HCR];
> >>> >> +    if ( hcr & GICH_HCR_UIE )
> >>> >> +    {
> >>> >> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> +        uie_on = 1;
> >>> >> +    }
> >>> >>
> >>> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> >>> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >>> >> >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >>> >> >> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> >>> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >>> >> >> >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> >>> Hi Stefano,
> >>> >> >> >> >>>
> >>> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >>> >> >> >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> >>> >> Hi Stefano,
> >>> >> >> >> >>> >>
> >>> >> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) 
> >>> >> >> >> >>> >> > > && lr_all_full() )
> >>> >> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >>> >> >> >> >>> >> > >      else
> >>> >> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >>> >> >> >> >>> >> > >
> >>> >> >> >> >>> >> > >  }
> >>> >> >> >> >>> >> >
> >>> >> >> >> >>> >> > Yes, exactly
> >>> >> >> >> >>> >>
> >>> >> >> >> >>> >> I tried, hang still occurs with this change
> >>> >> >> >> >>> >
> >>> >> >> >>> > We need to figure out why, during the hang, you still have all
> >>> >> >> >>> > the LRs busy even though you are getting maintenance interrupts
> >>> >> >> >>> > that should cause them to be cleared.
> >>> >> >> >> >>> >
> >>> >> >> >> >>>
> >>> >> >> >> >>> I see that I have free LRs during maintenance interrupt
> >>> >> >> >> >>>
> >>> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >>> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >>> >> >> >> >>> (XEN)    HW_LR[0]=9a015856
> >>> >> >> >> >>> (XEN)    HW_LR[1]=0
> >>> >> >> >> >>> (XEN)    HW_LR[2]=0
> >>> >> >> >> >>> (XEN)    HW_LR[3]=0
> >>> >> >> >> >>> (XEN) Inflight irq=86 lr=0
> >>> >> >> >> >>> (XEN) Inflight irq=2 lr=255
> >>> >> >> >> >>> (XEN) Pending irq=2
> >>> >> >> >> >>>
> >>> >> >> >> >>> But I see that after I get the hang, maintenance interrupts are
> >>> >> >> >> >>> generated continuously. The platform keeps printing the same log
> >>> >> >> >> >>> until reboot.
> >>> >> >> >> >>
> >>> >> >> >> >> Exactly the same log? As in the one above you just pasted?
> >>> >> >> >> >> That is very very suspicious.
> >>> >> >> >> >
> >>> >> >> >> > Yes, exactly the same log. And it looks like that means the LRs
> >>> >> >> >> > are flushed correctly.
> >>> >> >> >> >
> >>> >> >> >> >>
> >>> >> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly,
> >>> >> >> >> >> and something we do in Xen, maybe writing to an LR register,
> >>> >> >> >> >> might trigger a new maintenance interrupt immediately, causing
> >>> >> >> >> >> an infinite loop.
> >>> >> >> >> >>
> >>> >> >> >> >
> >>> >> >> >> > Yes, this is what I'm thinking too. Taking into account all the
> >>> >> >> >> > collected debug info, it looks like once the LRs are overloaded
> >>> >> >> >> > with SGIs, a maintenance interrupt occurs. It is then not handled
> >>> >> >> >> > properly and occurs again and again, so the platform hangs inside
> >>> >> >> >> > its handler.
> >>> >> >> >> >
> >>> >> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE
> >>> >> >> >> >> immediately on hypervisor entry.
> >>> >> >> >> >>
> >>> >> >> >> >
> >>> >> >> >> > Now trying.
> >>> >> >> >> >
> >>> >> >> >> >>
> >>> >> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> >> >> >> >> index 4d2a92d..6ae8dc4 100644
> >>> >> >> >> >> --- a/xen/arch/arm/gic.c
> >>> >> >> >> >> +++ b/xen/arch/arm/gic.c
> >>> >> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >> >> >> >>      if ( is_idle_vcpu(v) )
> >>> >> >> >> >>          return;
> >>> >> >> >> >>
> >>> >> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> >> >> +
> >>> >> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >> >> >> >>
> >>> >> >> >> >>      while ((i = find_next_bit((const unsigned long *) 
> >>> >> >> >> >> &this_cpu(lr_mask),
> >>> >> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >>> >> >> >> >>
> >>> >> >> >> >>      gic_restore_pending_irqs(current);
> >>> >> >> >> >>
> >>> >> >> >> >> -
> >>> >> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && 
> >>> >> >> >> >> lr_all_full() )
> >>> >> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> >> >> -    else
> >>> >> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> >> >> -
> >>> >> >> >> >>  }
> >>> >> >> >> >>
> >>> >> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, 
> >>> >> >> >> >> enum gic_sgi sgi)
> >>> >> >> >> >
> >>> >> >> >>
> >>> >> >> >> Heh - I don't see hangs with this patch :) But I also see that the
> >>> >> >> >> maintenance interrupt doesn't occur (and no hang as a result).
> >>> >> >> >> Stefano - is this expected?
> >>> >> >> >
> >>> >> >> > No maintenance interrupts at all? That's strange. You should be
> >>> >> >> > receiving them when the LRs are full and you still have interrupts
> >>> >> >> > pending to be added to them.
> >>> >> >> >
> >>> >> >> > You could add another printk here to see if you should be receiving
> >>> >> >> > them:
> >>> >> >> >
> >>> >> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && 
> >>> >> >> > lr_all_full() )
> >>> >> >> > +    {
> >>> >> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance 
> >>> >> >> > interrupt\n");
> >>> >> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> > -    else
> >>> >> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> > -
> >>> >> >> > +    }
> >>> >> >> >  }
> >>> >> >> >
> >>> >> >>
> >>> >> >> Requested properly:
> >>> >> >>
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >>
> >>> >> >> But it does not occur.
> >>> >> >
> >>> >> > OK, let's see what's going on then by printing the irq number of the
> >>> >> > maintenance interrupt:
> >>> >> >
> >>> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> >> > index 4d2a92d..fed3167 100644
> >>> >> > --- a/xen/arch/arm/gic.c
> >>> >> > +++ b/xen/arch/arm/gic.c
> >>> >> > @@ -55,6 +55,7 @@ static struct {
> >>> >> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
> >>> >> >
> >>> >> >  static uint8_t nr_lrs;
> >>> >> > +static bool uie_on;
> >>> >> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
> >>> >> >
> >>> >> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> >>> >> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >> >  {
> >>> >> >      int i = 0;
> >>> >> >      unsigned long flags;
> >>> >> > +    unsigned long hcr;
> >>> >> >
> >>> >> >      /* The idle domain has no LRs to be cleared. Since 
> >>> >> > gic_restore_state
> >>> >> >       * doesn't write any LR registers for the idle domain they 
> >>> >> > could be
> >>> >> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >> >      if ( is_idle_vcpu(v) )
> >>> >> >          return;
> >>> >> >
> >>> >> > +    hcr = GICH[GICH_HCR];
> >>> >> > +    if ( hcr & GICH_HCR_UIE )
> >>> >> > +    {
> >>> >> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> > +        uie_on = 1;
> >>> >> > +    }
> >>> >> > +
> >>> >> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >> >
> >>> >> >      while ((i = find_next_bit((const unsigned long *) 
> >>> >> > &this_cpu(lr_mask),
> >>> >> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, 
> >>> >> > int is_fiq)
> >>> >> >          intack = GICC[GICC_IAR];
> >>> >> >          irq = intack & GICC_IA_IRQ;
> >>> >> >
> >>> >> > +        if ( uie_on )
> >>> >> > +        {
> >>> >> > +            uie_on = 0;
> >>> >> > +            printk("received maintenance interrupt irq=%d\n", irq);
> >>> >> > +        }
> >>> >> >          if ( likely(irq >= 16 && irq < 1021) )
> >>> >> >          {
> >>> >> >              local_irq_enable();
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >>
> >>> >> Andrii Tseglytskyi | Embedded Dev
> >>> >> GlobalLogic
> >>> >> www.globallogic.com
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Andrii Tseglytskyi | Embedded Dev
> >>> GlobalLogic
> >>> www.globallogic.com
> >>>
> >
> >
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

