
Re: [Xen-devel] Xen 4.5 random freeze question



Does the number 1023 mean that the maintenance interrupt is global?
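
For reference: as far as I understand the GICv2 architecture, ID 1023 read from GICC_IAR is the spurious interrupt ID, meaning there was nothing pending to acknowledge at that moment, rather than a real interrupt number. Below is a minimal sketch of that decoding. The macro, enum and helper names are made up for illustration and are not Xen's; the field layout (ID in bits [9:0], source CPU in bits [12:10]) is my reading of the spec.

```c
#include <stdint.h>

/* Assumed GICv2 GICC_IAR layout: bits [9:0] interrupt ID,
 * bits [12:10] source CPU (meaningful for SGIs only). */
#define IAR_IRQ_MASK      0x3ffu
#define IAR_CPU_SHIFT     10
#define IAR_CPU_MASK      0x7u
#define GIC_SPURIOUS_IRQ  1023u

/* Hypothetical classification, for illustration only. */
enum irq_kind { IRQ_SGI, IRQ_PPI, IRQ_SPI, IRQ_SPECIAL, IRQ_SPURIOUS };

static inline uint32_t iar_irq(uint32_t iar)
{
    return iar & IAR_IRQ_MASK;
}

static inline uint32_t iar_cpu(uint32_t iar)
{
    return (iar >> IAR_CPU_SHIFT) & IAR_CPU_MASK;
}

static inline enum irq_kind classify_irq(uint32_t irq)
{
    if (irq < 16)
        return IRQ_SGI;       /* software generated interrupts */
    if (irq < 32)
        return IRQ_PPI;       /* private peripheral interrupts */
    if (irq < 1020)
        return IRQ_SPI;       /* shared peripheral interrupts */
    if (irq == GIC_SPURIOUS_IRQ)
        return IRQ_SPURIOUS;  /* nothing to acknowledge */
    return IRQ_SPECIAL;       /* 1020-1022: reserved/special IDs */
}
```

With these helpers, classify_irq(iar_irq(1023)) gives IRQ_SPURIOUS, while irq=86 from the earlier log classifies as an ordinary SPI.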

On Wed, Nov 19, 2014 at 7:03 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> I got this strange log:
>
> (XEN) received maintenance interrupt irq=1023
>
> And the platform does not hang with this change:
> +    hcr = GICH[GICH_HCR];
> +    if ( hcr & GICH_HCR_UIE )
> +    {
> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +        uie_on = 1;
> +    }
>
> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
>>> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >>> Hi Stefano,
>>> >> >>>
>>> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >>> >> Hi Stefano,
>>> >> >>> >>
>>> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> >>> >> > >      else
>>> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> >>> >> > >
>>> >> >>> >> > >  }
>>> >> >>> >> >
>>> >> >>> >> > Yes, exactly
>>> >> >>> >>
>>> >> >>> >> I tried, hang still occurs with this change
>>> >> >>> >
>>> >> >>> > We need to figure out why during the hang you still have all the LRs
>>> >> >>> > busy even if you are getting maintenance interrupts that should cause
>>> >> >>> > them to be cleared.
>>> >> >>> >
>>> >> >>>
>>> >> >>> I see that I have free LRs during maintenance interrupt
>>> >> >>>
>>> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >> >>> (XEN)    HW_LR[0]=9a015856
>>> >> >>> (XEN)    HW_LR[1]=0
>>> >> >>> (XEN)    HW_LR[2]=0
>>> >> >>> (XEN)    HW_LR[3]=0
>>> >> >>> (XEN) Inflight irq=86 lr=0
>>> >> >>> (XEN) Inflight irq=2 lr=255
>>> >> >>> (XEN) Pending irq=2
>>> >> >>>
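
As a cross-check of the dump above, here is a minimal sketch that splits a GICv2 list register value into its fields. The struct and function names are hypothetical, and the bit positions are my reading of the GICH_LR layout in the GICv2 spec, not taken from the Xen sources.

```c
#include <stdint.h>

/* Hypothetical container for the GICH_LR fields (GICv2 layout assumed). */
struct lr_fields {
    uint32_t virq;      /* bits [9:0]   virtual interrupt ID */
    uint32_t pirq;      /* bits [19:10] physical ID, meaningful only if hw == 1 */
    uint32_t priority;  /* bits [27:23] */
    uint32_t state;     /* bits [29:28] 0 invalid, 1 pending, 2 active, 3 both */
    uint32_t grp1;      /* bit 30 */
    uint32_t hw;        /* bit 31 hardware interrupt */
};

static inline struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f = {
        .virq     = lr & 0x3ffu,
        .pirq     = (lr >> 10) & 0x3ffu,
        .priority = (lr >> 23) & 0x1fu,
        .state    = (lr >> 28) & 0x3u,
        .grp1     = (lr >> 30) & 0x1u,
        .hw       = (lr >> 31) & 0x1u,
    };
    return f;
}
```

Decoding HW_LR[0]=9a015856 this way gives virq 86, pirq 86, hw 1 and state 1 (pending), which agrees with the "Inflight irq=86 lr=0" line. Under this layout only one of the four LRs holds a valid entry.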
>>> >> >>> But I see that once the hang occurs, maintenance interrupts are generated
>>> >> >>> continuously. The platform keeps printing the same log until reboot.
>>> >> >>
>>> >> >> Exactly the same log? As in the one above you just pasted?
>>> >> >> That is very very suspicious.
>>> >> >
>>> >> > Yes, exactly the same log. And it looks like this means that the LRs are
>>> >> > flushed correctly.
>>> >> >
>>> >> >>
>>> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
>>> >> >> new maintenance interrupt immediately causing an infinite loop.
>>> >> >>
>>> >> >
>>> >> > Yes, this is what I'm thinking about. Taking into account all the collected
>>> >> > debug info, it looks like once the LRs are overloaded with SGIs, a
>>> >> > maintenance interrupt occurs.
>>> >> > It is then not handled properly and occurs again and again, so the
>>> >> > platform hangs inside its handler.
>>> >> >
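
To make the suspected loop concrete, here is a small model of when the underflow maintenance interrupt would be signalled. It assumes my reading of the GICv2 spec: with GICH_HCR.En and GICH_HCR.UIE set, the condition holds while none or only one of the list registers holds a valid entry, so it stays asserted until UIE is cleared or more LRs are filled. The names are illustrative, not Xen's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 GICH_HCR bits and GICH_LR state field. */
#define HCR_EN          (1u << 0)
#define HCR_UIE         (1u << 1)
#define LR_STATE_SHIFT  28
#define LR_STATE_MASK   0x3u

/* Count list registers whose state field is not "invalid". */
static inline unsigned int count_valid_lrs(const uint32_t *lrs,
                                           unsigned int nr_lrs)
{
    unsigned int i, valid = 0;

    for (i = 0; i < nr_lrs; i++)
        if ((lrs[i] >> LR_STATE_SHIFT) & LR_STATE_MASK)
            valid++;

    return valid;
}

/* Model only: true while the underflow condition would be signalled. */
static inline bool underflow_asserted(uint32_t hcr, const uint32_t *lrs,
                                      unsigned int nr_lrs)
{
    return (hcr & HCR_EN) && (hcr & HCR_UIE) &&
           count_valid_lrs(lrs, nr_lrs) <= 1;
}
```

Fed with the LR dump from the hang (one valid entry, three empty), this model reports the condition as asserted for as long as UIE stays set, which would match a handler that is re-entered immediately.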
>>> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>>> >> >> hypervisor entry.
>>> >> >>
>>> >> >
>>> >> > Now trying.
>>> >> >
>>> >> >>
>>> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> >> index 4d2a92d..6ae8dc4 100644
>>> >> >> --- a/xen/arch/arm/gic.c
>>> >> >> +++ b/xen/arch/arm/gic.c
>>> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >> >>      if ( is_idle_vcpu(v) )
>>> >> >>          return;
>>> >> >>
>>> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> +
>>> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >> >>
>>> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>> >> >>
>>> >> >>      gic_restore_pending_irqs(current);
>>> >> >>
>>> >> >> -
>>> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >> -    else
>>> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> -
>>> >> >>  }
>>> >> >>
>>> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>>> >> >
>>> >>
>>> >> Heh - I don't see hangs with this patch :) But also I see that
>>> >> maintenance interrupt doesn't occur (and no hang as result)
>>> >> Stefano - is this expected?
>>> >
>>> > No maintenance interrupts at all? That's strange. You should be
>>> > receiving them when LRs are full and you still have interrupts pending
>>> > to be added to them.
>>> >
>>> > You could add another printk here to see if you should be receiving
>>> > them:
>>> >
>>> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> > +    {
>>> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> > -    else
>>> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> > -
>>> > +    }
>>> >  }
>>> >
>>>
>>> Requested properly:
>>>
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>>
>>> But does not occur
>>
>> OK, let's see what's going on then by printing the irq number of the
>> maintenance interrupt:
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 4d2a92d..fed3167 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -55,6 +55,7 @@ static struct {
>>  static DEFINE_PER_CPU(uint64_t, lr_mask);
>>
>>  static uint8_t nr_lrs;
>> +static bool uie_on;
>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
>>
>>  /* The GIC mapping of CPU interfaces does not necessarily match the
>> @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
>>  {
>>      int i = 0;
>>      unsigned long flags;
>> +    unsigned long hcr;
>>
>>      /* The idle domain has no LRs to be cleared. Since gic_restore_state
>>       * doesn't write any LR registers for the idle domain they could be
>> @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
>>      if ( is_idle_vcpu(v) )
>>          return;
>>
>> +    hcr = GICH[GICH_HCR];
>> +    if ( hcr & GICH_HCR_UIE )
>> +    {
>> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> +        uie_on = 1;
>> +    }
>> +
>>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>
>>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>>          intack = GICC[GICC_IAR];
>>          irq = intack & GICC_IA_IRQ;
>>
>> +        if ( uie_on )
>> +        {
>> +            uie_on = 0;
>> +            printk("received maintenance interrupt irq=%d\n", irq);
>> +        }
>>          if ( likely(irq >= 16 && irq < 1021) )
>>          {
>>              local_irq_enable();
>
>
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

