
Re: [Xen-devel] Xen 4.5 random freeze question



GIC dump taken while the maintenance interrupt is being requested:

(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)    HW_LR[0]=3a00001f
(XEN)    HW_LR[1]=9a015856
(XEN)    HW_LR[2]=1a00001b
(XEN)    HW_LR[3]=9a00e439
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=86 lr=1
(XEN) Inflight irq=27 lr=2
(XEN) Inflight irq=57 lr=3
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2
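
For readers of the dump above, here is a minimal sketch of how the raw
HW_LR values can be decoded, assuming the GICv2 GICH_LRn layout
(VirtualID in bits [9:0], PhysicalID in [19:10], Priority in [27:23],
State in [29:28], HW in bit 31). The struct and function names are
hypothetical and are not taken from xen/arch/arm/gic.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* GICv2 GICH_LRn fields (hypothetical struct, for illustration only). */
struct lr_fields {
    unsigned virq;     /* bits [9:0]:   virtual interrupt ID */
    unsigned pirq;     /* bits [19:10]: physical ID, meaningful when hw == 1 */
    unsigned priority; /* bits [27:23] */
    unsigned state;    /* bits [29:28]: 0 invalid, 1 pending, 2 active, 3 both */
    unsigned hw;       /* bit 31: backed by a hardware interrupt */
};

/* Split one list register value into its fields. */
struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;

    f.virq     = lr & 0x3ffu;
    f.pirq     = (lr >> 10) & 0x3ffu;
    f.priority = (lr >> 23) & 0x1fu;
    f.state    = (lr >> 28) & 0x3u;
    f.hw       = (lr >> 31) & 0x1u;
    return f;
}

/* Print one list register in a human-readable form. */
void print_lr(unsigned idx, uint32_t lr)
{
    static const char *const names[] = {
        "invalid", "pending", "active", "pending+active"
    };
    struct lr_fields f = decode_lr(lr);

    assert(f.state < 4);
    printf("LR[%u]=%08x virq=%u state=%s hw=%u\n",
           idx, (unsigned)lr, f.virq, names[f.state], f.hw);
}
```

Under that assumption, 0x9a015856 decodes to virq 86, state pending,
hw 1, which agrees with "Inflight irq=86 lr=1". All four LRs hold a
valid entry, while irq=2 has lr=255, i.e. it is not in any LR.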

On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
>>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >>> Hi Stefano,
>>> >>>
>>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >>> >> Hi Stefano,
>>> >>> >>
>>> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >>> >> > >      else
>>> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >>> >> > >
>>> >>> >> > >  }
>>> >>> >> >
>>> >>> >> > Yes, exactly
>>> >>> >>
>>> >>> >> I tried it; the hang still occurs with this change.
>>> >>> >
>>> >>> > We need to figure out why during the hang you still have all the LRs
>>> >>> > busy even if you are getting maintenance interrupts that should cause
>>> >>> > them to be cleared.
>>> >>> >
>>> >>>
>>> >>> I see that there are free LRs during the maintenance interrupt:
>>> >>>
>>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >>> (XEN)    HW_LR[0]=9a015856
>>> >>> (XEN)    HW_LR[1]=0
>>> >>> (XEN)    HW_LR[2]=0
>>> >>> (XEN)    HW_LR[3]=0
>>> >>> (XEN) Inflight irq=86 lr=0
>>> >>> (XEN) Inflight irq=2 lr=255
>>> >>> (XEN) Pending irq=2
>>> >>>
>>> >>> But once the hang occurs, maintenance interrupts are generated
>>> >>> continuously. The platform keeps printing the same log until reboot.
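
One possible reading of the log above, offered as a sketch and not as a
confirmed diagnosis: per the GICv2 definition, GICH_HCR.UIE asserts the
maintenance interrupt while none or only one of the list registers
holds a valid entry. The helpers below (hypothetical names, not Xen
code) only evaluate that condition on a set of raw LR values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State field of a GICv2 list register: bits [29:28], 0 means invalid. */
#define LR_STATE_SHIFT 28
#define LR_STATE_MASK  0x3u

/* Count list registers whose state is not "invalid". */
unsigned count_valid_lrs(const uint32_t *lrs, size_t n)
{
    unsigned valid = 0;
    size_t i;

    assert(lrs != NULL);
    for (i = 0; i < n; i++)
        if (((lrs[i] >> LR_STATE_SHIFT) & LR_STATE_MASK) != 0)
            valid++;
    return valid;
}

/* Underflow condition: zero or one valid list register remains. */
int underflow_condition(const uint32_t *lrs, size_t n)
{
    return count_valid_lrs(lrs, n) <= 1;
}
```

With the values logged during the maintenance interrupt (9a015856, 0,
0, 0) the condition holds, so the interrupt would stay asserted for as
long as UIE remains set. With four valid LRs it does not hold.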
>>> >>
>>> >> Exactly the same log? As in the one above you just pasted?
>>> >> That is very very suspicious.
>>> >
>>> > Yes, exactly the same log. It looks like this means that the LRs are
>>> > flushed correctly.
>>> >
>>> >>
>>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>> >> something we do in Xen, maybe writing to an LR register, might trigger a
>>> >> new maintenance interrupt immediately causing an infinite loop.
>>> >>
>>> >
>>> > Yes, this is what I'm thinking too. Taking into account all the
>>> > debug info collected so far, it looks like a maintenance interrupt
>>> > occurs once the LRs are overloaded with SGIs. It is then not handled
>>> > properly and occurs again and again, so the platform hangs inside
>>> > its handler.
>>> >
>>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately
>>> >> on hypervisor entry.
>>> >>
>>> >
>>> > Now trying.
>>> >
>>> >>
>>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> index 4d2a92d..6ae8dc4 100644
>>> >> --- a/xen/arch/arm/gic.c
>>> >> +++ b/xen/arch/arm/gic.c
>>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >>      if ( is_idle_vcpu(v) )
>>> >>          return;
>>> >>
>>> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> +
>>> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >>
>>> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>> >>
>>> >>      gic_restore_pending_irqs(current);
>>> >>
>>> >> -
>>> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> -    else
>>> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> -
>>> >>  }
>>> >>
>>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>>> >
>>>
>>> Heh, I don't see hangs with this patch :) But I also see that the
>>> maintenance interrupt doesn't occur at all (and so there is no hang).
>>> Stefano, is this expected?
>>
>> No maintenance interrupts at all? That's strange. You should be
>> receiving them when LRs are full and you still have interrupts pending
>> to be added to them.
>>
>> You could add another printk here to see if you should be receiving
>> them:
>>
>>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> +    {
>> +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> -    else
>> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> -
>> +    }
>>  }
>>
>
> It is requested properly:
>
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>
> But the maintenance interrupt itself does not occur.
>
>
>>
>>> >
>>> >
>>> > --
>>> >
>>> > Andrii Tseglytskyi | Embedded Dev
>>> > GlobalLogic
>>> > www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com
