
Re: [Xen-devel] Xen 4.5 random freeze question



On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> Hi Stefano,
>>>
>>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> Hi Stefano,
>>> >>
>>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> > >      else
>>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> > >
>>> >> > >  }
>>> >> >
>>> >> > Yes, exactly
>>> >>
>>> >> I tried it; the hang still occurs with this change.
>>> >
>>> > We need to figure out why during the hang you still have all the LRs
>>> > busy even if you are getting maintenance interrupts that should cause
>>> > them to be cleared.
>>> >
>>>
>>> I see that I have free LRs during maintenance interrupt
>>>
>>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> (XEN)    HW_LR[0]=9a015856
>>> (XEN)    HW_LR[1]=0
>>> (XEN)    HW_LR[2]=0
>>> (XEN)    HW_LR[3]=0
>>> (XEN) Inflight irq=86 lr=0
>>> (XEN) Inflight irq=2 lr=255
>>> (XEN) Pending irq=2
>>>
>>> But I see that once the hang occurs, maintenance interrupts are generated
>>> continuously. The platform keeps printing the same log until reboot.
>>
>> Exactly the same log? As in the one above you just pasted?
>> That is very very suspicious.
>
> Yes, exactly the same log. It looks like this means the LRs are flushed
> correctly.
>
>>
>> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> something we do in Xen, maybe writing to an LR register, might trigger a
>> new maintenance interrupt immediately causing an infinite loop.
>>
>
> Yes, this is what I suspect too. Taking into account all the collected
> debug info, it looks like a maintenance interrupt occurs once the LRs
> are overloaded with SGIs.
> It is then not handled properly and occurs again and again, so the
> platform hangs inside its handler.
>
>> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>> hypervisor entry.
>>
>
> Now trying.
>
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 4d2a92d..6ae8dc4 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>      if ( is_idle_vcpu(v) )
>>          return;
>>
>> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> +
>>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>
>>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> @@ -821,12 +823,8 @@ void gic_inject(void)
>>
>>      gic_restore_pending_irqs(current);
>>
>> -
>>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> -    else
>> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> -
>>  }
>>
>>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>

Heh, I don't see hangs with this patch :) But I also see that the
maintenance interrupt no longer occurs at all (and, as a result, no hang).
Stefano, is this expected?
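
To make the suspected loop concrete, here is a minimal sketch of the
underflow condition as I understand it from the GICv2 architecture spec:
with GICH_HCR.UIE set, the maintenance interrupt is signalled while zero
or one list registers hold a valid entry. The helper names and macros
below are hypothetical (they are not Xen code), and the model ignores
GICH_HCR.En and every other maintenance interrupt source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit layout taken from the GICv2 spec; macro names are illustrative only. */
#define HCR_UIE        (1u << 1)  /* GICH_HCR.UIE: underflow interrupt enable */
#define LR_STATE_SHIFT 28         /* GICH_LR.State occupies bits [29:28] */
#define LR_STATE_MASK  0x3u       /* 00 = invalid, anything else = valid */

/* An LR is valid when its State field is pending, active, or both. */
static bool lr_is_valid(uint32_t lr)
{
    return ((lr >> LR_STATE_SHIFT) & LR_STATE_MASK) != 0;
}

/* Count how many of the n list registers currently hold a valid entry. */
static size_t count_valid_lrs(const uint32_t *lrs, size_t n)
{
    size_t valid = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( lr_is_valid(lrs[i]) )
            valid++;

    return valid;
}

/*
 * Simplified model: the underflow maintenance interrupt is signalled
 * whenever UIE is set and at most one LR holds a valid entry.
 */
static bool underflow_irq_asserted(uint32_t hcr, const uint32_t *lrs, size_t n)
{
    if ( !(hcr & HCR_UIE) )
        return false;

    return count_valid_lrs(lrs, n) <= 1;
}
```

Feeding it the LR values from the log above (HW_LR[0]=9a015856, the
rest 0) gives one valid LR, so under this model the interrupt stays
asserted for as long as UIE remains set, and clearing UIE deasserts it.
That would be consistent with the endless identical log, but it is only
a model of the hardware condition, not a confirmed diagnosis.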


-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com
