[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279



On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi Dario and George,
>
> What is the status of this patch? It would be nice to have it for Xen 4.7 to
> avoid unwanted crash when secondary CPUs fails to come online.

Wei, can you put this on your release blockers list?  (Julien, I take
it this should be a blocker, right?)

 -George

>
> Regards,
>
> On 26/04/16 18:49, Dario Faggioli wrote:
>>
>> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
>>>
>>> Hi Dario,
>>>
>> Hi,
>>
>>> A couple of people have been reported Xen crash on the ARM64
>>> Foundation Model [1] with recent unstable.
>>>
>> Ok, thanks for reporting.
>>
>>> The crash seems to happen when Xen fails to bring up secondary CPUs
>>> (see stack trace below).
>>>
>> Ah... I see.
>>
>>>  From my understanding, csched_free_pdata is trying to kill the
>>> timer spc->ticker. However the status of this timer is
>>> TIMER_STATUS_invalid.
>>>
>>> This is because csched_init_pdata has set a deadline for the
>>> timer (set_timer) and the softirq to schedule the timer has
>>> not yet happen (indeed Xen is still in early boot).
>>>
>> Yes, this is sort of what happens (only slightly different, see the
>> changelog of the atached patch patch).
>>
>>
>>> I am not sure how to fix this issue. How will you recommend
>>> to fix it?
>>>
>> Yeah, well, doing it cleanly includes a slight change in the scheduler
>> hooks API, IMO... and we indeed should do it cleanly :-))
>>
>> George, what do you think?
>>
>> Honestly, this is similar to what I was thinking to do already (I mean,
>> having an deinit_pdata hook, "symmetric" with the init_pdata one), when
>> working on that series, because I do think it's cleaner... then, I
>> abandoned the idea, as it looked to not be necessary... But apparently
>> it may actually be! :-)
>>
>> Let me know, and I'll resubmit the patch properly (together with
>> another bugfix I have in my queue).
>>
>> Dario
>> ---
>> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
>> Author: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
>> Date:   Tue Apr 26 17:42:36 2016 +0200
>>
>>      xen: sched: fix killing an uninitialized timer in free_pdata.
>>
>>      commit 64269d9365 "sched: implement .init_pdata in Credit,
>>      Credit2 and RTDS" helped fixing Credit2 runqueues, and
>>      the races we had in switching scheduler for pCPUs, but
>>      introduced another issue. In fact, if CPU bringup fails
>>      during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
>>      but before CPU_STARTING) the CPU_UP_CANCELED notifier
>>      would be executed, which calls the free_pdata hook.
>>
>>      Such hook does two things: (1) undo the initialization
>>      done inside the init_pdata hook; (2) free the memory
>>      allocated by the alloc_pdata hook.
>>
>>      However, in the failure path just described, it is possible
>>      that only alloc_pdata has really been called, which is
>>      potentially and issue (depending on what actually happens
>>      inside the implementation of free_pdata).
>>
>>      In fact, for Credit1 (the only scheduler that actually
>>      allocates per-pCPU data), this result in calling kill_timer()
>>      on a timer that had not yet been initialized, which causes
>>      the following:
>>
>>      (XEN) Xen call trace:
>>      (XEN)    [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
>>      (XEN)    [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
>>      (XEN)    [<00000000002208c0>]
>> sched_credit.c#csched_free_pdata+0xd8/0x114
>>      (XEN)    [<0000000000227a18>]
>> schedule.c#cpu_schedule_callback+0xc0/0x12c
>>      (XEN)    [<0000000000219944>] notifier_call_chain+0x78/0x9c
>>      (XEN)    [<00000000002015fc>] cpu_up+0x104/0x130
>>      (XEN)    [<000000000028f7c0>] start_xen+0xaf8/0xce0
>>      (XEN)    [<00000000810021d8>] 00000000810021d8
>>      (XEN)
>>      (XEN)
>>      (XEN) ****************************************
>>      (XEN) Panic on CPU 0:
>>      (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at
>> timer.c:279
>>      (XEN) ****************************************
>>
>>      Solve this by making the scheduler hooks API symmetric again,
>>      i.e., by adding an deinit_pdata hook and making it responsible
>>      of undoing what init_pdata did, rather than asking to free_pdata
>>      to do everything.
>>
>>      This is cleaner and, in the case at hand, makes it possible to
>>      only call free_pdata, which is the right thing to do, as only
>>      allocation and no initialization was performed.
>>
>>      Reported-by: Julien Grall <julien.grall@xxxxxxx>
>>      Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
>>      ---
>>      Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
>>      Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>>      Cc: Varun.Swara@xxxxxxx
>>      Cc: Steve Capper <Steve.Capper@xxxxxxx>
>>
>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>> index bc36837..0a6a1b4 100644
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu
>> *new)
>>   }
>>
>>   static void
>> -csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched_free_pdata(const struct scheduler *ops, void *pcpu)
>>   {
>> -    struct csched_private *prv = CSCHED_PRIV(ops);
>>       struct csched_pcpu *spc = pcpu;
>> -    unsigned long flags;
>>
>>       if ( spc == NULL )
>>           return;
>>
>> +    xfree(spc);
>> +}
>> +
>> +static void
>> +csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +{
>> +    struct csched_private *prv = CSCHED_PRIV(ops);
>> +    struct csched_pcpu *spc = pcpu;
>> +    unsigned long flags;
>> +
>> +    ASSERT(spc != NULL);
>> +
>>       spin_lock_irqsave(&prv->lock, flags);
>>
>>       prv->credit -= prv->credits_per_tslice;
>> @@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void
>> *pcpu, int cpu)
>>           kill_timer(&prv->master_ticker);
>>
>>       spin_unlock_irqrestore(&prv->lock, flags);
>> -
>> -    xfree(spc);
>>   }
>>
>>   static void *
>> @@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
>>       .free_vdata     = csched_free_vdata,
>>       .alloc_pdata    = csched_alloc_pdata,
>>       .init_pdata     = csched_init_pdata,
>> +    .deinit_pdata   = csched_deinit_pdata,
>>       .free_pdata     = csched_free_pdata,
>>       .switch_sched   = csched_switch_sched,
>>       .alloc_domdata  = csched_alloc_domdata,
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index 46b9279..f4c37b4 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops,
>> unsigned int cpu,
>>   }
>>
>>   static void
>> -csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>>   {
>>       unsigned long flags;
>>       struct csched2_private *prv = CSCHED2_PRIV(ops);
>>       struct csched2_runqueue_data *rqd;
>>       int rqi;
>>
>> +    ASSERT(pcpu == NULL);
>> +
>>       spin_lock_irqsave(&prv->lock, flags);
>>
>>       ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
>> @@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
>>       .alloc_vdata    = csched2_alloc_vdata,
>>       .free_vdata     = csched2_free_vdata,
>>       .init_pdata     = csched2_init_pdata,
>> -    .free_pdata     = csched2_free_pdata,
>> +    .deinit_pdata   = csched2_deinit_pdata,
>>       .switch_sched   = csched2_switch_sched,
>>       .alloc_domdata  = csched2_alloc_domdata,
>>       .free_domdata   = csched2_free_domdata,
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 5546999..1a64521 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
>>       struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>>       struct scheduler *sched = per_cpu(scheduler, cpu);
>>
>> -    SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
>> +    SCHED_OP(sched, free_pdata, sd->sched_priv);
>>       SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>>
>>       idle_vcpu[cpu]->sched_priv = NULL;
>> @@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
>>       case CPU_UP_PREPARE:
>>           rc = cpu_schedule_up(cpu);
>>           break;
>> -    case CPU_UP_CANCELED:
>>       case CPU_DEAD:
>> +        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>> +        /* Fallthrough */
>> +    case CPU_UP_CANCELED:
>>           cpu_schedule_down(cpu);
>>           break;
>>       default:
>> @@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>>       vpriv = SCHED_OP(new_ops, alloc_vdata, idle,
>> idle->domain->sched_priv);
>>       if ( vpriv == NULL )
>>       {
>> -        SCHED_OP(new_ops, free_pdata, ppriv, cpu);
>> +        SCHED_OP(new_ops, free_pdata, ppriv);
>>           return -ENOMEM;
>>       }
>>
>> @@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>>       SCHED_OP(new_ops, tick_resume, cpu);
>>
>>       SCHED_OP(old_ops, free_vdata, vpriv_old);
>> -    SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
>> +    SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
>> +    SCHED_OP(old_ops, free_pdata, ppriv_old);
>>
>>    out:
>>       per_cpu(cpupool, cpu) = c;
>> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
>> index 1db7c8d..240f66c 100644
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -135,9 +135,10 @@ struct scheduler {
>>       void         (*free_vdata)     (const struct scheduler *, void *);
>>       void *       (*alloc_vdata)    (const struct scheduler *, struct
>> vcpu *,
>>                                       void *);
>> -    void         (*free_pdata)     (const struct scheduler *, void *,
>> int);
>> +    void         (*free_pdata)     (const struct scheduler *, void *);
>>       void *       (*alloc_pdata)    (const struct scheduler *, int);
>>       void         (*init_pdata)     (const struct scheduler *, void *,
>> int);
>> +    void         (*deinit_pdata)   (const struct scheduler *, void *,
>> int);
>>       void         (*free_domdata)   (const struct scheduler *, void *);
>>       void *       (*alloc_domdata)  (const struct scheduler *, struct
>> domain *);
>>
>>
>
> --
> Julien Grall
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.