[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@xxxxxxx> wrote: > Hi Dario and George, > > What is the status of this patch? It would be nice to have it for Xen 4.7 to > avoid unwanted crash when secondary CPUs fails to come online. Wei, can you put this on your release blockers list? (Julien, I take it this should be a blocker, right?) -George > > Regards, > > On 26/04/16 18:49, Dario Faggioli wrote: >> >> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote: >>> >>> Hi Dario, >>> >> Hi, >> >>> A couple of people have been reported Xen crash on the ARM64 >>> Foundation Model [1] with recent unstable. >>> >> Ok, thanks for reporting. >> >>> The crash seems to happen when Xen fails to bring up secondary CPUs >>> (see stack trace below). >>> >> Ah... I see. >> >>> From my understanding, csched_free_pdata is trying to kill the >>> timer spc->ticker. However the status of this timer is >>> TIMER_STATUS_invalid. >>> >>> This is because csched_init_pdata has set a deadline for the >>> timer (set_timer) and the softirq to schedule the timer has >>> not yet happen (indeed Xen is still in early boot). >>> >> Yes, this is sort of what happens (only slightly different, see the >> changelog of the atached patch patch). >> >> >>> I am not sure how to fix this issue. How will you recommend >>> to fix it? >>> >> Yeah, well, doing it cleanly includes a slight change in the scheduler >> hooks API, IMO... and we indeed should do it cleanly :-)) >> >> George, what do you think? >> >> Honestly, this is similar to what I was thinking to do already (I mean, >> having an deinit_pdata hook, "symmetric" with the init_pdata one), when >> working on that series, because I do think it's cleaner... then, I >> abandoned the idea, as it looked to not be necessary... But apparently >> it may actually be! :-) >> >> Let me know, and I'll resubmit the patch properly (together with >> another bugfix I have in my queue). >> >> Dario >> --- >> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32 >> Author: Dario Faggioli <dario.faggioli@xxxxxxxxxx> >> Date: Tue Apr 26 17:42:36 2016 +0200 >> >> xen: sched: fix killing an uninitialized timer in free_pdata. >> >> commit 64269d9365 "sched: implement .init_pdata in Credit, >> Credit2 and RTDS" helped fixing Credit2 runqueues, and >> the races we had in switching scheduler for pCPUs, but >> introduced another issue. In fact, if CPU bringup fails >> during __cpu_up() (and, more precisely, after CPU_UP_PREPARE, >> but before CPU_STARTING) the CPU_UP_CANCELED notifier >> would be executed, which calls the free_pdata hook. >> >> Such hook does two things: (1) undo the initialization >> done inside the init_pdata hook; (2) free the memory >> allocated by the alloc_pdata hook. >> >> However, in the failure path just described, it is possible >> that only alloc_pdata has really been called, which is >> potentially and issue (depending on what actually happens >> inside the implementation of free_pdata). >> >> In fact, for Credit1 (the only scheduler that actually >> allocates per-pCPU data), this result in calling kill_timer() >> on a timer that had not yet been initialized, which causes >> the following: >> >> (XEN) Xen call trace: >> (XEN) [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC) >> (XEN) [<000000000022f624>] kill_timer+0x108/0x2e0 (LR) >> (XEN) [<00000000002208c0>] >> sched_credit.c#csched_free_pdata+0xd8/0x114 >> (XEN) [<0000000000227a18>] >> schedule.c#cpu_schedule_callback+0xc0/0x12c >> (XEN) [<0000000000219944>] notifier_call_chain+0x78/0x9c >> (XEN) [<00000000002015fc>] cpu_up+0x104/0x130 >> (XEN) [<000000000028f7c0>] start_xen+0xaf8/0xce0 >> (XEN) [<00000000810021d8>] 00000000810021d8 >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 0: >> (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at >> timer.c:279 >> (XEN) **************************************** >> >> Solve this by making the scheduler hooks API symmetric again, >> i.e., by adding an deinit_pdata hook and making it responsible >> of undoing what init_pdata did, rather than asking to free_pdata >> to do everything. >> >> This is cleaner and, in the case at hand, makes it possible to >> only call free_pdata, which is the right thing to do, as only >> allocation and no initialization was performed. >> >> Reported-by: Julien Grall <julien.grall@xxxxxxx> >> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx> >> --- >> Cc: George Dunlap <george.dunlap@xxxxxxxxxx> >> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> >> Cc: Varun.Swara@xxxxxxx >> Cc: Steve Capper <Steve.Capper@xxxxxxx> >> >> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c >> index bc36837..0a6a1b4 100644 >> --- a/xen/common/sched_credit.c >> +++ b/xen/common/sched_credit.c >> @@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu >> *new) >> } >> >> static void >> -csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu) >> +csched_free_pdata(const struct scheduler *ops, void *pcpu) >> { >> - struct csched_private *prv = CSCHED_PRIV(ops); >> struct csched_pcpu *spc = pcpu; >> - unsigned long flags; >> >> if ( spc == NULL ) >> return; >> >> + xfree(spc); >> +} >> + >> +static void >> +csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu) >> +{ >> + struct csched_private *prv = CSCHED_PRIV(ops); >> + struct csched_pcpu *spc = pcpu; >> + unsigned long flags; >> + >> + ASSERT(spc != NULL); >> + >> spin_lock_irqsave(&prv->lock, flags); >> >> prv->credit -= prv->credits_per_tslice; >> @@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void >> *pcpu, int cpu) >> kill_timer(&prv->master_ticker); >> >> spin_unlock_irqrestore(&prv->lock, flags); >> - >> - xfree(spc); >> } >> >> static void * >> @@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = { >> .free_vdata = csched_free_vdata, >> .alloc_pdata = csched_alloc_pdata, >> .init_pdata = csched_init_pdata, >> + .deinit_pdata = csched_deinit_pdata, >> .free_pdata = csched_free_pdata, >> .switch_sched = csched_switch_sched, >> .alloc_domdata = csched_alloc_domdata, >> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >> index 46b9279..f4c37b4 100644 >> --- a/xen/common/sched_credit2.c >> +++ b/xen/common/sched_credit2.c >> @@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops, >> unsigned int cpu, >> } >> >> static void >> -csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu) >> +csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu) >> { >> unsigned long flags; >> struct csched2_private *prv = CSCHED2_PRIV(ops); >> struct csched2_runqueue_data *rqd; >> int rqi; >> >> + ASSERT(pcpu == NULL); >> + >> spin_lock_irqsave(&prv->lock, flags); >> >> ASSERT(cpumask_test_cpu(cpu, &prv->initialized)); >> @@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = { >> .alloc_vdata = csched2_alloc_vdata, >> .free_vdata = csched2_free_vdata, >> .init_pdata = csched2_init_pdata, >> - .free_pdata = csched2_free_pdata, >> + .deinit_pdata = csched2_deinit_pdata, >> .switch_sched = csched2_switch_sched, >> .alloc_domdata = csched2_alloc_domdata, >> .free_domdata = csched2_free_domdata, >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c >> index 5546999..1a64521 100644 >> --- a/xen/common/schedule.c >> +++ b/xen/common/schedule.c >> @@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu) >> struct schedule_data *sd = &per_cpu(schedule_data, cpu); >> struct scheduler *sched = per_cpu(scheduler, cpu); >> >> - SCHED_OP(sched, free_pdata, sd->sched_priv, cpu); >> + SCHED_OP(sched, free_pdata, sd->sched_priv); >> SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv); >> >> idle_vcpu[cpu]->sched_priv = NULL; >> @@ -1554,8 +1554,10 @@ static int cpu_schedule_callback( >> case CPU_UP_PREPARE: >> rc = cpu_schedule_up(cpu); >> break; >> - case CPU_UP_CANCELED: >> case CPU_DEAD: >> + SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu); >> + /* Fallthrough */ >> + case CPU_UP_CANCELED: >> cpu_schedule_down(cpu); >> break; >> default: >> @@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct >> cpupool *c) >> vpriv = SCHED_OP(new_ops, alloc_vdata, idle, >> idle->domain->sched_priv); >> if ( vpriv == NULL ) >> { >> - SCHED_OP(new_ops, free_pdata, ppriv, cpu); >> + SCHED_OP(new_ops, free_pdata, ppriv); >> return -ENOMEM; >> } >> >> @@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct >> cpupool *c) >> SCHED_OP(new_ops, tick_resume, cpu); >> >> SCHED_OP(old_ops, free_vdata, vpriv_old); >> - SCHED_OP(old_ops, free_pdata, ppriv_old, cpu); >> + SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu); >> + SCHED_OP(old_ops, free_pdata, ppriv_old); >> >> out: >> per_cpu(cpupool, cpu) = c; >> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h >> index 1db7c8d..240f66c 100644 >> --- a/xen/include/xen/sched-if.h >> +++ b/xen/include/xen/sched-if.h >> @@ -135,9 +135,10 @@ struct scheduler { >> void (*free_vdata) (const struct scheduler *, void *); >> void * (*alloc_vdata) (const struct scheduler *, struct >> vcpu *, >> void *); >> - void (*free_pdata) (const struct scheduler *, void *, >> int); >> + void (*free_pdata) (const struct scheduler *, void *); >> void * (*alloc_pdata) (const struct scheduler *, int); >> void (*init_pdata) (const struct scheduler *, void *, >> int); >> + void (*deinit_pdata) (const struct scheduler *, void *, >> int); >> void (*free_domdata) (const struct scheduler *, void *); >> void * (*alloc_domdata) (const struct scheduler *, struct >> domain *); >> >> > > -- > Julien Grall > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxx > http://lists.xen.org/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |