
RE: [Xen-devel] dom0 hang



>-----Original Message-----
>From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>Sent: Tuesday, July 07, 2009 11:47 AM
>To: Yu, Ke
>Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Kurt C.
>Hackel
>Subject: Re: [Xen-devel] dom0 hang
>
>
>Well, the problem takes long to reproduce (only on certain boxes). And then it
>may not always happen. So I want to make sure I understand the fix, as it
>was pretty hard to debug.

OK, looking forward to your update.

>
>While the fix will still allow softirqs to be pending, I guess it's
>functionally OK, because after disabling irqs it will check for pending
>softirqs and just return. I think the comment about expecting no softirq
>pending should be fixed.

Right, the comment will also be fixed.

>
>BTW, why can't the tick be suspended when csched_schedule() concludes it's
>the idle vcpu, before returning? Wouldn't that make it less intrusive?

The tick suspend could be put in csched_schedule, but the suspend/resume logic
is still needed in acpi_processor_idle anyway, because of the cpufreq dbs_timer
suspend/resume. The intention here is to make acpi_processor_idle the central
place for timers that can be stopped during the idle period; if another
stoppable timer is added in the future, it can easily be handled in
acpi_processor_idle as well. So it seems cleaner to keep the current logic, and
as long as we are careful not to overdo the softirq handling, it does not look
too intrusive. What do you think?
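
To illustrate, with the patch applied the idle path would look roughly like
this (a simplified sketch; the exact placement and ordering of the cpufreq dbs
timer calls in the real cpu_idle.c may differ):

    static void acpi_processor_idle(void)
    {
        ...
        cpufreq_dbs_timer_suspend();    /* stop the cpufreq dbs governor timer */
        sched_tick_suspend();           /* stop the per-cpu scheduler tick     */

        /*
         * sched_tick_suspend() may raise TIMER_SOFTIRQ via __stop_timer(),
         * so flush pending timers here rather than leaving the softirq set
         * (and rather than calling do_softirq(), which could also run the
         * scheduler).
         */
        process_pending_timers();

        local_irq_disable();
        if ( softirq_pending(smp_processor_id()) )
        {
            /* Real work became pending: back out of the idle entry. */
            local_irq_enable();
            sched_tick_resume();
            cpufreq_dbs_timer_resume();
            return;
        }

        ... enter the chosen C-state ...

        /* Any future timer that must be stopped while idle can be
         * suspended/resumed at these same two points. */
        sched_tick_resume();
        cpufreq_dbs_timer_resume();
    }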

Best Regards
Ke

>
>thanks,
>Mukesh
>
>
>Yu, Ke wrote:
>> Hi Mukesh,
>>
>> Could you please try the following patch, to see if it can resolve the
>> issue you observed? Thanks.
>>
>> Best Regards
>> Ke
>>
>> diff -r d461c4d8af17 xen/arch/x86/acpi/cpu_idle.c
>> --- a/xen/arch/x86/acpi/cpu_idle.c
>> +++ b/xen/arch/x86/acpi/cpu_idle.c
>> @@ -228,10 +228,10 @@ static void acpi_processor_idle(void)
>>      /*
>>       * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>>       * which will break the later assumption of no sofirq pending,
>> -     * so add do_softirq
>> +     * so process the pending timers
>>       */
>> -    if ( softirq_pending(smp_processor_id()) )
>> -        do_softirq();
>> +
>> +    process_pending_timers();
>>
>>      /*
>>       * Interrupts must be disabled during bus mastering calculations and
>>
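
(For reference, a rough sketch of what process_pending_timers() does; the
exact body in xen/common/timer.c may differ:)

    void process_pending_timers(void)
    {
        unsigned int cpu = smp_processor_id();

        /* Only consume the local TIMER_SOFTIRQ; other softirqs (notably
         * SCHEDULE_SOFTIRQ) are left pending, so no context switch can
         * happen here while the tick is suspended. */
        if ( test_and_clear_bit(TIMER_SOFTIRQ, &softirq_pending(cpu)) )
            timer_softirq_action();
    }
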
>>> -----Original Message-----
>>> From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>>> Sent: Friday, July 03, 2009 9:19 AM
>>> To: mukesh.rathor@xxxxxxxxxx
>>> Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke;
>>> Kurt C. Hackel
>>> Subject: Re: [Xen-devel] dom0 hang
>>>
>>>
>>> Hi Kevin/Yu:
>>>
>>> acpi_processor_idle()
>>> {
>>>     sched_tick_suspend();
>>>      /*
>>>      * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>>>      * which will break the later assumption of no softirq pending,
>>>      * so add do_softirq
>>>      */
>>>     if ( softirq_pending(smp_processor_id()) )
>>>         do_softirq();             <===============
>>>
>>>     local_irq_disable();
>>>     if ( softirq_pending(smp_processor_id()) )
>>>     {
>>>         local_irq_enable();
>>>         sched_tick_resume();
>>>         cpufreq_dbs_timer_resume();
>>>         return;
>>>     }
>>>
>>> Wouldn't the do_softirq() call the scheduler with the tick suspended, and
>>> the scheduler then context switch to another vcpu (with *_BOOST), which
>>> would result in the stuck vcpu I described?
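
(For context, a rough sketch of the softirq loop being referred to; this is
not the exact Xen code, but it shows why do_softirq() can end up in the
scheduler while the tick is suspended:)

    void do_softirq(void)
    {
        unsigned int i, cpu = smp_processor_id();
        unsigned long pending;

        /* Drain *all* pending softirqs, not just TIMER_SOFTIRQ. */
        while ( (pending = softirq_pending(cpu)) != 0 )
        {
            i = find_first_set_bit(pending);
            clear_bit(i, &softirq_pending(cpu));
            /* For SCHEDULE_SOFTIRQ the registered handler is the scheduler
             * itself, so a context switch can happen here with the tick
             * still suspended. */
            (*softirq_handlers[i])();
        }
    }
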
>>>
>>> thanks
>>> Mukesh
>>>
>>>
>>> Mukesh Rathor wrote:
>>>> ah, i totally missed csched_tick():
>>>>     if ( !is_idle_vcpu(current) )
>>>>         csched_vcpu_acct(cpu);
>>>>
>>>> yeah, looks like that's what is going on. i'm still waiting to
>>>> reproduce. at first glance, looking at c/s 19460, seems like
>>>> suspend/resume, well at least the resume, should happen in
>>>> csched_schedule().....
>>>>
>>>> thanks,
>>>> Mukesh
>>>>
>>>>
>>>> George Dunlap wrote:
>>>>> [Oops, adding back in distro list, also adding Kevin Tian and Yu Ke
>>>>> who wrote cs 19460]
>>>>>
>>>>> The functionality I was talking about, subtracting credits and
>>>>> clearing BOOST, happens in csched_vcpu_acct() (which is different from
>>>>> csched_acct()).  csched_vcpu_acct() is called from csched_tick(), which
>>>>> should still happen every 10ms on every cpu.
>>>>>
>>>>> The patch I referred to (cs 19460) disables and re-enables tickers in
>>>>> xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the
>>>>> processor idles.  I can't see anywhere else that tickers are disabled,
>>>>> so it's probably a case of something not re-enabling them properly.
>>>>>
>>>>> Try applying the attached patch to see if that changes anything.  (I'm
>>>>> on the road, so I can't repro the lockup issue.)  If that doesn't
>>>>> work, try disabling c-states and see if that helps.  Then at least
>>>>> we'll know where the problem lies.
>>>>>
>>>>>  -George
>>>>>
>>>>> On Thu, Jul 2, 2009 at 10:10 PM, Mukesh
>>>>> Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
>>>>>> that seems to only suspend csched_pcpu.ticker, which is csched_tick,
>>>>>> and that only sorts the local runq.
>>>>>>
>>>>>> again, we are concerned about csched_priv.master_ticker, which calls
>>>>>> csched_acct? is that correct, so i can trace that?
>>>>>>
>>>>>> thanks,
>>>>>> mukesh
>>>>>>
>>>>>>
>>>>>> George Dunlap wrote:
>>>>>>> Ah, I see that there have been some changes to the tick handling with
>>>>>>> the c-state work (e.g., cs 19460).  It looks like the ticks are supposed
>>>>>>> to keep going, but perhaps tick_suspend() and tick_resume() aren't being
>>>>>>> called properly.  Let me take a closer look.
>>>>>>>
>>>>>>>  -George
>>>>>>>
>>>>>>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>> George Dunlap wrote:
>>>>>>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh
>>>>>>>>> Rathor<mukesh.rathor@xxxxxxxxxx>
>>>>>>>>> wrote:
>>>>>>>>>> dom0 hang:
>>>>>>>>>>  vcpu0 is trying to wake up a task and in try_to_wake_up() calls
>>>>>>>>>>  task_rq_lock(). Since the task has its cpu set to 1, it gets the
>>>>>>>>>>  runq lock for vcpu1. Next it calls resched_task(), which results in
>>>>>>>>>>  sending an IPI to vcpu1. For that, vcpu0 enters the
>>>>>>>>>>  HYPERVISOR_event_channel_op HCALL and is waiting for it to return.
>>>>>>>>>>  Meanwhile, vcpu1 got running and is spinning on its runq lock in
>>>>>>>>>>  "schedule():spin_lock_irq(&rq->lock);", which vcpu0 is holding
>>>>>>>>>>  (while waiting to return from the HCALL).
>>>>>>>>>>
>>>>>>>>>>  As I had noticed before, vcpu0 never gets scheduled in xen. So
>>>>>>>>>>  looking further into xen:
>>>>>>>>>>
>>>>>>>>>> xen:
>>>>>>>>>>  Both vcpus are on the same runq, in this case cpu1's. But the
>>>>>>>>>>  priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a result,
>>>>>>>>>>  the scheduler always picks vcpu1, and vcpu0 is starved. Also, I see
>>>>>>>>>>  in kdb that the scheduler timer is not set on cpu0. That would have
>>>>>>>>>>  allowed csched_load_balance() to kick in on cpu0. [Also, on cpu1,
>>>>>>>>>>  the accounting timer, csched_tick, is not set. Although
>>>>>>>>>>  csched_tick() is running on cpu0, it only checks the runq for cpu0.]
>>>>>>>>>>
>>>>>>>>>>  Looks like c/s 19500 changed csched_schedule():
>>>>>>>>>>
>>>>>>>>>> -    ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
>>>>>>>>>> +    ret.time = (is_idle_vcpu(snext->vcpu) ?
>>>>>>>>>> +                -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
>>>>>>>>>>
>>>>>>>>>>  The quickest fix for us would be to just back that out.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  BTW, just a comment on the following (all in sched_credit.c):
>>>>>>>>>>
>>>>>>>>>>    if ( svc->pri == CSCHED_PRI_TS_UNDER &&
>>>>>>>>>>       !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
>>>>>>>>>>    {
>>>>>>>>>>       svc->pri = CSCHED_PRI_TS_BOOST;
>>>>>>>>>>    }
>>>>>>>>>>  combined with
>>>>>>>>>>  if ( snext->pri > CSCHED_PRI_TS_OVER )
>>>>>>>>>>          __runq_remove(snext);
>>>>>>>>>>
>>>>>>>>>>    Setting CSCHED_PRI_TS_BOOST as the pri of a vcpu seems dangerous
>>>>>>>>>>    to me. Since csched_schedule() never checks the time accumulated
>>>>>>>>>>    by a vcpu at pri CSCHED_PRI_TS_BOOST, it is the same as pinning a
>>>>>>>>>>    vcpu to a pcpu: if that vcpu never makes progress, the system has
>>>>>>>>>>    essentially lost a physical cpu. Optionally, csched_schedule()
>>>>>>>>>>    should always check the cpu time accumulated and reduce the
>>>>>>>>>>    priority over time. I can't tell right off if it already does
>>>>>>>>>>    that, or something like that :)...  my 2 cents.
>>>>>>>>> Hmm... what's supposed to happen is that eventually a timer tick will
>>>>>>>>> interrupt vcpu1.  If vcpu1 is set to be "active", then it will be
>>>>>>>>> debited 10ms worth of credit.  Eventually, it will go into OVER, and
>>>>>>>>> lose BOOST.  If it's "inactive", then when the tick happens, it will
>>>>>>>>> be set to "active" and be debited 10ms again, setting it directly into
>>>>>>>>> OVER (and thus also losing BOOST).
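
(Roughly, the per-tick accounting being described; this is a sketch of the
behaviour with illustrative helper names, not the exact sched_credit.c code:)

    /* Sketch: called from the 10ms csched_tick() for the vcpu currently
     * running on this pcpu, if it is not the idle vcpu. */
    static void tick_account_sketch(struct csched_vcpu *svc)
    {
        /* An "inactive" vcpu caught running at tick time becomes active. */
        if ( !vcpu_is_active(svc) )          /* illustrative helper */
            mark_vcpu_active(svc);           /* illustrative helper */

        /* Either way it is debited one tick's worth (10ms) of credit. */
        atomic_sub(CSCHED_CREDITS_PER_TICK, &svc->credit);

        /* Once its credit goes negative, the periodic csched_acct() pass
         * moves the vcpu to CSCHED_PRI_TS_OVER, and an OVER vcpu no longer
         * qualifies for CSCHED_PRI_TS_BOOST. */
    }
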
>>>>>>>>>
>>>>>>>>> Can you see if the timer ticks are still happening, and perhaps put
>>>>>>>>> some tracing in to verify that what I described above is happening?
>>>>>>>>>
>>>>>>>>>  -George
>>>>>>>> George,
>>>>>>>>
>>>>>>>> Is that in csched_acct()? Looks like that's somehow gotten removed. If
>>>>>>>> true, then maybe that's the fundamental problem to chase.
>>>>>>>>
>>>>>>>> Here's what the trq looks like when hung, not in any schedule
>>>>>>>> function:
>>>>>>>>
>>>>>>>> [0]xkdb> dtrq
>>>>>>>> CPU[00]: NOW:0x00003f2db9af369e
>>>>>>>>  1: exp=0x00003ee31cb32200 fn:csched_tick        data:0000000000000000
>>>>>>>>  2: exp=0x00003ee347ece164 fn:time_calibration   data:0000000000000000
>>>>>>>>  3: exp=0x00003ee69a28f04b fn:mce_work_fn        data:0000000000000000
>>>>>>>>  4: exp=0x00003f055895e25f fn:plt_overflow       data:0000000000000000
>>>>>>>>  5: exp=0x00003ee353810216 fn:rtc_update_second  data:ffff83007f0226d8
>>>>>>>> CPU[01]: NOW:0x00003f2db9af369e
>>>>>>>>  1: exp=0x00003ee30b847988 fn:s_timer_fn         data:0000000000000000
>>>>>>>>  2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> Mukesh
>>>>>>>>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

