
RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug


  • To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Tue, 30 Jan 2007 21:09:12 +0800
  • Delivery-date: Tue, 30 Jan 2007 05:08:59 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcdESFqDCWsISfq5RGeHgxcxVzRqmQACelaDAAAZiDAAAP4q2wADxf1QAAFVV3AAAMUYXAAACCkw
  • Thread-topic: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug

>From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>Sent: 30 January 2007 20:57
>
>On 30/1/07 12:45 pm, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>> Actually I'm a bit interested in this case, where the watchdog thread
>> depends on the timer interrupt to be woken, while the next timer interval
>> depends on the soft timer wheel. For a newly onlined cpu, all the
>> processes previously running on it were migrated to other cpus before it
>> went offline. So when it first comes back online, there may be no
>> meaningful entries in its timer wheel and little activity on that vcpu.
>> In this case, (LONG_MAX >> 1) may be returned as a huge timeout.
>
>Yeah, but the thread should get migrated back again (or recreated) in
>fairly short order. I think we can agree it should take rather less than
>10 seconds. :-)

My test is on an 'idle' domain which does nothing. In this case, I'm 
not sure whether processes other than the per-cpu kernel threads will 
be migrated back while the remaining cpu can still handle them easily. 
The per-cpu kernel threads will be re-created, yes, but will they be 
woken within 10s to do anything when there's no meaningful workload 
on that cpu? In fact this bug may not show up at all when the domain 
is under heavy load...

>
>> So with this new watchdog model, simply walking the timer wheel is
>> not enough. Maybe we can force the max timeout value to 1s in safe_halt
>> to special-case this? I'll give it a try. But this will make the current
>> tick-less model a bit tick-ful again. :-)
>
>I'm sure this will fix the issue. But who knows what real underlying issue
>it might be hiding?
>
> -- Keir

I'm not sure whether it hides something. But the current situation 
looks like a self-inflicted trap to me: the watchdog waits for the timer 
interrupt to wake it at a 1s interval, while the timer interrupt 
deliberately schedules a longer interval without considering the 
watchdog, and then blames the watchdog thread for not running within 10s. :-)

Thanks,
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel