
RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug


  • To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Tue, 30 Jan 2007 17:54:21 +0800
  • Delivery-date: Tue, 30 Jan 2007 01:54:01 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcdESFqDCWsISfq5RGeHgxcxVzRqmQACelaDAAAZiDA=
  • Thread-topic: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug

>From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>Sent: January 30, 2007 17:38
>
>On 30/1/07 08:26, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>> Stamp the softlockup thread earlier, before do_timer, because the
>> latter is what actually triggers the lockup warning after a
>> long time offline. Otherwise, I observed softlockup warnings
>> easily on manual vcpu hot-remove/plug, or when a suspend is
>> cancelled back into the old context.
>
>Actually the softlockup check is triggered from run_local_timers(), which
>is called very near the end of timer_interrupt(). So the existing location
>for stamping the softlockup thread should be fine.

Yep, you're right. For this part, I looked at an old source. :-(

>
>> One point here is to count both stolen and blocked time when
>> comparing against the offline threshold. vcpu hotplug falls into the
>> 'stolen' case, but that's not enough. Since the Xen time model is
>> tickless at idle, a long block time may be requested, which
>> also trips the softlockup thread.
>
>Every vcpu has a softlockup thread which regularly sleeps for some short
>period. If the vcpu sets a timeout beyond that sleep time then we have a
>bug. We shouldn't need to take into account blocked time -- Xen already
>ensures that wakeup latency is accounted as stolen time. Blocked time only
>includes time which the vcpu was willing to give up because it had no work
>to do.
>

If we don't take blocked time into account, we may have to disable the
softlockup check altogether. Say the idle process gets a timeout larger than
10s from next_timer_interrupt and then blocks. If no other event happens
before that timeout expires, the vcpu will see a softlockup warning
immediately on wakeup, because this period is not counted as stolen time.
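
To make the scenario above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not the kernel code: the names ToyWatchdog and wake_after_gap are hypothetical, and the offline threshold is assumed to equal the 10s warning threshold. It compares a policy that forgives only stolen time with one that forgives stolen plus blocked time.

```python
SOFTLOCKUP_THRESHOLD_S = 10.0  # warning threshold mentioned in the message


class ToyWatchdog:
    """Toy model of a per-vcpu softlockup stamp (hypothetical, not kernel code)."""

    def __init__(self, now=0.0):
        self.last_touch = now

    def touch(self, now):
        # Record that the vcpu is considered healthy at time `now`.
        self.last_touch = now

    def would_warn(self, now):
        # The check fires when the stamp is older than the threshold.
        return now - self.last_touch > SOFTLOCKUP_THRESHOLD_S


def wake_after_gap(stolen_s, blocked_s, include_blocked):
    """Simulate one wakeup after a gap of stolen plus blocked time.

    Returns True if the toy watchdog would warn at wakeup.
    """
    wd = ToyWatchdog(now=0.0)
    now = stolen_s + blocked_s
    # Assumption: the offline threshold equals the warning threshold.
    offline = stolen_s + (blocked_s if include_blocked else 0.0)
    if offline > SOFTLOCKUP_THRESHOLD_S:
        wd.touch(now)  # forgive the gap before the check runs
    return wd.would_warn(now)


if __name__ == "__main__":
    # 12s blocked, nothing stolen: only the second policy avoids the warning.
    print(wake_after_gap(0.0, 12.0, include_blocked=False))
    print(wake_after_gap(0.0, 12.0, include_blocked=True))
```

In this model a 12s gap made entirely of blocked time warns under the stolen-only policy and does not warn once blocked time is included, which is the case described above.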

For example, when I hot-remove and then hot-plug a vcpu on a domU with:

echo "0" > /sys/devices/system/cpu/cpu3/online
echo "1" > /sys/devices/system/cpu/cpu3/online

After cpu3 is up, the idle process sometimes gets a large timeout value
(0x40000000) from next_timer_interrupt. The virtual timer for that vcpu
is then disabled, and the vcpu itself blocks. Some time later (more than
10s), another event (such as an IPI) may wake this vcpu. In this case,
without counting blocked time, I think it is difficult to prevent the
softlockup warning.
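
As a rough sense of scale for that timeout, the Python arithmetic below converts 0x40000000 to seconds. It assumes the value is a count of timer ticks (jiffies), which the message does not state, and the HZ values tried are common configurations rather than the one used in this test.

```python
TIMEOUT_TICKS = 0x40000000   # value reported in the message
SOFTLOCKUP_THRESHOLD_S = 10  # warning threshold mentioned in the message


def timeout_seconds(ticks, hz):
    # Convert a tick count to seconds for a given tick rate.
    return ticks / hz


if __name__ == "__main__":
    for hz in (100, 250, 1000):  # assumed HZ values, not taken from the message
        seconds = timeout_seconds(TIMEOUT_TICKS, hz)
        beyond = seconds > SOFTLOCKUP_THRESHOLD_S
        print(f"HZ={hz}: {seconds:.0f} s, beyond threshold: {beyond}")
```

Under any of these tick rates the requested block time is far beyond 10s, so in practice the vcpu stays blocked until some other event wakes it.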

Another simple way to trigger the warning is to make
__xen_suspend() jump to smp_resume immediately after
smp_suspend, as a test case for suspend cancel. You can then
see all vcpus except vcpu0 hit the warning frequently.

Thanks,
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

