RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
>From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>Sent: 30 January 2007 17:38
>
>On 30/1/07 08:26, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>> Stamp softlockup thread earlier before do_timer, because the
>> latter is the one to actually trigger lock warning for
>> long-time offline. Or else, I observed softlockup warning
>> easily at manual vcpu hot-remove/plug, or when suspend cancel
>> into old context.
>
>Actually the softlockup check is triggered from run_local_timers(),
>which is called very near the end of timer_interrupt(). So the
>existing location for stamping the softlockup thread should be fine.

Yep, you're right. For this part, I looked at an old source. :-(

>
>> One point here is to cover both stolen and blocked time to
>> compare with offline threshold. vcpu hotplug falls into 'stolen'
>> case, but it's not enough. Considering xen time model is tickless
>> at idle, it's possible that big block time is requested which
>> also inflames softlockup thread.
>
>Every vcpu has a softlockup thread which regularly sleeps for some
>short period. If the vcpu sets a timeout beyond that sleep time then
>we have a bug. We shouldn't need to take into account blocked time --
>Xen already ensures that wakeup latency is accounted as stolen time.
>Blocked time only includes time which the vcpu was willing to give up
>because it had no work to do.
>

If we don't take blocked time into account, we may have to disable the
softlockup check altogether. Say the idle process gets a timeout larger
than 10s from next_timer_interrupt and then blocks. If, unfortunately,
no other event happens before that timeout expires, this vcpu will see
a softlockup warning immediately after the timeout, since this period
is not categorized as stolen time.
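The disagreement above comes down to which time counts toward the
offline threshold. Below is a minimal sketch of the two policies, not
the code from the patch: every name in it is hypothetical, and the 10s
threshold is only assumed from the figure used in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the Xen or Linux API. */
#define NSEC_PER_SEC         1000000000ULL
/* Assumed threshold: the 10s figure mentioned in the thread. */
#define OFFLINE_THRESHOLD_NS (10 * NSEC_PER_SEC)

/* Time that passed for one vcpu since its last timer interrupt. */
struct vcpu_time_delta {
    uint64_t stolen_ns;   /* vcpu wanted to run but could not
                             (per the thread: wakeup latency, hotplug) */
    uint64_t blocked_ns;  /* vcpu gave the time up voluntarily because
                             it had no work to do */
};

/* Policy A: only stolen time is compared with the threshold. */
static bool restamp_on_stolen_only(const struct vcpu_time_delta *d)
{
    return d->stolen_ns > OFFLINE_THRESHOLD_NS;
}

/* Policy B: stolen and blocked time are summed before the comparison. */
static bool restamp_on_stolen_plus_blocked(const struct vcpu_time_delta *d)
{
    return d->stolen_ns + d->blocked_ns > OFFLINE_THRESHOLD_NS;
}
```

With a delta of 12s blocked and no stolen time, policy A does not
re-stamp the softlockup thread while policy B does, which is the case
described in the paragraph above.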
For example, when I hot-remove and then hot-plug a vcpu on domU by:

    echo "0" > /sys/devices/system/cpu/cpu3/online
    echo "1" > /sys/devices/system/cpu/cpu3/online

After cpu3 is up, the idle process sometimes gets a big timeout value
(0x40000000) from next_timer_interrupt. The virtual timer for that vcpu
is then disabled, and the vcpu itself blocks. Some time later (more
than 10s), another event (like an IPI) may wake this vcpu. In this
case, without including blocked time, I think it is difficult to
prevent the softlockup warning.

Another simple way to trigger the warning is to let __xen_suspend()
jump to smp_resume immediately after smp_suspend, as a test case for
suspend cancel. All vcpus except vcpu0 then hit the warning frequently.

Thanks,
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel