Xen project Mailing List

RE: [Xen-devel] cpuidle causing Dom0 soft lockups

To: Jan Beulich <JBeulich@xxxxxxxxxx>, "Yu, Ke" <ke.yu@xxxxxxxxx>

From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>

Date: Wed, 3 Feb 2010 20:10:45 +0800

Accept-language: en-US


Cc: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 03 Feb 2010 04:11:17 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AcqkufrWFsBEE5ikR/6FkJlnzu7pvgACbEuw

Thread-topic: [Xen-devel] cpuidle causing Dom0 soft lockups

>From: Jan Beulich
>Sent: February 3, 2010 18:16
>
>>>> "Yu, Ke" <ke.yu@xxxxxxxxx> 02.02.10 18:07 >>>
>>>Just fyi, we now also have seen an issue on a 24-CPU system that went
>>>away with cpuidle=0 (and static analysis of the hang hinted in that
>>>direction). All I can judge so far is that this likely has something to do
>>>with our kernel's intensive use of the poll hypercall (i.e. we see vCPU-s
>>>not waking up from the call despite there being pending unmasked or
>>>polled for events).
>>
>>We just identified the cause of this issue, and is trying to
>>find appropriate way to fix it.
>
>Hmm, while I agree that the scenario you describe can be a problem, I
>don't think it can explain the behavior on the 24-CPU system pointed
>out above, nor the one Juergen Gross pointed out yesterday.

Does the 24-CPU system hang at boot time as often as the 64-CPU system,
or less frequently? Ke did a theoretical analysis based on some assumed
values. Other factors can add to the latency, and each system may have
different characteristics, so we can't conclude whether a smaller
system will hit the same issue simply by changing the CPU count in Ke's
formula. :-) Could you provide cpuidle information for your 24-core
system for further comparison?

>
>Nor can it explain why this happens at boot time (when you can take it
>for granted that several/most of the CPUs are idle [and hence would
>have their periodic timer stopped]).

Well, it's true that several or most of the CPUs are idle at boot time,
and that the periodic timer is stopped once they block into Xen.
However, that doesn't mean those 'idle' CPUs block into Xen and never
wake up during the whole boot process (we are talking about dozens of
seconds here). Random events wake a blocked vCPU up: process/thread
creation, device interrupts (depending on affinity), per-CPU kernel
threads that wake up periodically, etc.
Most importantly, an idle vCPU arms a singleshot timer when it blocks
into Xen, and that timer's expiry is the minimum of the nearest timer
expiration and the soft lockup threshold. Under those assumptions, even
a dom0 vCPU that is mostly idle during the whole boot process still
wakes up occasionally. Once a blocked vCPU is woken up, Xen checks
whether its next periodic timer interrupt should be delivered. If the
vCPU has been blocked for more than one dom0 tick (4ms in this specific
SLES case), a virtual timer interrupt is made pending before the guest
resumes. So every dom0 vCPU still enters the timer interrupt
intermittently and acquires xtime_lock.

I'm not sure how likely it is for those idle vCPUs to be woken up at
the same time. If nothing special is done to interleave the vCPUs'
timer interrupts, they may be injected close together, depending on
when each vCPU booted. Anyway, once some idle vCPUs begin to contend
with each other and several of them exceed the busy-wait threshold and
block into Xen, the latency of deep Cx states increases the chance that
more vCPUs contend for the same xtime_lock in the same loop. Once many
vCPUs contend at the same time and a full cycle from the first ticket
to the last takes longer than the 4ms that dom0 expects, a new timer
interrupt is injected again. Then no vCPU can escape from that
contention loop.

>
>Also I would think that the rate at which xtime_lock is being acquired
>may not be the highest one in the entire system, and hence problems
>may continue to result even if we fixed timer_interrupt().

Although the rate at which xtime_lock is acquired is unlikely to be the
highest, it is a big kernel lock that is acquired on every CPU. Ticket
spinlock ordering plus cpuidle latency may draw more and more vCPUs
into the same contention loop once the time a vCPU waits for a specific
ticket starts to accumulate. As Ke said, cpuidle just exaggerates the
spinlock issue in virtualization. Consider another case, where the
system is very busy and the physical CPUs are overcommitted.
Even after a vCPU is woken up, it may stay in the runqueue for dozens
of milliseconds. Similarly, vCPUs holding later tickets have to wait a
long time. Applied to xtime_lock, you may still observe soft lockup
warnings then. Possibly the real solution is to not give dom0 a large
number of vCPUs.

Thanks,
Kevin

>
>>Anyway, cpuidle is just one side, we can anticipate that if
>>CPU number is large enough to lead NR_CPU * T1 > 4ms, this
>>issue will occurs again. So another way is to make dom0
>>scaling well by not using xtime_lock, although this is pretty
>>hard currently. Or another way is to limit dom0 vCPU number to
>>certain reasonable level.
>
>I would not think that dealing with the xtime_lock scalability issue in
>timer_interrupt() should be *that* difficult. In particular it should be
>possibly to assign an on-duty CPU (permanent or on a round-robin
>basis) that deals with updating jiffies/wallclock, and all other CPUs
>just update their local clocks. I had thought about this before, but
>never found a strong need to experiment with that.
>
>Jan
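
As a rough illustration of the arithmetic discussed in this thread (the
condition NR_CPU * T1 > 4ms and the "full cycle from first ticket to
last ticket" argument), here is a minimal sketch. It is not Xen or
Linux code. It assumes T1 can be modelled as the lock hold time plus
the expected wake-up latency of a waiter that has blocked into Xen. All
function names, parameters, and numbers are hypothetical and chosen
only to show the shape of the condition.

```python
import math

# Hypothetical model of the contention loop described in this thread.
# Nothing here is measured data; every parameter is an assumption.

DOM0_TICK_MS = 4.0  # dom0 tick mentioned in the thread (SLES case)


def per_handoff_ms(hold_ms, wake_latency_ms, blocked_fraction):
    """Assumed T1: time for the lock to pass from one ticket to the next.

    hold_ms          -- time spent holding the lock in the timer interrupt
    wake_latency_ms  -- extra latency when the next waiter has blocked into
                        Xen and its CPU must leave a deep Cx state
    blocked_fraction -- share of waiters (0..1) that have blocked
    """
    return hold_ms + blocked_fraction * wake_latency_ms


def ticket_cycle_ms(n_vcpus, hold_ms, wake_latency_ms, blocked_fraction):
    """Time for one full pass from the first ticket to the last."""
    return n_vcpus * per_handoff_ms(hold_ms, wake_latency_ms, blocked_fraction)


def stuck_in_loop(n_vcpus, hold_ms, wake_latency_ms, blocked_fraction,
                  tick_ms=DOM0_TICK_MS):
    """True when NR_CPU * T1 > tick, i.e. a new timer interrupt becomes
    pending before the previous round of lock acquisitions has drained."""
    cycle = ticket_cycle_ms(n_vcpus, hold_ms, wake_latency_ms, blocked_fraction)
    return cycle > tick_ms


def min_vcpus_to_saturate(t1_ms, tick_ms=DOM0_TICK_MS):
    """Smallest vCPU count n for which n * t1_ms > tick_ms."""
    return math.floor(tick_ms / t1_ms) + 1
```

With made-up inputs, the same per-handoff cost can stay under the tick
for a smaller vCPU count and exceed it for a larger one, and setting
the wake-up latency to zero (the cpuidle=0 case in this model) shrinks
the cycle. This only restates the formula; as noted above, real systems
add other latency sources, so the model cannot predict whether a given
machine will hang.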

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
