
RE: [Xen-devel] cpuidle causing Dom0 soft lockups



>From: Jan Beulich
>Sent: February 3, 2010 18:16
>
>>>> "Yu, Ke" <ke.yu@xxxxxxxxx> 02.02.10 18:07 >>>
>>>Just fyi, we now also have seen an issue on a 24-CPU system that went
>>>away with cpuidle=0 (and static analysis of the hang hinted in that
>>>direction). All I can judge so far is that this likely has something to do
>>>with our kernel's intensive use of the poll hypercall (i.e. we see vCPU-s
>>>not waking up from the call despite there being pending unmasked or
>>>polled for events).
>>
>>We just identified the cause of this issue, and are trying to
>>find an appropriate way to fix it.
>
>Hmm, while I agree that the scenario you describe can be a problem, I
>don't think it can explain the behavior on the 24-CPU system pointed
>out above, nor the one Juergen Gross pointed out yesterday.

Does the 24-CPU system hang at boot time with the same likelihood as the
64-CPU system, or less frequently? Ke's analysis was theoretical and based
on assumed values. Other factors may add to the latency, and each system
may have different characteristics too. We can't conclude whether a
smaller system will face the same issue simply by changing the CPU number
in Ke's formula. :-) Perhaps you can provide cpuidle information from your
24-core system for further comparison.

>
>Nor can it explain why this happens at boot time (when you can take it
>for granted that several/most of the CPUs are idle [and hence would
>have their periodic timer stopped]).

Well, it's true that several/most of the CPUs are idle at boot time, and
when they are blocked in Xen, their periodic timer is stopped. However,
that doesn't mean those 'idle' CPUs will block in Xen and never wake up
during the whole boot process (here we are talking about dozens of
seconds). There are random events that wake a blocked vCPU: process/
thread creation, device interrupts (depending on affinity), per-CPU
kernel threads that wake up periodically, etc. Most importantly, an idle
vCPU arms a single-shot timer when it blocks in Xen, and that single-shot
timer is set to the minimum of the nearest timer expiration and the soft
lockup threshold. Given this, although a dom0 vCPU may be mostly idle
during the whole boot process, it will still wake up occasionally.

Once a blocked vCPU is woken up, Xen checks whether its next periodic
timer interrupt is due. If the vCPU has been blocked for more than one
dom0 tick (4ms in this specific SLES case), a virtual timer interrupt is
made pending before resuming the guest. So every dom0 vCPU still enters
the timer interrupt handler and acquires xtime_lock intermittently. I'm
not sure how likely it is for those idle vCPUs to be woken up at the same
time. If there's no special treatment to interleave vCPU timer
interrupts, they may be injected close together, depending on each
vCPU's boot time. Anyway, consider a situation where some idle vCPUs
begin to contend with each other and several of them exceed the
busy-wait threshold and block in Xen: the exit latency from deep
C-states increases the chance that more vCPUs contend for the same
xtime_lock in the same round. Once many vCPUs contend at the same time,
and a full cycle from the first ticket to the last exceeds the 4ms tick
expected by dom0, a new timer interrupt will be injected again. Then no
vCPU can escape from that contention loop.

>
>Also I would think that the rate at which xtime_lock is being acquired
>may not be the highest one in the entire system, and hence problems
>may continue to result even if we fixed timer_interrupt().

Although the rate of acquiring xtime_lock is unlikely to be the highest
in the system, it is a big kernel lock acquired on every CPU. Ticket
spinlock ordering plus cpuidle latency may gather more and more vCPUs
into the same contention loop once the latency for a vCPU waiting for a
specific ticket starts to accumulate.

As Ke said, cpuidle just exaggerates the spinlock issue in
virtualization. Consider another case where the system is very busy and
the physical CPUs are overcommitted. Even when a vCPU is woken up, it may
still sit in the runqueue for dozens of milliseconds. Likewise, vCPUs
holding later tickets have to wait a long time. Applied to xtime_lock,
you may still observe soft lockup warnings in that case. Possibly the
real solution is not to give dom0 a large number of vCPUs.

Thanks,
Kevin

>
>>Anyway, cpuidle is just one side. We can anticipate that if the
>>CPU number is large enough that NR_CPU * T1 > 4ms, this
>>issue will occur again. So another way is to make dom0
>>scale well by not using xtime_lock, although this is pretty
>>hard currently. Or another way is to limit the dom0 vCPU number to
>>a reasonable level.
>
>I would not think that dealing with the xtime_lock scalability issue in
>timer_interrupt() should be *that* difficult. In particular it should be
>possible to assign an on-duty CPU (permanent or on a round-robin
>basis) that deals with updating jiffies/wallclock, and all other CPUs
>just update their local clocks. I had thought about this before, but
>never found a strong need to experiment with that.
>
>Jan
>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@xxxxxxxxxxxxxxxxxxx
>http://lists.xensource.com/xen-devel
>