Xen project Mailing List

[Xen-devel] million cycle interrupt

To: "Xen-Devel (E-mail)" <xen-devel@xxxxxxxxxxxxxxxxxxx>

From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

Date: Sun, 12 Apr 2009 20:16:35 +0000 (GMT)

Delivery-date: Sun, 12 Apr 2009 13:17:24 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

(I realize my last similar topic turned out to be a false alarm, so I hope I am not "crying wolf" again.)

I'm still trying to understand some anomalous measurements in tmem, particularly instances where the maximum cycle count greatly exceeds the average. It seems that tmem's compression code is a magnet for interruptions. This inspired me to create a more controlled magnet, an "interrupt honeypot". To do this, at every tmem call, I run a loop that does nothing but repeatedly read the TSC and check the difference between successive reads. On both of my test machines, the difference is normally well under 100 cycles. Infrequently, I get a "large" measurement which, since Xen is non-preemptive, indicates a lengthy interrupt (or possibly that the TSC is being moved forward). My code uses per_cpu to ensure that there aren't any measurement/recording races (which were the issue with my previous 10M "problem").

The result: On my quad-core-by-two-thread machine, I frequently get "large" measurements over 250000 cycles, with the max just over 1 million (and actually just over 2^20). These occur on average about once every 1-2 seconds, but the measurement methodology makes it impossible to determine the true frequency or spacing. The vast majority of the "large" samples are reported on cpu#0, but a handful are reported on other cpus. This might also be methodology-related, but the load is running on 4 vcpus.

On the same machine, when I run with nosmp, I see no large measurements. And when I run the load with 1 vcpu, I see a lower frequency (about one every ten seconds), but again this could be due to the measurement methodology.

On my dual-core (no SMT) test machine, I see only a couple of large measurements: 120438 cycles on cpu#0 and 120528 on cpu#1. The same load is being run, though limited to 2 vcpus.

Is a million cycles in an interrupt handler bad? Any idea what might be consuming this?
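For concreteness, the honeypot loop described above could be sketched roughly as follows. This is a standalone C illustration, not the actual tmem code: the names honeypot_stats, honeypot_spin, read_tsc and scripted_counter are all hypothetical, the threshold is chosen by the caller, and in the hypervisor one stats structure would be kept per cpu (via per_cpu) rather than passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the tmem patch itself. */
typedef uint64_t (*counter_fn)(void);

struct honeypot_stats {
    uint64_t samples;   /* successive-read pairs examined */
    uint64_t large;     /* gaps at or above the threshold */
    uint64_t max_gap;   /* largest gap seen, in counter ticks */
};

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Real counter source on x86: the time stamp counter. */
static inline uint64_t read_tsc(void) { return __rdtsc(); }
#endif

/*
 * Spin reading the counter back to back. With nothing else in the loop,
 * a large gap between two successive reads means something took the cpu
 * away (or the counter itself jumped forward).
 */
void honeypot_spin(counter_fn read_counter, uint64_t iterations,
                   uint64_t threshold, struct honeypot_stats *st)
{
    assert(read_counter != NULL && st != NULL);

    uint64_t prev = read_counter();
    for (uint64_t i = 0; i < iterations; i++) {
        uint64_t now = read_counter();
        uint64_t gap = now - prev;   /* unsigned, so wraparound is harmless */

        st->samples++;
        if (gap >= threshold)
            st->large++;
        if (gap > st->max_gap)
            st->max_gap = gap;
        prev = now;
    }
}

/*
 * Scripted counter for checking the logic without real hardware:
 * three quiet reads, then one jump of just over 2^20 ticks.
 */
uint64_t scripted_counter(void)
{
    static const uint64_t ticks[] = { 0, 40, 80, 1048680, 1048720 };
    static size_t next = 0;
    const size_t last = sizeof(ticks) / sizeof(ticks[0]) - 1;

    uint64_t v = ticks[next];
    if (next < last)
        next++;
    return v;
}
```

On x86 one would pass read_tsc as the counter and a threshold somewhere well above the normal sub-100-cycle gap. Note that a loop like this only catches interruptions that happen while it is spinning, which is why it cannot give the true frequency or spacing of the events.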
The evidence might imply that more cpus means longer interrupts, which bodes poorly for larger machines. I tried disabling the timer rendezvous code (not positive I was successful), but still got large measurements, and eventually the machine froze up (but not before I observed the stime skew climbing quickly to the millisecond-plus range).

Is there a way to cap the number of physical cpus seen by Xen (other than nosmp to cap at one)?

Dan
