
RE: [Xen-devel] million cycle interrupt



> handler, why not take one more step to measure every handler

Great idea!  I added a max_cycles field to irq_desc_t
and check/update it at every interrupt in do_IRQ,
then print max_cycles in "xm debug-key i",
including a "max max".
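
The change is roughly the following (a sketch, not the literal patch;
the TSC helper and the exact placement inside do_IRQ() are
approximations):

    /* Sketch of the instrumentation described above; names and exact
     * placement are approximate. */
    static inline uint64_t tsc_now(void)
    {
        uint32_t lo, hi;
        asm volatile ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* irq_desc_t gains:  uint64_t max_cycles;  (worst case seen so far) */

    /* in do_IRQ(), around the existing dispatch to the action handler: */
    {
        uint64_t t0 = tsc_now();
        /* ... existing call to the action handler ... */
        uint64_t cycles = tsc_now() - t0;
        if ( cycles > desc->max_cycles )
            desc->max_cycles = cycles;      /* per-IRQ worst case */
    }

    /* dump_irqs() (the 'i' debug key) then prints max_cycles for each
     * vector and keeps a running max_max_cycles across all of them. */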

I'm not entirely sure how to interpret the output from dump_irqs(),
but only three interrupts show a "large" max: the two IRQs with
type==PCI-MSI (450044 and 730972 cycles) and Vec240, an IO-APIC-edge
interrupt that maxes at 1047500 cycles.  No other interrupt has a
max exceeding 10000 cycles.

Here's the relevant output.  How do I map this to
something meaningful?

(XEN)    Vec 49 IRQ -1: type=PCI-MSI         status=00000010 max_cycles=450044 in-flight=0 domain-list=0:254(----),
(XEN)    Vec208 IRQ -1: type=PCI-MSI         status=00000010 max_cycles=730972 in-flight=0 domain-list=0:255(----),
(XEN)    Vec240 IRQ  0: type=IO-APIC-edge    status=00000000 max_cycles=1047500 mapped, unbound
(XEN) max_max_cycles = 1047500
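
(For reference, the manual irq_enter()/irq_exit() timing mentioned in
the quoted thread below was along these lines -- again only a sketch,
with the variable names and the 60K threshold purely illustrative:)

    /* Timing one irq_enter()/irq_exit() pair with the TSC; the 60K
     * threshold just matches the "large" readings discussed below. */
    irq_enter();
    {
        uint64_t t0, cycles;

        t0 = tsc_now();               /* same rdtsc helper as above */
        /* ... the interrupt work between enter and exit ... */
        cycles = tsc_now() - t0;

        if ( cycles > 60000 )         /* only report the "large" ones */
            printk("irq path took %"PRIu64" cycles\n", cycles);
    }
    irq_exit();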




> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> Sent: Monday, April 13, 2009 5:19 PM
> To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail)
> Subject: RE: [Xen-devel] million cycle interrupt
>
>
> >From: Dan Magenheimer
> >Sent: April 14, 2009 5:15
> >
> >> You can instrument irq_enter() and irq_exit() to read TSC
> >
> >Rather than do this generically and ensure I get all the macros
> >correct (e.g. per_cpu, nesting) I manually instrumented three
> >likely suspect irq_enter/exit pairs, two in do_IRQ() and one
> >in smp_call_function().  ALL of them show an issue with max
> >readings in the 300K-1M range... with smp_call_function showing
> >the lowest max and the second in do_IRQ (the non-guest one)
> >showing readings over 1M (and the guest one at about 800K).
>
> Since you already reach this step around calling actual action's
> handler, why not take one more step to measure every handler
> (serial, apic, vtd, ...)? You can first simply print which handlers
> are registered or invoked on your platform. If only one handler
> shows abnormally high latency, the problem is probably in that
> specific handler. Otherwise you can suspect some common code
> shared by all handlers, or ... as Keir said, it could be SMM. :-)
>
> Thanks,
> Kevin
>
> >
> >Interestingly, I get no readings at all over 60K when I
> >recompile with max_phys_cpus=4 (and with nosmp) on my
> >quad-core-by-two-thread machine.  This is versus several
> >readings over 60K nearly every second when max_phys_cpus=8.
> >
> >> Otherwise who knows, it could even be system management mode
> >
> >I suppose measuring irq_enter/exit pairs still doesn't rule
> >this out.  But the "large" interrupts don't seem to happen
> >(at least not nearly as frequently) with fewer physical
> >processors enabled, so sys mgmt mode seems unlikely.
> >
> >Anyway, still a probable problem, still mostly a mystery
> >as to what is actually happening.  And, repeat, this has
> >nothing to do with tmem... I'm just observing it using
> >tmem as a convenient measurement tool.
> >
> >> -----Original Message-----
> >> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> >> Sent: Monday, April 13, 2009 2:24 AM
> >> To: Dan Magenheimer; Xen-Devel (E-mail)
> >> Subject: Re: [Xen-devel] million cycle interrupt
> >>
> >>
> >> On 12/04/2009 21:16, "Dan Magenheimer"
> >> <dan.magenheimer@xxxxxxxxxx> wrote:
> >>
> >> > Is a million cycles in an interrupt handler bad?  Any idea what
> >> > might be consuming this?  The evidence might imply more cpus
> >> > means longer interrupt, which bodes poorly for larger machines.
> >> > I tried disabling the timer rendezvous code (not positive I
> >> > was successful), but still got large measurements, and
> >> > eventually the machine froze up (but not before I observed
> >> > the stime skew climbing quickly to the millisecond-plus
> >> > range).
> >>
> >> You can instrument irq_enter() and irq_exit() to read TSC and
> >> find out the
> >> distribution of irq handling times for interruptions that Xen
> >> knows about.
> >> Otherwise who knows, it could even be system management
> mode on that
> >> particular box.
> >>
> >>  -- Keir
> >>
> >>
> >>
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@xxxxxxxxxxxxxxxxxxx
> >http://lists.xensource.com/xen-devel
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

