[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Xen-devel] million cycle interrupt



>From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx] 
>Sent: 2009年4月14日 10:34
>
>> handler, why not take one more step to measure every handler
>
>Great idea!  I added a max_cycles field to irq_desc_t
>and check/update it at every interrupt in do_IRQ,
>then print max_cycles in "xm debug-key i",
>including a "max max".
>
>I'm not entirely sure how to interpret the output
>from dump_irqs() but the only two IRQ's with
>type==PCI-MSI have a "large"
>max (450044 cycles and 730972 cycles).  The third is
>Vec240, an IO-APIC-edge interrupt that maxes at 1047500
>cycles.   No other interrupt has a max exceeding 10000.
>
>Here's the relevant output.  How do I map this to
>something meaningful?

One immediate trick is to print handler address for problematic
vector, and then search its name in your dumpped Xen symbol
file.

Thanks,
Kevin

>
>(XEN)    Vec 49 IRQ -1: type=PCI-MSI         status=00000010 
>max_cycles=450044 in-flight=0 domain-list=0:254(----),
>
>(XEN)    Vec208 IRQ -1: type=PCI-MSI         status=00000010 
>max_cycles=730972 in-flight=0 domain-list=0:255(----),

Above two look like owned by dom0.

>
>(XEN)    Vec240 IRQ  0: type=IO-APIC-edge    status=00000000 
>max_cycles=1047500 mapped, unbound


looks like PIT, but not quite sure.

>
>(XEN) max_max_cycles = 1047500
>
>
>
>
>> -----Original Message-----
>> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>> Sent: Monday, April 13, 2009 5:19 PM
>> To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail)
>> Subject: RE: [Xen-devel] million cycle interrupt
>>
>>
>> >From: Dan Magenheimer
>> >Sent: 2009年4月14日 5:15
>> >
>> >> You can instrument irq_enter() and irq_exit() to read TSC
>> >
>> >Rather than do this generically and ensure I get all the macros
>> >correct (e.g. per_cpu, nesting) I manually instrumented three
>> >likely suspect irq_enter/exit pairs, two in do_IRQ() and one
>> >in smp_call_function().  ALL of them show an issue with max
>> >readings in the 300K-1M range... with smp_call_function showing
>> >the lowest max and the second in do_IRQ (the non-guest one)
>> >showing readings over 1M (and the guest one at about 800K).
>>
>> Since you already reach this step around calling actual action's
>> handler, why not take one more step to measure every handler
>> (serial, apic, vtd, ...)? You can first simply print which handlers
>> are registered or invoked on your platform. If only one handler
>> is experienced with abnormal high latency, it's possibly one
>> specific point. Or else you can suspect on some common code
>> shared by all handlers, or ... as Keir said, it could be SMM. :-)
>>
>> Thanks,
>> Kevin
>>
>> >
>> >Interestingly, I get no readings at all over 60K when I
>> >recompile with max_phys_cpus=4 (and with nosmp) on my
>> >quad-core-by-two-thread machine.  This is versus several
>> >readings over 60K nearly every second when max_phys_cpus=8.
>> >
>> >> Otherwise who knows, it could even be system management mode
>> >
>> >I suppose measuring irq_enter/exist pairs still don't rule
>> >this out.  But the "large" interrupts don't seem to happen
>> >(at least not nearly as frequently) with fewer physical
>> >processors enabled, so sys mgmt mode seems unlikely.
>> >
>> >Anyway, still a probable problem, still mostly a mystery
>> >as to what is actually happening.  And, repeat, this has
>> >nothing to do with tmem... I'm just observing it using
>> >tmem as a convenient measurement tool.
>> >
>> >> -----Original Message-----
>> >> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>> >> Sent: Monday, April 13, 2009 2:24 AM
>> >> To: Dan Magenheimer; Xen-Devel (E-mail)
>> >> Subject: Re: [Xen-devel] million cycle interrupt
>> >>
>> >>
>> >> On 12/04/2009 21:16, "Dan Magenheimer"
>> >> <dan.magenheimer@xxxxxxxxxx> wrote:
>> >>
>> >> > Is a million cycles in an interrupt handler bad?  Any idea what
>> >> > might be consuming this?  The evidence might imply more cpus
>> >> > means longer interrupt, which bodes poorly for larger machines.
>> >> > I tried disabling the timer rendezvous code (not positive I
>> >> > was successful), but still got large measurements, and
>> >> > eventually the machine froze up (but not before I observed
>> >> > the stime skew climbing quickly to the millisecond-plus
>> >> > range).
>> >>
>> >> You can instrument irq_enter() and irq_exit() to read TSC and
>> >> find out the
>> >> distribution of irq handling times for interruptions that Xen
>> >> knows about.
>> >> Otherwise who knows, it could even be system management
>> mode on that
>> >> particular box.
>> >>
>> >>  -- Keir
>> >>
>> >>
>> >>
>> >
>> >_______________________________________________
>> >Xen-devel mailing list
>> >Xen-devel@xxxxxxxxxxxxxxxxxxx
>> >http://lists.xensource.com/xen-devel
>> >
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.