[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Xen-devel] million cycle interrupt



>From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx] 
>Sent: 2009年4月14日 12:24
>
>The IO-APIC-edge interrupt action is timer_interrupt.

this could be either driven by PIT or HPET (in legacy replacement
mode), and activated when PIT is using as a platform timer or
when PIT/HPET is used to broadcast wakeup event to cpus in 
deep C-states. The logic in timer_interrupt is very simple, and 
maybe you could again go one more step tracing into that function,
e.g. the block protected by pit_lock. :-)

>
>The MSI interrupt action is showing up strange
>(2010001)... I'll need to look into this more.

It's reasonable as action is allocated by xmalloc for those devices
owned by guest. I guess you find two MSI instances in 
__do_IRQ_guest path? If yes, the logic to handle pending eoi may
be suspected. Then you may use "ioapic_ack=old" to verify.

From information by far, it looks like the weird high cost could be
from individual handler implementation, or else you should be able
to observe in every handler (such as serial).

>
>> lock contention?
>
>Possible, though the consistent max numbers make it
>seem unlikely.
>
>> If it's really caused by SMT, it's more likely to be software
>> caused resource contention due to doubled logical processors.
>
>You mentioned PIT earlier... I wonder if PIT can be
>read in parallel if read from different cores, but
>if read from different threads, it must be serialized?
>A million cycles is probably about right for a platform
>timer read, true?

It looks too large to me.

Thanks,
Kevin

>
>> -----Original Message-----
>> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>> Sent: Monday, April 13, 2009 9:05 PM
>> To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail)
>> Subject: RE: [Xen-devel] million cycle interrupt
>>
>>
>> >From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>> >Sent: 2009年4月14日 10:59
>> >
>> >Some more interesting data that echos my earlier
>> >measurements:
>> >
>> >Booting with maxcpus=4, the three "large" interrupts
>> >are all about 20000 cycles.  Booting with maxcpus=6,
>> >the three are at 146896, 183260, and 240959 cycles.
>> >
>> >So it definitely appears to get worse as maxcpus
>> >gets larger.
>>
>> lock contention?
>>
>> >
>> >I also wonder if it might be related to SMT?
>>
>> If it's really caused by SMT, it's more likely to be software
>> caused resource contention due to doubled logical processors.
>> At hardware level although cache is shared, ISRs in Xen are
>> normally short and I don't think such high factor could come
>> from there...
>>
>> Thanks,
>> Kevin
>>
>> >
>> >> One immediate trick is to print handler address for problematic
>> >
>> >I'll give that a try, but probably not until tomorrow.
>> >
>> >> -----Original Message-----
>> >> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>> >> Sent: Monday, April 13, 2009 8:42 PM
>> >> To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail)
>> >> Subject: RE: [Xen-devel] million cycle interrupt
>> >>
>> >>
>> >> >From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
>> >> >Sent: 2009年4月14日 10:34
>> >> >
>> >> >> handler, why not take one more step to measure every handler
>> >> >
>> >> >Great idea!  I added a max_cycles field to irq_desc_t
>> >> >and check/update it at every interrupt in do_IRQ,
>> >> >then print max_cycles in "xm debug-key i",
>> >> >including a "max max".
>> >> >
>> >> >I'm not entirely sure how to interpret the output
>> >> >from dump_irqs() but the only two IRQ's with
>> >> >type==PCI-MSI have a "large"
>> >> >max (450044 cycles and 730972 cycles).  The third is
>> >> >Vec240, an IO-APIC-edge interrupt that maxes at 1047500
>> >> >cycles.   No other interrupt has a max exceeding 10000.
>> >> >
>> >> >Here's the relevant output.  How do I map this to
>> >> >something meaningful?
>> >>
>> >> One immediate trick is to print handler address for problematic
>> >> vector, and then search its name in your dumpped Xen symbol
>> >> file.
>> >>
>> >> Thanks,
>> >> Kevin
>> >>
>> >> >
>> >> >(XEN)    Vec 49 IRQ -1: type=PCI-MSI         status=00000010
>> >> >max_cycles=450044 in-flight=0 domain-list=0:254(----),
>> >> >
>> >> >(XEN)    Vec208 IRQ -1: type=PCI-MSI         status=00000010
>> >> >max_cycles=730972 in-flight=0 domain-list=0:255(----),
>> >>
>> >> Above two look like owned by dom0.
>> >>
>> >> >
>> >> >(XEN)    Vec240 IRQ  0: type=IO-APIC-edge    status=00000000
>> >> >max_cycles=1047500 mapped, unbound
>> >>
>> >>
>> >> looks like PIT, but not quite sure.
>> >>
>> >> >
>> >> >(XEN) max_max_cycles = 1047500
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>> >> >> Sent: Monday, April 13, 2009 5:19 PM
>> >> >> To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail)
>> >> >> Subject: RE: [Xen-devel] million cycle interrupt
>> >> >>
>> >> >>
>> >> >> >From: Dan Magenheimer
>> >> >> >Sent: 2009年4月14日 5:15
>> >> >> >
>> >> >> >> You can instrument irq_enter() and irq_exit() to read TSC
>> >> >> >
>> >> >> >Rather than do this generically and ensure I get all 
>the macros
>> >> >> >correct (e.g. per_cpu, nesting) I manually instrumented three
>> >> >> >likely suspect irq_enter/exit pairs, two in do_IRQ() and one
>> >> >> >in smp_call_function().  ALL of them show an issue with max
>> >> >> >readings in the 300K-1M range... with 
>smp_call_function showing
>> >> >> >the lowest max and the second in do_IRQ (the non-guest one)
>> >> >> >showing readings over 1M (and the guest one at about 800K).
>> >> >>
>> >> >> Since you already reach this step around calling 
>actual action's
>> >> >> handler, why not take one more step to measure every handler
>> >> >> (serial, apic, vtd, ...)? You can first simply print
>> >which handlers
>> >> >> are registered or invoked on your platform. If only one handler
>> >> >> is experienced with abnormal high latency, it's possibly one
>> >> >> specific point. Or else you can suspect on some common code
>> >> >> shared by all handlers, or ... as Keir said, it could
>> be SMM. :-)
>> >> >>
>> >> >> Thanks,
>> >> >> Kevin
>> >> >>
>> >> >> >
>> >> >> >Interestingly, I get no readings at all over 60K when I
>> >> >> >recompile with max_phys_cpus=4 (and with nosmp) on my
>> >> >> >quad-core-by-two-thread machine.  This is versus several
>> >> >> >readings over 60K nearly every second when max_phys_cpus=8.
>> >> >> >
>> >> >> >> Otherwise who knows, it could even be system management mode
>> >> >> >
>> >> >> >I suppose measuring irq_enter/exist pairs still don't rule
>> >> >> >this out.  But the "large" interrupts don't seem to happen
>> >> >> >(at least not nearly as frequently) with fewer physical
>> >> >> >processors enabled, so sys mgmt mode seems unlikely.
>> >> >> >
>> >> >> >Anyway, still a probable problem, still mostly a mystery
>> >> >> >as to what is actually happening.  And, repeat, this has
>> >> >> >nothing to do with tmem... I'm just observing it using
>> >> >> >tmem as a convenient measurement tool.
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>> >> >> >> Sent: Monday, April 13, 2009 2:24 AM
>> >> >> >> To: Dan Magenheimer; Xen-Devel (E-mail)
>> >> >> >> Subject: Re: [Xen-devel] million cycle interrupt
>> >> >> >>
>> >> >> >>
>> >> >> >> On 12/04/2009 21:16, "Dan Magenheimer"
>> >> >> >> <dan.magenheimer@xxxxxxxxxx> wrote:
>> >> >> >>
>> >> >> >> > Is a million cycles in an interrupt handler bad?  Any
>> >> idea what
>> >> >> >> > might be consuming this?  The evidence might imply
>> more cpus
>> >> >> >> > means longer interrupt, which bodes poorly for larger
>> >> machines.
>> >> >> >> > I tried disabling the timer rendezvous code (not 
>positive I
>> >> >> >> > was successful), but still got large measurements, and
>> >> >> >> > eventually the machine froze up (but not before I observed
>> >> >> >> > the stime skew climbing quickly to the millisecond-plus
>> >> >> >> > range).
>> >> >> >>
>> >> >> >> You can instrument irq_enter() and irq_exit() to 
>read TSC and
>> >> >> >> find out the
>> >> >> >> distribution of irq handling times for 
>interruptions that Xen
>> >> >> >> knows about.
>> >> >> >> Otherwise who knows, it could even be system management
>> >> >> mode on that
>> >> >> >> particular box.
>> >> >> >>
>> >> >> >>  -- Keir
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >_______________________________________________
>> >> >> >Xen-devel mailing list
>> >> >> >Xen-devel@xxxxxxxxxxxxxxxxxxx
>> >> >> >http://lists.xensource.com/xen-devel
>> >> >> >
>> >> >
>> >
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.