
Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic



Hello again,

OK, I was happy too soon; it crashed again. I have now set the following Xen parameters:
  iommu=no-intremap dom0_max_vcpus=1-1 dom0_vcpus_pin noirqbalance

Best regards
  Thimo

Here is the crash dump:

(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x21
(XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
(XEN) All LAPIC state:
(XEN) [vector]      ISR      TMR      IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00020002 00000000 00000000
(XEN) [5f:40] 00000000 00000000 00000000
(XEN) [7f:60] 00000000 00000002 00000000
(XEN) [9f:80] 00000000 00000000 00000000
(XEN) [bf:a0] 00000000 01010000 00000000
(XEN) [df:c0] 00000000 01000000 00000000
(XEN) [ff:e0] 00000000 00000000 08000000
(XEN) Peoi stack trace records:
(XEN)   Pushed {sp 0, irq 30, vec 0x31}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 31, vec 0x71}
(XEN)   Marked {sp 0, irq 31, vec 0x71} ready
(XEN)   Pushed {sp 0, irq 31, vec 0x71}
(XEN)   Poped entry {sp 1, irq 30, vec 0x31}
(XEN)   Marked {sp 0, irq 30, vec 0x31} ready
(XEN)   Pushed {sp 0, irq 30, vec 0x31}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge    status=00000006 mapped, unbound
(XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  5(----),
(XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:4 vec:b0 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 16(----),
(XEN)    IRQ:  18 affinity:8 vec:b8 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 18(----),
(XEN)    IRQ:  19 affinity:f vec:29 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:f vec:39 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  22 affinity:8 vec:61 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 22(----),
(XEN)    IRQ:  23 affinity:4 vec:d8 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 23(----),
(XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:f vec:c0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:f vec:c8 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  28 affinity:f vec:d0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  29 affinity:4 vec:21 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:276(----),
(XEN)    IRQ:  30 affinity:4 vec:31 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(----),
(XEN)    IRQ:  31 affinity:8 vec:71 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:274(----),
(XEN)    IRQ:  32 affinity:4 vec:49 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(----),
(XEN)    IRQ:  33 affinity:8 vec:51 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:272(----),
(XEN)    IRQ:  34 affinity:1 vec:59 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:271(----),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  1 Vec 56:
(XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  4 Vec 72:
(XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  5 Vec 80:
(XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  6 Vec 88:
(XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  7 Vec 96:
(XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  8 Vec104:
(XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  9 Vec112:
(XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:1
(XEN)     IRQ 10 Vec120:
(XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 11 Vec136:
(XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 12 Vec144:
(XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 13 Vec152:
(XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 14 Vec160:
(XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 15 Vec168:
(XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 16 Vec176:
(XEN)       Apic 0x00, Pin 16: vec=b0 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:4
(XEN)     IRQ 18 Vec184:
(XEN)       Apic 0x00, Pin 18: vec=b8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:8
(XEN)     IRQ 19 Vec 41:
(XEN)       Apic 0x00, Pin 19: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:15
(XEN)     IRQ 20 Vec 57:
(XEN)       Apic 0x00, Pin 20: vec=39 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:15
(XEN)     IRQ 22 Vec 97:
(XEN)       Apic 0x00, Pin 22: vec=61 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:8
(XEN)     IRQ 23 Vec216:
(XEN)       Apic 0x00, Pin 23: vec=d8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:4
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 000 00  1    0    0   0   0    0    0    00
(XEN)  01 001 01  0    0    0   0   0    1    1    38
(XEN)  02 001 01  0    0    0   0   0    1    1    F0
(XEN)  03 001 01  0    0    0   0   0    1    1    40
(XEN)  04 001 01  0    0    0   0   0    1    1    48
(XEN)  05 001 01  0    0    0   0   0    1    1    50
(XEN)  06 001 01  0    0    0   0   0    1    1    58
(XEN)  07 001 01  0    0    0   0   0    1    1    60
(XEN)  08 001 01  0    0    0   0   0    1    1    68
(XEN)  09 001 01  0    1    0   0   0    1    1    70
(XEN)  0a 001 01  0    0    0   0   0    1    1    78
(XEN)  0b 001 01  0    0    0   0   0    1    1    88
(XEN)  0c 001 01  0    0    0   0   0    1    1    90
(XEN)  0d 001 01  0    0    0   0   0    1    1    98
(XEN)  0e 001 01  0    0    0   0   0    1    1    A0
(XEN)  0f 001 01  0    0    0   0   0    1    1    A8
(XEN)  10 004 04  0    1    1   1   1    1    1    B0
(XEN)  11 000 00  1    0    0   0   0    0    0    00
(XEN)  12 008 08  0    1    0   1   0    1    1    B8
(XEN)  13 00F 0F  1    1    0   1   0    1    1    29
(XEN)  14 00F 0F  1    1    0   1   0    1    1    39
(XEN)  15 07A 0A  1    0    0   0   0    0    2    B4
(XEN)  16 008 08  0    1    0   1   0    1    1    61
(XEN)  17 004 04  0    1    0   1   0    1    1    D8
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ56 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ104 -> 0:8
(XEN) IRQ112 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ144 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ176 -> 0:16
(XEN) IRQ184 -> 0:18
(XEN) IRQ41 -> 0:19
(XEN) IRQ57 -> 0:20
(XEN) IRQ97 -> 0:22
(XEN) IRQ216 -> 0:23
(XEN) .................................... done.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) CA-107844****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing crash image


On 19.08.2013 17:14, Thimo E. wrote:
Hello,

after one week of testing, here is an intermediate result:

Since I set iommu=no-intremap, no crash has occurred so far. The server has never run this long without a crash. So a careful "it's working", but since only one week has passed, it is not a final hooray yet.

Even if this option really avoids the problem, I consider it nothing more than a workaround: obviously a good one, because it works, but still a workaround.

Where could the source of the problem be? A bug in the hardware? A bug in the software?

And what does interrupt remapping actually do? Does disabling it have a performance impact?

Best regards
  Thimo

On 12.08.2013 14:04, Andrew Cooper wrote:
On 12/08/13 12:52, Thimo E wrote:
Hello Yang,

attached you'll find the kernel dmesg, the Xen dmesg, the lspci output and the output of /proc/interrupts. If you want to see further log files, please let me know.

The processor is a Core i5-4670 and the board is an Intel DH87MC mainboard. I am really not sure whether it supports APICv, but VT-d is supported and enabled.


4. The status of IRQ 29 is 10, which means the guest has already issued the EOI, because the IRQ_GUEST_EOI_PENDING bit is cleared, so there should be no pending EOI on the EOI stack. If possible, can you add some debug messages in the guest EOI code path (e.g. _irq_guest_eoi()) to track the EOI?

I don't see IRQ 29 in /proc/interrupts; what I do see is:

cat xen-dmesg.txt | grep "29":
  (XEN) allocated vector 29 for irq 20
cat dmesg.txt | grep "eth0":
  [   23.152355] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
  [   23.330408] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection

So is the Ethernet IRQ the bad one? That is an onboard Intel network adapter.

That would be consistent with the crash seen with our hardware in XenServer.
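
Yang's point 2 further down notes that a modern PCI device's MSI should be maskable. One way to check that for the e1000e at 0000:00:19.0 identified above is to read its MSI capability straight out of PCI config space from dom0. A minimal sketch in C, not from the original thread: the sysfs path is an assumption based on the BDF in the dmesg quote, and it needs to run as root so the full 256-byte config header is readable.

/* msi_maskcheck.c - hedged sketch: report whether a device's MSI capability
 * advertises per-vector masking (Message Control bit 8, PCI_MSI_FLAGS_MASKBIT).
 * The device path is a placeholder; substitute your own BDF.
 */
#include <stdio.h>
#include <stdint.h>

#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10
#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_MSI        0x05
#define PCI_MSI_FLAGS         0x02    /* Message Control, relative to the cap */
#define PCI_MSI_FLAGS_MASKBIT 0x0100  /* per-vector masking capable */

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:00:19.0/config";
    uint8_t cfg[256];
    FILE *f = fopen(path, "rb");

    if (!f || fread(cfg, 1, sizeof(cfg), f) != sizeof(cfg)) {
        perror(path);
        return 1;
    }
    fclose(f);

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST)) {
        puts("device has no capability list");
        return 1;
    }

    /* Walk the capability list looking for the MSI capability. */
    for (uint8_t pos = cfg[PCI_CAPABILITY_LIST] & ~3u; pos; pos = cfg[pos + 1] & ~3u) {
        if (cfg[pos] != PCI_CAP_ID_MSI)
            continue;
        uint16_t ctrl = cfg[pos + PCI_MSI_FLAGS] | (cfg[pos + PCI_MSI_FLAGS + 1] << 8);
        printf("MSI Message Control = 0x%04x: per-vector masking %s\n",
               ctrl, (ctrl & PCI_MSI_FLAGS_MASKBIT) ? "supported" : "NOT supported");
        return 0;
    }

    puts("no MSI capability found (device may use MSI-X or pin interrupts)");
    return 0;
}

If the per-vector masking bit turns out to be clear, Xen cannot mask the MSI at the device, which would be consistent with the "unmaskable" observation from the earlier log.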


6. I guess interrupt remapping is enabled on your machine. Can you try disabling IR to see whether it is still reproducible?

 

Just to be sure, your proposal is to try the parameter "no-intremap"?

Specifically, iommu=no-intremap.


Best regards
  Thimo

~Andrew


On 12.08.2013 10:49, Zhang, Yang Z wrote:

Hi Thimo,

Your previous experience and logs show the following:

1. The interrupt that triggers the issue is an MSI.

2. MSIs are normally treated as edge-triggered interrupts, except when there is no way to mask the device. In this case, your previous log indicates the device is unmaskable (what special device are you using? A modern PCI device should be maskable).

3. IRQ 29 belongs to dom0, so it seems this is not an HVM-related issue.

4. The status of IRQ 29 is 10, which means the guest has already issued the EOI, because the IRQ_GUEST_EOI_PENDING bit is cleared, so there should be no pending EOI on the EOI stack. If possible, can you add some debug messages in the guest EOI code path (e.g. _irq_guest_eoi()) to track the EOI? (A sketch of such a hook follows this list.)

5. Both logs show that when the issue occurred, most of the other interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is that a coincidence, or does it only happen under special conditions such as heavy IRQ migration? Perhaps you can disable IRQ balancing in dom0 and pin the IRQs manually (a small example follows below).

6. I guess interrupt remapping is enabled on your machine. Can you try disabling IR to see whether it is still reproducible?
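
Regarding point 4, here is a minimal sketch (not from the original thread) of the kind of trace hook that could be dropped into the guest EOI path in xen/arch/x86/irq.c. The names used (_irq_guest_eoi(), desc->arch.vector, irq_guest_action_t, IRQ_GUEST_EOI_PENDING) follow the Xen 4.x source of this era and should be verified against the exact tree in use:

/* Hedged sketch only: a debug hook for the guest EOI path suggested in
 * point 4.  Call it at the top of _irq_guest_eoi() and/or desc_guest_eoi()
 * in xen/arch/x86/irq.c; field names are assumptions based on the Xen 4.x
 * tree of this era.
 */
static void trace_guest_eoi(struct irq_desc *desc, const char *where)
{
    irq_guest_action_t *action = (irq_guest_action_t *)desc->action;

    /* Only log the IRQ under suspicion (29 in this report) to avoid
     * flooding the console. */
    if ( desc->irq != 29 )
        return;

    printk(XENLOG_DEBUG
           "%s: irq %d vec 0x%02x status %08x in_flight %d eoi_pending %d\n",
           where, desc->irq, desc->arch.vector, desc->status,
           action ? action->in_flight : 0,
           !!(desc->status & IRQ_GUEST_EOI_PENDING));
}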

Also, please provide the whole Xen log.
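
For point 5, IRQ balancing in dom0 is usually handled by the irqbalance daemon; once that is stopped, a dom0 IRQ can be pinned by writing a CPU mask to /proc/irq/<n>/smp_affinity. A minimal sketch, with the IRQ number and mask as placeholders (IRQ 20 is the e1000e interrupt mentioned earlier in this thread):

/* pin_irq.c - hedged sketch for point 5: pin a dom0 IRQ to one CPU by
 * writing a hex CPU mask to /proc/irq/<n>/smp_affinity.  Stop irqbalance
 * first so it does not move the IRQ back.
 */
#include <stdio.h>

int main(void)
{
    const unsigned int irq = 20;      /* placeholder: dom0 IRQ for the e1000e */
    const unsigned int cpumask = 0x1; /* placeholder: CPU0 only */
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f) {
        perror(path);
        return 1;
    }
    fprintf(f, "%x\n", cpumask);
    return fclose(f) ? 1 : 0;
}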

 

Best regards,

Yang






_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

