[Xen-devel] Problems with MSI interrupts
Hello,

I am currently investigating an issue with MSI allocation/deallocation which appears to be an MSI resource leak in Xen. This is XenServer 6.0, based on Xen 4.1.1, with no changesets I can see affecting the relevant Xen codepaths.

The box in question is a NetScaler SDX box with 24 logical cores (2 Nehalem sockets, 6 cores per socket, hyperthreading), 96GB RAM, and 4 dual-port Intel 10G ixgbe cards (plus two SSL 'Xcelerator' cards, but I have disabled these for debugging purposes). Each of the 8 NIC ports exports 40 virtual functions. There are 40 (identical) VMs, each of which has 1 VF from each NIC passed through to it, giving each VM 8 VFs. Each VF itself uses 3 MSI-X interrupts. Therefore, for all VMs to be working correctly, there are 3 IRQs per VF x 8 VFs x 40 VMs = 960 MSI-X interrupts.

The symptoms are: reboot the VMs a couple of times, and eventually Xen says "(XEN) ../physdev.c:140: domXXX: can't create irq for msi!". After adding extra debugging, the call to create_irq() was returning -ENOSPC.

At the point at which create_irq() was failing, there were huge numbers of IRQs listed by the 'i' debug key with a descriptor affinity mask of all CPUs, which I believe is interfering with the calculations in __assign_irq_vector(). I suspected that this might be because scheduling under load was moving VCPUs across PCPUs, resulting in the irq descriptor being written into every PCPU's IDT. As a result, I pinned each VM to a specific PCPU in the hope that the problem would go away. When starting each VM individually, it does appear to go away. However, when starting all VMs at once, there are still some IRQs with an affinity mask of all CPUs.

Specifically, one case is this (I added extra debugging to include irq_cfg->cpu_mask in the 'i' debug key output):

(XEN) IRQ: 845 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000010 vec:7e type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 55(----),
(XEN) IRQ: 846 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:86 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 54(----),
(XEN) IRQ: 847 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:96 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 53(----),
(XEN) IRQ: 848 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:be type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 52(----),
(XEN) IRQ: 849 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:c6 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 51(----),
(XEN) IRQ: 850 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:ce type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 50(----),
(XEN) IRQ: 851 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:b7 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 49(----),
(XEN) IRQ: 852 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:cf type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 48(----),
(XEN) IRQ: 853 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:d7 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 47(----),
(XEN) IRQ: 854 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:d9 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 46(----),
(XEN) IRQ: 855 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:22 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 45(----),
(XEN) IRQ: 856 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:2a type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 44(----),
(XEN) IRQ: 857 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000010 vec:3c type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 43(----),
(XEN) IRQ: 858 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:4c type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 42(----),
(XEN) IRQ: 859 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:54 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 41(----),
(XEN) IRQ: 860 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:b5 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 40(----),
(XEN) IRQ: 861 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:ae type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 39(----),
(XEN) IRQ: 862 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:de type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 38(----),
(XEN) IRQ: 863 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000010 vec:55 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 37(----),
(XEN) IRQ: 864 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:9d type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 36(----),
(XEN) IRQ: 865 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:46 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 35(----),
(XEN) IRQ: 866 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:a6 type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 34(----),
(XEN) IRQ: 867 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:5f type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 33(----),
(XEN) IRQ: 868 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff cfg_aff:00000000,00000000,00000000,00000020 vec:7f type=PCI-MSI status=00000050 in-flight=0 domain-list=34: 32(----),

This shows all the IRQs for dom34. The descriptors have full affinity, but each irq_cfg has a cpu_mask of a single CPU, either CPU 4 or CPU 5 (0x10 or 0x20).
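To make that suspicion concrete, below is a toy model of the allocation constraint. This is a minimal sketch under my own assumptions (the vector range, the flat vector_irq table and the assign_vector() helper are simplifications of mine, not the actual __assign_irq_vector() code). The point it illustrates: a vector can only be assigned if it is free on every CPU in the descriptor's affinity mask, so each IRQ left with an all-CPU mask consumes one vector on all 24 CPUs, while a pinned IRQ consumes a vector on a single CPU.

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS      24    /* 2 sockets x 6 cores x HT, as on this box */
#define FIRST_VECTOR 0x20  /* rough model of the usable dynamic range */
#define LAST_VECTOR  0xef
#define NR_VECTORS   256

/* Per-CPU vector tables: -1 means that vector is free on that CPU. */
static int vector_irq[NR_CPUS][NR_VECTORS];

/* Find one vector that is free on every CPU in 'mask' and claim it. */
static int assign_vector(int irq, unsigned long mask)
{
    for (int vec = FIRST_VECTOR; vec <= LAST_VECTOR; vec++) {
        bool free_on_all = true;

        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if ((mask & (1UL << cpu)) && vector_irq[cpu][vec] != -1)
                free_on_all = false;
        if (!free_on_all)
            continue;

        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if (mask & (1UL << cpu))
                vector_irq[cpu][vec] = irq;
        return vec;
    }
    return -1; /* models create_irq() failing with -ENOSPC */
}

int main(void)
{
    int irq;

    /* Case 1: every IRQ keeps an all-CPU affinity mask. */
    memset(vector_irq, -1, sizeof(vector_irq));
    for (irq = 0; ; irq++)
        if (assign_vector(irq, ~0UL) < 0) /* bits above NR_CPUS are ignored */
            break;
    printf("all-CPU affinity: vectors exhausted after %d IRQs\n", irq);

    /* Case 2: every IRQ is pinned to a single CPU, round-robin. */
    memset(vector_irq, -1, sizeof(vector_irq));
    for (irq = 0; ; irq++)
        if (assign_vector(irq, 1UL << (irq % NR_CPUS)) < 0)
            break;
    printf("pinned affinity:  vectors exhausted after %d IRQs\n", irq);

    return 0;
}

With roughly 200 usable vectors per CPU, a couple of hundred all-CPU IRQs are enough to exhaust the space, whereas pinned IRQs scale with the number of CPUs. That would explain hitting -ENOSPC well before the expected 960 interrupts.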
The domain dump for dom34 is:

(XEN) General information for domain 34:
(XEN) refcnt=3 dying=0 nr_pages=131065 xenheap_pages=8 dirty_cpus={} max_pages=133376
(XEN) handle=97ef6eef-69c2-024c-1bbb-a150ca668691 vm_assist=00000000
(XEN) paging assistance: hap refcounts translate external
(XEN) Rangesets belonging to domain 34:
(XEN) I/O Ports { }
(XEN) Interrupts { 32-55 }
(XEN) I/O Memory { f9f00-f9f03, fa001-fa003, fa19c-fa19f, fa29d-fa29f, fa39c-fa39f, fa49d-fa49f, fa59c-fa59f, fa69d-fa69f, fa79c-fa79f, fa89d-fa89f, fa99c-fa99f, faa9d-faa9f, fab9c-fab9f, fac9d-fac9f, fad9c-fad9f, fae9d-fae9f }
(XEN) Memory pages belonging to domain 34:
(XEN) DomPage list too long to display
(XEN) P2M entry stats:
(XEN) L1: 1590 entries, 6512640 bytes
(XEN) L2: 253 entries, 530579456 bytes
(XEN) PoD entries=0 cachesize=0 superpages=0
(XEN) XenPage 00000000001146e1: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 00000000001146e0: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 00000000001146df: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 00000000001146de: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 00000000000bdc0e: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000114592: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000011458f: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000011458c: caf=c000000000000001, taf=7400000000000001
(XEN) VCPU information and callbacks for domain 34:
(XEN) VCPU0: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={3}
(XEN) paging assistance: hap, 4 levels
(XEN) No periodic timer
(XEN) VCPU1: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={3} cpu_affinity={3}
(XEN) paging assistance: hap, 4 levels
(XEN) No periodic timer

This shows that the domain is actually pinned to PCPU 3. Am I mis-interpreting the information, or does this indicate that the (credit) scheduler is not obeying cpu_affinity? The virtual functions seem to be passing network traffic correctly, so I would assume that interrupts are getting where they are supposed to be going.

Another question, which may or may not be related: irq_cfg has a vector and a cpu_mask. From this, I assume that the same interrupt must occupy the same IDT entry on every PCPU it might be received on. Is there an architectural reason why this should be the case, or is it just the way Xen is coded?
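To spell out the assumption I am drawing from that, here is the layout paraphrased (my own rendering, not the verbatim Xen 4.1 declaration):

/* Paraphrase of the irq_cfg layout as I read it, not the real code. */
typedef unsigned long cpumask_t;  /* stand-in for the real cpumask type */

struct irq_cfg {
    int vector;           /* a single IDT slot ...                    */
    cpumask_t cpu_mask;   /* ... used on every CPU set in this mask   */
};

A single vector field with no per-CPU variant is what leads me to assume the same IDT entry must be used on every PCPU in the mask.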
(Also, it seems that <asm/irq.h> and <xen/irq.h> both define struct irq_cfg, and while one is strictly an extension of the other, there appear to be no guards around them, meaning that sizeof(struct irq_cfg) depends on which header file you include. I don't know if this is relevant or not, but it strikes me that code getting confused as to which definition it is using could be computing on junk if it is expecting the longer irq_cfg and actually getting the shorter one.)
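As a minimal standalone demonstration of that hazard (my own reconstruction, not the actual Xen headers; the LONG_IRQ_CFG macro stands in for which of the two headers a translation unit happens to see):

/* irq_cfg_demo.c: build once with and once without -DLONG_IRQ_CFG to
 * mimic one translation unit picking up the extended definition and
 * another picking up the shorter one. */
#include <stdio.h>

typedef unsigned long cpumask_t;  /* stand-in for the real cpumask type */

struct irq_cfg {
    int vector;
#ifdef LONG_IRQ_CFG
    cpumask_t cpu_mask;           /* member present in only one header */
#endif
};

int main(void)
{
    printf("sizeof(struct irq_cfg) = %zu\n", sizeof(struct irq_cfg));
    return 0;
}

Two translation units built that way still link without complaint, so code that allocates an object with the short sizeof() and then accesses it through the long layout would read or write past the end of the object; that is the "computing on junk" failure mode described above.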
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel