
[Xen-devel] Re: xen dependant on pcpu 0 ?

From: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>

Date: Wed, 13 Oct 2010 16:26:35 +0200

Cc: Ian <Ian.Campbell@xxxxxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Delivery-date: Wed, 13 Oct 2010 07:27:30 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

This code was changed in changeset "x86: protect MSI-X table and pending bit array from guest writes" 22182:68cc3c514a0a

Besides ... returning a bogus address in this piece of code:

    if ( !dev->domain || !paging_mode_translate(dev->domain) )
    {
        struct domain *d = dev->domain;

        if ( !d )
            for_each_domain(d)
                if ( !paging_mode_translate(d) &&
                     (iomem_access_permitted(d, dev->msix_table.first,
                                             dev->msix_table.last) ||
                      iomem_access_permitted(d, dev->msix_pba.first,
                                             dev->msix_pba.last)) )
                    break;
        if ( d )
        {
            /* XXX How to deal with existing mappings? */
            printk("SEIK: err what am i doing here ?? d=%d \n",d->domain_id);
        }
    }

On a freshly booted machine, d seems to be 0 ... that would mean the ( !d ) code path will never be followed since all devices will belong to dom0 at first ?

--
Sander

Wednesday, October 13, 2010, 3:36:41 PM, you wrote:

> By messing a bit with printk's and debug settings a warn_on in the hypervisor
> is being triggered when starting the videograbbing domU:
>
> mapping kernel into physical memory
> about to get started...
> (XEN) [2010-10-13 13:30:44] Xen WARN at msi.c:636
> (XEN) [2010-10-13 13:30:44] ----[ Xen-4.1-unstable x86_64 debug=y Tainted: C ]----
> (XEN) [2010-10-13 13:30:44] CPU: 2
> (XEN) [2010-10-13 13:30:44] RIP: e008:[<ffff82c48015d797>] pci_enable_msi+0x48a/0x9d5
> (XEN) [2010-10-13 13:30:44] RFLAGS: 0000000000010216 CONTEXT: hypervisor
> (XEN) [2010-10-13 13:30:44] rax: 0000000000000004 rbx: 00000000fe5fe000 rcx: 0000000000000001
> (XEN) [2010-10-13 13:30:44] rdx: 0000000000000004 rsi: 0000000000000282 rdi: ffff82c48024e940
> (XEN) [2010-10-13 13:30:44] rbp: ffff830237e57dc8 rsp: ffff830237e57cf8 r8: 0000000000000009
> (XEN) [2010-10-13 13:30:44] r9: 000000000000003a r10: 0000000000000092 r11: 0000000000000213
> (XEN) [2010-10-13 13:30:44] r12: 0000000000000000 r13: ffff830237e57ea8 r14: ffff83020211ed10
> (XEN) [2010-10-13 13:30:44] r15: 0000000000000008 cr0: 000000008005003b cr4: 00000000000006f0
> (XEN) [2010-10-13 13:30:44] cr3: 0000000225f0e000 cr2: ffff880004e93d68
> (XEN) [2010-10-13 13:30:44] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) [2010-10-13 13:30:44] Xen stack trace from rsp=ffff830237e57cf8:
> (XEN) [2010-10-13 13:30:44]    ffff830237e57d38 ffff82c480126b66 ffff830237e57e18 0700000000000010
> (XEN) [2010-10-13 13:30:44]    0000000000001000 0000000000000030 00000000fe5ff000 00000000fe5ff000
> (XEN) [2010-10-13 13:30:44]    0000009000077d68 ffff83014601ad10 0000000700000246 0000000000000000
> (XEN) [2010-10-13 13:30:44]    0000000700000092 0000000000000000 ffff83020211eda8 00000000000fe5ff
> (XEN) [2010-10-13 13:30:44]    00000000000fe5ff ffff8301622fde28 0000000000000202 ffff830237e57da8
> (XEN) [2010-10-13 13:30:44]    ffff82c480120680 ffff830237e57ea8 00000000ffffffed ffff830146a24000
> (XEN) [2010-10-13 13:30:44]    0000000000000057 0000000000000048 ffff830237e57e48 ffff82c48015f16e
> (XEN) [2010-10-13 13:30:44]    0000000025dfc910 000000000000015c 0000000000000048 0000000000000120
> (XEN) [2010-10-13 13:30:44]    ffff830237e82480 0000000000000282 ffff83020211ed10 ffff830237e57e28
> (XEN) [2010-10-13 13:30:44]    ffff82c480120680 ffff88002df4bb30 0000000000000057 ffff830146a24000
> (XEN) [2010-10-13 13:30:44]    0000000000000048 ffff830146a24190 ffff830237e57ef8 ffff82c480172806
> (XEN) [2010-10-13 13:30:44]    0000000180196b1a ffff830237e5a020 ffff830200000004 ffff830237e57ea8
> (XEN) [2010-10-13 13:30:44]    000000000000000b ffffffffffffffff 0000000000000007 0000000000000000
> (XEN) [2010-10-13 13:30:44]    00000000fe5fe000 aaaaaaaaaaaaaaaa 0000000000000007 0000000000000048
> (XEN) [2010-10-13 13:30:44]    00000000fe5fe000 0000000000000000 0000000000000246 ffff8300c7e88000
> (XEN) [2010-10-13 13:30:44]    000000000000000b ffff8800278c4400 0000000000000011 ffff88002ffea700
> (XEN) [2010-10-13 13:30:44]    00007cfdc81a80c7 ffff82c480202a82 ffffffff8100942a 0000000000000021
> (XEN) [2010-10-13 13:30:44]    ffff88002ffea700 0000000000000011 ffff8800278c4400 000000000000000b
> (XEN) [2010-10-13 13:30:44]    ffff88002df4bbd0 00000000000006a1 0000000000000213 ffff88002fc20200
> (XEN) [2010-10-13 13:30:44]    ffffffff810df6ea 0000000000000011 0000000000000021 ffffffff8100942a
> (XEN) [2010-10-13 13:30:44] Xen call trace:
> (XEN) [2010-10-13 13:30:44]    [<ffff82c48015d797>] pci_enable_msi+0x48a/0x9d5
> (XEN) [2010-10-13 13:30:44]    [<ffff82c48015f16e>] map_domain_pirq+0x275/0x363
> (XEN) [2010-10-13 13:30:44]    [<ffff82c480172806>] do_physdev_op+0x826/0x10b0
> (XEN) [2010-10-13 13:30:44]    [<ffff82c480202a82>] syscall_enter+0xf2/0x14c
> (XEN) [2010-10-13 13:30:44]
> (XEN) [2010-10-13 13:30:44] SEIK bus: 7 slot: 0 func:0 msi->table_base: fe5fe000 read_pci_mem_bar: 4
> (XEN) [2010-10-13 13:30:44] SEIK pba_paddr: 4
>
> it's this one:
>
>     WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
>
> I have added some printk's .. and read_pci_mem_bar seems to return a bogus
> value .. the pba_addr is used later in the function, but I can't tell whether
> or when this could have implications.
>
> This also occurs when disabling the pci_resource_align on the kernel line.
>
> lspci on dom0 shows:
>
> 07:00.0 USB Controller: NEC Corporation Device 0194 (rev 03) (prog-if 30)
>         Subsystem: ASUSTeK Computer Inc. Device 8413
>         Flags: bus master, fast devsel, latency 0, IRQ 33
>         Memory at fe5fe000 (64-bit, non-prefetchable) [size=8K]
>         Capabilities: [50] Power Management version 3
>         Capabilities: [70] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
>         Capabilities: [90] MSI-X: Enable+ Mask- TabSize=8
>         Capabilities: [a0] Express Endpoint, MSI 00
>         Capabilities: [100] Advanced Error Reporting <?>
>         Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
>         Capabilities: [150] #18
>         Kernel driver in use: pciback
>
> In the same function it seems to trigger
>
>     if ( d )
>     {
>         /* XXX How to deal with existing mappings? */
>     }
>
> Which seems to be a bit odd for a freshly booted system with no domU restarts ?
>
> grub menu.lst:
>
> title xen-4.1-unstable.gz / Debian GNU/Linux, 2.6.32.23-xen-next-2.6.32.x-generaldebug-20101002
> root (hd0,0)
> kernel /xen-4.1-unstable.gz dom0_mem=768M loglvl=all loglvl_guest=all com1=115200,8n1 sync_console console_to_ring console_timestamps console=vga,com1 iommu=off debug lapic=debug apic_verbosity=debug apic=debug noirqbalance
> module /vmlinuz-2.6.32.24-xen-next-2.6.32.x-tracing-20101013 root=/dev/mapper/serveerstertje-root ro earlyprintk=xen max_loop=255 loop_max_part=63 libata.noacpi=1 debug loglevel=10 noirqbalance irqbalance=off iommu=soft xen-pciback.hide=(03:06.0)(07:00.0)(09:01.0)(09:01.1)(09:01.2) pci=resource_alignment=03:06.0;07:00.0;09:01.0;09:01.1;09:01.2;
> module /initrd.img-2.6.32.24-xen-next-2.6.32.x-tracing-20101013
>
> --
> Sander
>
> Tuesday, October 12, 2010, 6:44:33 PM, you wrote:
>
>> On Tue, Oct 12, 2010 at 06:28:13PM +0200, Sander Eikelenboom wrote:
>>> Hi Keir,
>>>
>>> Does xen and/or the xen console depend on physical cpu 0 ?
>>
>> Usually the console for Dom0, and I think all other domains go
>> through CPU0. Let me CC Ian here, who has been mucking in this
>> area and found some bugs (and produced fixes).
>>
>> Ian, that bug you found with not clearing the eventchannel - that
>> wouldn't have an impact here, right?
>>>
>>> I'm still trying to solve the mystery of my machine freezing when doing:
>>>
>>> - videograbbing in a domU with a usb3 pci-express controller passed
>>>   through (seems to cause quite a few interrupts)
>>> - compiling a linux kernel with "make -j 6"
>>>
>>> It's a 6 core AMD phenom x6.
>>>
>>> Without cpu pinning:
>>> I can freeze the machine easily within a minute after starting the compile,
>>> at first xen serial console also slows down under the load (slow updates).
>>> When the machine freezes i can't do anything with xen serial console.
>>>
>>> With cpu pinning:
>>> By not using the pcpu 0 at all for any domain, and pinning the domain with
>>> the videograbber to its own pcpu (pcpu 5) it seems the machine keeps
>>> running after 20 "make -j6" iterations of kernel compilation.
>>> Xen serial console stays responsive and doesn't slow down during the kernel
>>> compilation. The videograbber shows no problem grabbing video.
>>>
>> AHA! So finally closer to the mystery.
>>
>> Can you provide the /proc/interrupts of the Dom0?
>>
>> I wonder if this is related to the issue I had some time ago, and never got
>> to look at. The problem was that during heavy compilation (this is a 2
>> Nehalem socket box, just running Dom0 - no guests), the keyboard and USB
>> driver would stop getting interrupts. So the drivers would start polling,
>> which is quite slow, albeit serviceable, and then at some point it would
>> pick up again.
>>
>> The weirdness was that the /proc/interrupts showed absolutely _no_
>> interrupts on CPU0 during that time - as if Xen just forgot to update them.
>> Jeremy suggested I try to disable Xen IRQ balance (noirqbalance on Xen
>> command line) in case that is it, and to my embarrassment I haven't tried
>> that yet.
>>
>> Did you try that? I think somebody suggested that but I can't recall
>> whether it was for this issue?
>>>
>>> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
>>> Domain-0                             0     0     3   r--    2169.7 1-4
>>> Domain-0                             0     1     1   -b-    2339.3 1-4
>>> Domain-0                             0     2     2   -b-    2358.9 1-4
>>> Domain-0                             0     3     3   -b-    2298.2 1-4
>>> Domain-0                             0     4     1   -b-    2221.9 1-4
>>> Domain-0                             0     5     4   -b-    2287.7 1-4
>>> backup                               9     0     4   -b-      10.6 1-4
>>> database                             1     0     4   -b-      45.3 1-4
>>> davical                              5     0     3   -b-       8.7 1-4
>>> git                                  8     0     2   -b-       7.9 1-4
>>> mail                                 2     0     4   -b-       8.0 1-4
>>> samba                                3     0     3   -b-      11.1 1-4
>>> security                             7     0     5   r--    1433.2 5
>>> www                                  4     0     1   -b-      10.2 1-4
>>> zabbix                               6     0     3   -b-      21.2 1-4
>>>
>>>
>>> Is there a way a deadlock could occur between hypervisor <-> dom0 <-> domU
>>> especially related to passthrough/interrupts in the context of pcpu 0 ?
>>
>> I don't know, but I do know that the IRQ handling in Xen 4.0 changed
>> significantly compared to 3.4. I don't remember if you ever ran this setup
>> under 3.4?
>>>
>>> --
>>> Sander

--
Best regards,
 Sander                            mailto:linux@xxxxxxxxxxxxxx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
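
A note on the value 4 that read_pci_mem_bar reports in the log above: in a PCI memory BAR the low four bits are flag bits, not address bits, and 0x4 on its own is the encoding of a 64-bit memory decoder. That happens to match the "64-bit, non-prefetchable" BAR lspci shows for 07:00.0. The sketch below decodes a raw BAR pair according to the standard PCI layout. It is not the Xen implementation: decode_mem_bar, struct bar_info, bar_self_check and the BAR_* macros are hypothetical names made up for this illustration, and the raw dword 0xfe5fe004 is an assumption inferred from the lspci output, not something read from the device. Why the hypervisor ended up with only the flag bits is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in the low nibble of a PCI BAR (standard PCI layout).
 * All names here are hypothetical, not Xen's. */
#define BAR_SPACE_IO       0x1u  /* bit 0: 1 = I/O space, 0 = memory space */
#define BAR_MEM_TYPE_MASK  0x6u  /* bits 2:1: memory decoder type */
#define BAR_MEM_TYPE_64    0x4u  /* type 10b: 64-bit, spans two BAR slots */
#define BAR_MEM_PREFETCH   0x8u  /* bit 3: prefetchable */
#define BAR_MEM_ADDR_MASK  (~(uint64_t)0xf)

struct bar_info {
    bool is_io;
    bool is_64bit;
    bool prefetchable;
    uint64_t base;      /* decoded base address, flag bits stripped */
};

/* Decode a BAR from its low dword and, for 64-bit BARs, the next dword. */
struct bar_info decode_mem_bar(uint32_t lo, uint32_t hi)
{
    struct bar_info info = { false, false, false, 0 };

    if (lo & BAR_SPACE_IO) {
        /* I/O BARs cannot hold an MSI-X table; leave base at 0. */
        info.is_io = true;
        return info;
    }

    info.is_64bit = (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
    info.prefetchable = (lo & BAR_MEM_PREFETCH) != 0;
    info.base = lo & BAR_MEM_ADDR_MASK;
    if (info.is_64bit)
        info.base |= (uint64_t)hi << 32;

    return info;
}

/* Self-check with the values seen in this thread. */
void bar_self_check(void)
{
    /* Assumed raw BAR for "Memory at fe5fe000 (64-bit, non-prefetchable)". */
    struct bar_info good = decode_mem_bar(0xfe5fe004u, 0x0u);
    assert(!good.is_io && good.is_64bit && !good.prefetchable);
    assert(good.base == 0xfe5fe000ull);

    /* The logged value 4: only the 64-bit type flag, no address bits. */
    struct bar_info bogus = decode_mem_bar(0x4u, 0x0u);
    assert(bogus.is_64bit && bogus.base == 0);
}
```

Read this way, msi->table_base (fe5fe000) is the address part of the BAR and the logged 4 is the flag part, so the WARN_ON comparison could never match for this device while that value is being returned.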
