Re: [Xen-devel] Re: pci passthrough xhci host controller
Hi Konrad,

Since it all seemed to work on the intel machine, i returned to the AMD ..
And with the same hypervisor, dom0 and domU versions, and the same boot
options, i already experienced a freeze (within 2 hours after a fresh boot).

Currently i'm testing the suggestions you made below, i hope i can give some
news in about 2 days ...

Thanks again !

Monday, September 27, 2010, 5:59:52 PM, you wrote:

> On Tue, Sep 21, 2010 at 10:03:10PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>>
>> I indeed have the feeling the memleaks aren't huge, and adding the diverse
>> kernel hacking debug options ended up doing more wrong than right.
>> I have turned off the options i added, re-instated the "swiotlb=force" in
>> the domU config to see if it goes from a working to a freezing config, but i
>> have the feeling it will not make a difference.
>>
>> Then i have 4 differences left:
>>
>> - Other dom0 kernel since the tests resulting in continuous freezes of my
>>   server
>> - Other domU kernel since the tests resulting in continuous freezes of my
>>   server
>> - Other workload (server is running more VM's)
>> - Other physical hardware
>>     - server is AMD phenom X6, current config Intel quad core
>>     - Both have their iommu disabled
>>     - Both are 64-bit capable cpu's with 64-bit xen, dom0 and domU
>>
>>     - But most notably perhaps, the intel has only 2GB RAM, the server
>>       8GB
>>
>> Could the available physical RAM be an issue here ?
>> I limit the ram for dom0 with dom0_mem=

> OK, but that would not limit where the guests get their memory from.
> I think you might need this in conjunction with maxmem, say:
> maxmem=4GB dom0_mem=max:512MB
> This way your 8GB machine has 4GB of memory available for both dom0 and the
> guest.

I have used mem=4G and dom0_mem=768M, this does limit the available ram to
less than 4G and makes dom0 768M.
I also used "noirqbalance" for xen, and used the suggestion of Pasi:
libata.noacpi=1
I booted both dom0 and domU with "iommu=soft" only, no swiotlb=force
specified.

>>
>> After this test succeeds on the intel machine, i will retry the same
>> xen, dom0 kernel and domU kernel on the AMD config.
>> Is there anything i can especially log/configure/debug to get more detail to
>> see if the 8GB could be the problem ?

> I think we have concluded that the device in question (3.0 PCIe USB host
> controller) can do 64-bit DMA. In which case the SWIOTLB is only used as an
> address translation system (pfn -> mfn, and vice-versa). If it was 32-bit it
> would also be utilized for bouncing the DMA buffers - there are sometimes
> cases where the driver does not sync after the bounce (perfect examples are
> the existing radeon/nouveau drivers), ending up with a corrupted/hung
> device. But those show up early in development, and this is the new USB
> controller that can do 64-bit instead of the dreaded 32-bit limit that all
> other USB controllers are stuck with.

> The memory difference might be a red herring. It could be the workload -
> more VMs and a latency issue (say we are waiting for an IRQ and it comes
> just a bit too late)?

> I think the idea of narrowing down the amount of memory on the AMD machine
> could help.

> What is the exact model of your USB capture device and the USB PCI device?

Is there a way to detect if it's doing 32-bit or 64-bit DMA ?

Although latency could be an issue if the xhci driver/hardware were more
sensitive to it, or if it entered paths in the driver that haven't had much
testing, the latency issues shouldn't be much different from USB2. The
capture device is the same, and uses the same driver and bandwidth in either
case ...

That said ..
it could be a corner case in the driver, that in combination with more than
4G ram could do something wrong perhaps (and perhaps then only in
combination with xen).

When i look at /proc/buddyinfo in the dom0, i only see the figures on the
DMA32 line changing (allocating and freeing):

Node 0, zone      DMA      7     13      6      5      7      1      2      3      3      1      1
Node 0, zone    DMA32    354    823    585    149     60     19     13      0      0      1      0
Node 0, zone   Normal     15      3      9      4      4      3      3      1      1      1      0

In the domU, i don't have the "Normal" line, and i only see changes on the
DMA32 line:

Node 0, zone      DMA      6      0      0      1      1      1      1      1      0      1      0
Node 0, zone    DMA32    552    151    119     33      1      0      0      0      0      0      0

And it is using MSI. /proc/interrupts on domU shows (i don't see the normal
IRQ the device has (33) listed here ?):

  44:         0  xen-pirq-pcifront        ohci_hcd:usb2
  45:     20810  xen-pirq-pcifront        ohci_hcd:usb3
  46:         2  xen-pirq-pcifront        ehci_hcd:usb1
  86:         0  xen-pirq-pcifront-msi-x  xhci_hcd
  87:  72858256  xen-pirq-pcifront-msi-x  xhci_hcd
 244:     12674  xen-dyn-event            eth0
 245:    154352  xen-dyn-event            blkif
 246:      7518  xen-dyn-event            blkif
 247:        31  xen-dyn-event            blkif
 248:      2189  xen-dyn-event            hvc_console
 249:        41  xen-dyn-event            pcifront
 250:       593  xen-dyn-event            xenbus
 251:         0  xen-percpu-ipi           callfuncsingle0
 252:         0  xen-percpu-virq          debug0
 253:         0  xen-percpu-ipi           callfunc0
 254:         0  xen-percpu-ipi           resched0
 255:   4849041  xen-percpu-virq          timer0
 NMI:         0  Non-maskable interrupts
 LOC:         0  Local timer interrupts
 SPU:         0  Spurious interrupts
 PMI:         0  Performance monitoring interrupts
 PND:         0  Performance pending work
 RES:         0  Rescheduling interrupts
 CAL:         0  Function call interrupts
 TLB:         0  TLB shootdowns
 MCE:         0  Machine check exceptions
 MCP:         0  Machine check polls
 ERR:         0
 MIS:         0

And /proc/interrupts on dom0:

           CPU0      CPU1      CPU2      CPU3      CPU4      CPU5
   1:         2         0         0         0         0         0  xen-pirq-ioapic-edge   i8042
   8:         0         0         0         0         0         0  xen-pirq-ioapic-edge   rtc0
   9:         0         0         0         0         0         0  xen-pirq-ioapic-edge   acpi
  12:         4         0         0         0         0         0  xen-pirq-ioapic-edge   i8042
  17:        12         0         0         0         0         0  xen-pirq-ioapic-level  ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
  18:         4         0         0         0         0         0  xen-pirq-ioapic-level  ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
  25:        18         0         0         0         0         0  xen-pirq-ioapic-level  HDA Intel
  33:         0         0         0         0         0         0  xen-pirq-ioapic-level  pciback[0000:07:00.0]
  44:         0         0         0         0         0         0  xen-pirq-ioapic-level  pciback[0000:09:01.0]
  45:     21068         0         0         0         0         0  xen-pirq-ioapic-level  pciback[0000:09:01.1]
  46:         2         0         0         0         0         0  xen-pirq-ioapic-level  pciback[0000:09:01.2]
1700:       215         0         0         0         0         0  xen-dyn-event          vif8.0
1701:       974         0         0         0         0         0  xen-dyn-event          blkif-backend
1702:        19         0         0         0         0         0  xen-dyn-event          blkif-backend
1703:    214484         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1704:       434         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1705:       215         0         0         0         0         0  xen-dyn-event          vif7.0
1706:      1058         0         0         0         0         0  xen-dyn-event          blkif-backend
1707:        19         0         0         0         0         0  xen-dyn-event          blkif-backend
1708:      1188         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1709:       416         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1712:      4816         0         0         0         0         0  xen-dyn-event          vif6.0
1713:      3203         0         0         0         0         0  xen-dyn-event          blkif-backend
1714:      5055         0         0         0         0         0  xen-dyn-event          blkif-backend
1715:        25         0         0         0         0         0  xen-dyn-event          blkif-backend
1716:      1365         0         0         0         0         0  xen-dyn-event          pciback
1717:    434518         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1718:       558         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1719:    299487         0         0         0         0         0  xen-dyn-event          vif5.0
1720:      4529         0         0         0         0         0  xen-dyn-event          blkif-backend
1721:        25         0         0         0         0         0  xen-dyn-event          blkif-backend
1722:       342         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1723:       321         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1724:      1448         0         0         0         0         0  xen-dyn-event          blkif-backend
1725:        23         0         0         0         0         0  xen-dyn-event          blkif-backend
1726:      1112         0         0         0         0         0  xen-dyn-event          vif4.0
1727:     14889         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1728:       391         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1729:       473         0         0         0         0         0  xen-dyn-event          vif3.0
1730:      2759         0         0         0         0         0  xen-dyn-event          blkif-backend
1731:        19         0         0         0         0         0  xen-dyn-event          blkif-backend
1732:      1051         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1733:       401         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1734:        80         0         0         0         0         0  xen-dyn-event          vif2.0
1735:       958         0         0         0         0         0  xen-dyn-event          blkif-backend
1736:        19         0         0         0         0         0  xen-dyn-event          blkif-backend
1737:      1023         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1738:       509         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1739:    158347         0         0         0         0         0  xen-dyn-event          vif1.0
1740:     17860         0         0         0         0         0  xen-dyn-event          blkif-backend
1741:        19         0         0         0         0         0  xen-dyn-event          blkif-backend
1742:      1009         0         0         0         0         0  xen-dyn-event          evtchn:xenconsoled
1743:       365         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1744:         0         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1745:     12740         0         0         0         0         0  xen-dyn-event          evtchn:xenstored
1746:    143704         0         0         0         0         0  xen-pirq-msi           eth1
1747:    129018         0         0         0         0         0  xen-pirq-msi           eth0
1748:         0         0         0         0         0         0  xen-pirq-msi           ahci
1749:    158608         0         0         0         0         0  xen-pirq-msi           ahci
1760:         0         0         0         0         0         0  xen-percpu-virq        pcpu
1761:     10069         0         0         0         0         0  xen-dyn-event          xenbus
1762:         0         0         0         0         0     10534  xen-percpu-ipi         callfuncsingle5
1763:         0         0         0         0         0         0  xen-percpu-virq        debug5
1764:         0         0         0         0         0       107  xen-percpu-ipi         callfunc5
1765:         0         0         0         0         0    179072  xen-percpu-ipi         resched5
1766:         0         0         0         0         0   3819179  xen-percpu-virq        timer5
1767:         0         0         0         0     20758         0  xen-percpu-ipi         callfuncsingle4
1768:         0         0         0         0         0         0  xen-percpu-virq        debug4
1769:         0         0         0         0       165         0  xen-percpu-ipi         callfunc4
1770:         0         0         0         0    176246         0  xen-percpu-ipi         resched4
1771:         0         0         0         0  10775783         0  xen-percpu-virq        timer4
1772:         0         0         0      8431         0         0  xen-percpu-ipi         callfuncsingle3
1773:         0         0         0         0         0         0  xen-percpu-virq        debug3
1774:         0         0         0       120         0         0  xen-percpu-ipi         callfunc3
1775:         0         0         0    219617         0         0  xen-percpu-ipi         resched3
1776:         0         0         0   3821742         0         0  xen-percpu-virq        timer3
1777:         0         0     11293         0         0         0  xen-percpu-ipi         callfuncsingle2
1778:         0         0         0         0         0         0  xen-percpu-virq        debug2
1779:         0         0       207         0         0         0  xen-percpu-ipi         callfunc2
1780:         0         0    239804         0         0         0  xen-percpu-ipi         resched2
1781:         0         0   4937213         0         0         0  xen-percpu-virq        timer2
1782:         0     34348         0         0         0         0  xen-percpu-ipi         callfuncsingle1
1783:         0         0         0         0         0         0  xen-percpu-virq        debug1
1784:         0       176         0         0         0         0  xen-percpu-ipi         callfunc1
1785:         0    220234         0         0         0         0  xen-percpu-ipi         resched1
1786:         0  10874047         0         0         0         0  xen-percpu-virq        timer1
1787:      6367         0         0         0         0         0  xen-percpu-ipi         callfuncsingle0
1788:         0         0         0         0         0         0  xen-percpu-virq        debug0
1789:        38         0         0         0         0         0  xen-percpu-ipi         callfunc0
1790:    178784         0         0         0         0         0  xen-percpu-ipi         resched0
1791:  10963806         0         0         0         0         0  xen-percpu-virq        timer0
 NMI:         0         0         0         0         0         0  Non-maskable interrupts
 LOC:         0         0         0         0         0         0  Local timer interrupts
 SPU:         0         0         0         0         0         0  Spurious interrupts
 PMI:         0         0         0         0         0         0  Performance monitoring interrupts
 PND:         0         0         0         0         0         0  Performance pending work
 RES:    178784    220234    239804    219617    176246    179072  Rescheduling interrupts
 CAL:      6405     34524     11500      8551     20923     10641  Function call interrupts
 TLB:         0         0         0         0         0         0  TLB shootdowns
 TRM:         0         0         0         0         0         0  Thermal event interrupts
 MCE:         0         0         0         0         0         0  Machine check exceptions
 MCP:        37        37        37        37        37        37  Machine check polls
 ERR:         0
 MIS:         0

I have tried 2 different USB 3 controllers, both previously caused the
freezes, both have a NEC chip.
The USB controller in the AMD system is an ASUS U3S6, from which i only pass
through the USB3 controller and not the S-ATA controller.
The other controller is an MSI, which only does USB3.

lspci (domU):

07:00.0 USB Controller [0c03]: NEC Corporation Device [1033:0194] (rev 03) (prog-if 30)
        Subsystem: ASUSTeK Computer Inc. Device [1043:8413]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 33
        Region 0: Memory at fe500000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150] #18
        Kernel driver in use: xhci_hcd

The capture device is a Kworld k2800, which has an em28xx chip, and it's a
USB 2 device.
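
The /proc/buddyinfo figures quoted earlier in this message are easier to compare between runs when each zone is reduced to a single free-page count. The following is only a minimal sketch, not something taken from the thread: it assumes the usual layout of that file (one column per allocation order, starting at order 0, so a block in column k covers 2**k pages), and the helper names parse_buddyinfo and free_pages are made up for illustration. The sample text is the domU output from this message.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        # Assumed layout: Node <n> zone <name> <order-0 count> <order-1 count> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        zones[(int(parts[1]), parts[3])] = [int(c) for c in parts[4:]]
    return zones


def free_pages(counts):
    """Total free pages in a zone: a free block of order k holds 2**k pages."""
    return sum(count << order for order, count in enumerate(counts))


# domU figures as quoted in this message.
SAMPLE = """\
Node 0, zone      DMA      6      0      0      1      1      1      1      1      0      1      0
Node 0, zone    DMA32    552    151    119     33      1      0      0      0      0      0      0
"""

if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        print("node", node, "zone", zone, "free pages:", free_pages(counts))
```

To follow a live system, the same two functions could be fed the contents of /proc/buddyinfo at intervals and the per-zone totals compared between samples.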