Re: [Xen-devel] megasas stops I/O when running kernel as dom0 under xen4.1/4.2
On 24/08/11 18:09, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote:
>> On 24/08/11 13:06, Andrew Cooper wrote:
>>> On 22/08/11 10:05, Andrew Cooper wrote:
>>>> On 19/08/11 19:10, Andreas Olsowski wrote:
>>>>> On 19.08.2011 18:49, Andrew Cooper wrote:
>>>>>
>>>>>> The only change you need to make is in megasas_probe_one() in
>>>>>> megaraid_sas_base.c
>>>>>>
>>>>>> Add a call to pci_enable_msi(pdev) immediately after the current
>>>>>> call to pci_set_master(pdev);
>>>>>>
>>>>>> ~Andrew
>>>>>>
>>>>> Yep, that works fine. Removed the module option as well.
>>>>>
>>>>> root@tarballerina:~# cat /proc/interrupts |grep mega
>>>>> 2236: 69010 0 0 0 0 0 0 0 xen-pirq-msi megasas
>>>>>
>>>>> The same procedure that would have led to almost instant errors has
>>>>> not brought them to appear again.
>>>>>
>>>> Good. This is what we are seeing as well. I am still awaiting a reply
>>>> from LSI on this topic.
>>>>
>>>> Unfortunately, this does point to a regression in the way Xen deals with
>>>> legacy interrupts.
>>> Out of interest, on all 3 of your boxes with the megaraid_sas cards,
>>> could you gather the io_apic information?
>>>
>>> It is the z xen debug key on the serial console (or alternatively put
>>> apic_verbosity=debug on the xen commandline and the information gets
>>> dumped into the dmesg)
>> You can ignore this - it is not relevant.
>>
>> I have narrowed the problem to a bug in the interrupt migration code.
> Goodies!
>> The bug occurs when the move pending flag is set, and somehow another
>> interrupt comes in on the old pcpu without triggering the move
>> completion code. This leaves the IO_APIC with ack'd but not EOI'd
>> interrupt from the megaraid_sas device.
> Ah, so the interrupt is delivered to Dom0 on the old per_cpu
> event which is ignored. Ignored b/c we have rebound the event channel
> to the other CPU, right?

The interrupt is not ignored - it seems to be being serviced by the
device driver in dom0. I will admit that my debugging code may be a bit
flaky - I started by trying to match IRQ35 (which is always claimed by
PCI INTA on this server - very useful for debugging) between do_IRQ and
its related PHYSDEVOP_eoi. I am currently trying to track the exact
order of events around this interrupt which misses the move completion
code.

> Is there any code in the Hypervisor to turn off interrupt migration code?

Not that I have found, although playing around with vcpu and task
pinning should work. My debugging shows that Xen-4.1.1 is migrating this
interrupt between PCPUs on average once every 4 real interrupts when
dom0 is under any load whatsoever.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
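
For anyone wanting to repeat the "cat /proc/interrupts | grep mega" check
quoted earlier in the thread from a program, here is a minimal sketch in
C. It only does a text match: it looks for a line that names the driver
and whose interrupt type mentions "msi" (as in "xen-pirq-msi" above). The
helper names contains_nocase(), line_uses_msi() and driver_uses_msi() are
made up for this example and are not part of any kernel or Xen interface,
and the "msi" substring test is an assumption about how the type column
is labelled on a given system.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive substring test: 1 if needle occurs in haystack, else 0. */
static int contains_nocase(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return 1;
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;

        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/*
 * 1 if this /proc/interrupts line names the driver and its text mentions
 * MSI (e.g. "xen-pirq-msi"), else 0. Text heuristic only (assumption).
 */
static int line_uses_msi(const char *line, const char *driver)
{
    return strstr(line, driver) != NULL && contains_nocase(line, "msi");
}

/*
 * Scan the file at path (normally "/proc/interrupts").
 * Returns 1 if a matching MSI line is found, 0 if none, -1 if the file
 * cannot be opened.
 */
static int driver_uses_msi(const char *path, const char *driver)
{
    char line[4096];
    FILE *f = fopen(path, "r");
    int found = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (line_uses_msi(line, driver)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling driver_uses_msi("/proc/interrupts", "megasas") on a dom0 with the
pci_enable_msi() change applied would be expected to find the MSI line. A
result of 0 suggests the driver is either absent or still on a legacy
(IO-APIC) interrupt.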