[Xen-users] Re: [Xen-devel] pci-passthrough in pvops causing offline raid
On Thu, Nov 11, 2010 at 12:58:09PM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 11, 2010 at 05:38:50PM +0000, Mark Adams wrote:
> > On Thu, Nov 11, 2010 at 11:53:40AM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Nov 11, 2010 at 10:24:17AM +0000, Mark Adams wrote:
> > > > Hi All,
> > > >
> > > > Running xen 4.0.1-rc6, debian squeeze 2.6.32-21.
> > > >
> > > > In a voip setup, where I have forwarded the onboard NIC interfaces
> > > > through to domU using the following grub config:
> > > >
> > > > module /vmlinuz-2.6.32-5-xen-amd64 placeholder
> > > > root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro quiet
> > > > xen-pciback.permissive xen-pciback.hide=(02:00.0)(03:00.0)
> > > > pci=resource_alignment=02:00.0;03:00.0
> > > >
> > > > I'm having a serious issue where the raid card goes offline after an
> > > > indefinite period of time. Sometimes it runs fine for a week, other
> > > > times 1 day before I get "offline device" errors. Rebooting the machine
> > > > fixes it straight away, and everything is back online.
> > > >
> > > > What in the Xen pciback is causing the raid card to go offline? The
> > > > only devices hidden are the 2 onboard NICs.
> > >
> > > You need to give more details. Is the RAID card a 3Ware? An LSI? Do you
> > > run with an IOMMU? When the RAID card goes offline, do you see a stop of
> > > IRQs going to the device? Are the IRQs for the RAID card sent to all of
> > > your CPUs or just a specific one? Are you pinning your guests to
> > > specific CPUs? Does the issue disappear if you don't passthrough the NIC
> > > interfaces? If so, have you run this setup for "a week" to make sure?
> >
> > It is an Areca 1220. I can't see anything when the device goes offline
> > apart from
> >
> > [77324.264270] sd 0:0:0:1: rejecting I/O to offline device
> > [77334.005854] sd 0:0:0:0: rejecting I/O to offline device
>
> That is it? No other details from the driver? Did you poke at the driver
> (modinfo) to see if there are any options to increase its verbosity?

I can't do anything once it's happened, everything is offline so I have no
utils...

> >
> > Unfortunately nothing gets logged because there is nothing to write to
> > anymore. I'm not sure how I can see the IRQs otherwise. There is no
>
> cat /proc/interrupts
>
> > pinning being done at all, and the machine was running for a few months
> > OK before the pciback was added.
>
> Ok, what about your NICs? Are they on-board? Are they sharing the IRQ
> with the card? You should be able to see this by looking at /proc/interrupts.
> Which NICs are they? lspci can help you there. As a matter of fact, run
> lspci -vvv and send that.

It is the onboard NICs, they are Intel 82574L. I can see the arcmsr line,
but not anything for the NICs (because they are hidden?)

 39: 1126249 0 0 0 0 0 0 0 xen-pirq-ioapic-level arcmsr

Nothing else is on 1126249

see lspci.txt attached.

> >
> > Is my kernel module line correct above? Are the xen-pciback.permissive
> > and resource_alignment options required? Also I am passing through the
>
> Not always. The resource_alignment is needed only if the BARs (look at
> lspci output) are not page-aligned. If you have no idea what I am talking
> about then the answer is yes.
>
> > onboard NICs - is this something that should be avoided or is it ok to
> > do?
>
> It is fine. That is the first thing I test..
>
> > > >
> > > > I know that this issue is with Xen, as I had this running on a different
> > > > server (same xen setup) and it had the same issues, which I initially
> > > > thought were to do with the raid card.
> > >
> > > So you never ran this setup on this kernel (2.6.32-5) without the Xen
> > > hypervisor?
> >
> > no, it's always had the hypervisor - but it was running ok before the
> > pciback options were added. This week, it's seemed to happen
> > approximately every 24 hours.
>
> When this hang occurs, can you do 'xm debug-key Q', 'xm debug-key i', 'xm
> debug-key z'.
> Then run 'xm dmesg' and provide that to me?

I can try this, but it probably won't work as the device will not be
readable.

> Is your boot disk on the same disk as the RAID?

There are 2 raids, a Raid1 for the OS (/boot / /var /tmp /usr) and a raid5
for VMs - they both disappear at the same time so it appears the card
is disappearing..

Attachment: lspci.txt

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users