
Re: [Xen-devel] pci-passthrough in pvops causing offline raid



On Thu, Nov 11, 2010 at 05:38:50PM +0000, Mark Adams wrote:
> On Thu, Nov 11, 2010 at 11:53:40AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 11, 2010 at 10:24:17AM +0000, Mark Adams wrote:
> > > Hi All,
> > > 
> > > Running xen 4.0.1-rc6, debian squeeze 2.6.32-21.
> > > 
> > > In a voip setup, where I have forwarded the onboard NIC interfaces
> > > through to domU using the following grub config:
> > > 
> > > module  /vmlinuz-2.6.32-5-xen-amd64 placeholder 
> > > root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro  quiet 
> > > xen-pciback.permissive xen-pciback.hide=(02:00.0)(03:00.0) 
> > > pci=resource_alignment=02:00.0;03:00.0
> > > 
> > > I'm having a serious issue where the raid card goes offline after an
> > > indefinite period of time. Sometimes it runs fine for a week, other times 1
> > > day before I get "offline device" errors. Rebooting the machine fixes it
> > > straight away, and everything is back online.
> > > 
> > > What in the Xen pciback is causing the raid card to go offline? The
> > > only devices hidden are the 2 onboard NICs.
> > 
> > You need to give more details. Is the RAID card a 3Ware? An LSI? Do you
> > run with an IOMMU? When the RAID card goes offline, do you see a stop of
> > IRQs going to the device? Are the IRQs for the RAID card sent to all of your
> > CPUs or just a specific one? Are you pinning your guests to specific CPUs?
> > Does the issue disappear if you don't passthrough the NIC interfaces? If so
> > have you run this setup for "a week" to make sure?
> 
> It is an Areca 1220. I can't see anything when the device goes offline
> apart from 
> 
>     [77324.264270] sd 0:0:0:1: rejecting I/O to offline device
>     [77334.005854] sd 0:0:0:0: rejecting I/O to offline device

That is it? No other details from the driver? Did you poke at the driver
(modinfo) to see if there are any options to increase its verbosity?

> 
> Unfortunately nothing gets logged because there is nothing to write to
> anymore. I'm not sure how I can see the IRQs otherwise. There is no

cat /proc/interrupts

> pinning being done at all, and the machine was running for a few months
> OK before the pciback was added.

Ok, what about your NICs? Are they on-board? Are they sharing the IRQ
with the card? You should be able to see this by looking at /proc/interrupts.
Which NICs are they? lspci can help you there. As a matter of fact, run
lspci -vvv and send that.
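
To make the shared-IRQ check concrete, here is a minimal sketch that scans
/proc/interrupts text for numbered IRQ lines with more than one handler
attached. It assumes the usual layout (a header row of CPU names, then one
row per IRQ with comma-separated handler names at the end). The function
name shared_irqs and everything in SAMPLE are invented for illustration and
are not taken from the reporter's machine.

```python
def shared_irqs(text):
    """Return {irq_number: [handler names]} for IRQs with several handlers."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():            # skip NMI, LOC and similar rows
            continue
        fields = rest.split(None, ncpus)   # per-CPU counts, then the tail
        if len(fields) <= ncpus:
            continue
        tail = fields[ncpus]               # "<chip/type>  name1, name2"
        # Assumption: handler names are comma-separated and contain no spaces,
        # so the last word of each comma-separated part is a handler name.
        names = [part.split()[-1] for part in tail.split(",") if part.strip()]
        if len(names) > 1:
            shared[int(label)] = names
    return shared


# Invented sample input, only to show the expected shape of the data.
SAMPLE = """\
           CPU0       CPU1
  1:          9          0   xen-pirq-ioapic-edge  i8042
 16:      51234          0   xen-pirq-ioapic-level  arcmsr, eth0
 17:        100          0   xen-pirq-ioapic-level  eth1
NMI:          0          0   Non-maskable interrupts
"""
```

On a live dom0 this would be called as
shared_irqs(open("/proc/interrupts").read()); a result that lists the RAID
driver next to a NIC on the same IRQ is the case being asked about.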
> 
> Is my kernel module line correct above? Are the xen-pciback.permissive
> and resource_alignment options required? Also I am passing through the

Not always. resource_alignment is needed only if the BARs (look at the lspci
output) are not page-aligned. If you have no idea what I am talking about then
the answer is yes.
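
As a rough way to check the page-alignment condition, the sketch below
scans "Region N: Memory at ..." lines from lspci -vvv output for one device
and reports memory BARs whose start address or size is not a multiple of
the page size. It assumes 4 KiB pages and the usual "[size=...]" suffix
format. The names unaligned_bars, PAGE_SIZE and SAMPLE are invented for
illustration, and the sample text is made up rather than taken from the
hardware in this thread.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86

# Matches e.g. "Region 0: Memory at fbce0000 (32-bit, ...) [size=128K]"
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) .*?\[size=(\d+)([KMG]?)\]"
)
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def unaligned_bars(lspci_text):
    """Return (region, start, size_in_bytes) for BARs that are not page-aligned."""
    bad = []
    for m in REGION_RE.finditer(lspci_text):
        index = int(m.group(1))
        start = int(m.group(2), 16)
        size = int(m.group(3)) * UNITS[m.group(4)]
        # Flag a BAR that starts off a page boundary or does not fill
        # whole pages (it could then share a page with something else).
        if start % PAGE_SIZE or size % PAGE_SIZE:
            bad.append((index, hex(start), size))
    return bad


# Invented sample input, only to show the expected shape of the data.
SAMPLE = """\
02:00.0 Ethernet controller: (sample device)
\tRegion 0: Memory at fbce0000 (32-bit, non-prefetchable) [size=128K]
\tRegion 2: I/O ports at e000 [size=32]
\tRegion 3: Memory at fbcdc100 (32-bit, non-prefetchable) [size=256]
"""
```

If the list comes back empty for the devices being passed through, the
resource_alignment option is probably not needed for them; I/O port
regions are deliberately ignored here.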

> onboard NICs - is this something that should be avoided or is it ok to
> do?

It is fine. That is the first thing I test.

> 
> > > 
> > > I know that this issue is with Xen, as I had this running on a different
> > > server (same xen setup) and it had the same issues, which I initially
> > > thought were to do with the raid card.
> > 
> > So you never ran this setup on this kernel (2.6.32-5) without the Xen 
> > hypervisor?
> 
> No, it's always had the hypervisor - but it was running ok before the
> pciback options were added. This week, it's seemed to happen
> approximately every 24 hours.

When this hang occurs, can you do 'xm debug-key Q', 'xm debug-key i', and
'xm debug-key z'? Then run 'xm dmesg' and provide that to me.

Is your boot disk on the same disk as the RAID?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

