
Re: [Xen-users] Possible bug with scsi disk and Xen



On Sat, Jan 29, 2011 at 07:32:31PM +0100, Jordan Pittier wrote:
>    >The Xen dom0 kernel does its irq handling through the Xen hypervisor,
>    >so that might make some drivers behave differently on baremetal vs.
>    >dom0.
>    Ok, so the driver is a likely culprit for this SCSI craziness.


I'm not sure if it is, but it *could* be.


>    >What driver version does the squeeze kernel have?
>    3.04, which seems to be several years old. There are lots of users
>    complaining about LSI drivers all over the Internet.
>    I will keep you posted as soon as I manage to build the latest driver.

See here for tips on how to build an updated megaraid_sas driver:
http://lists.xensource.com/archives/html/xen-devel/2010-11/msg00250.html

Maybe it also helps with your driver.
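For context, a rough sketch of checking which Fusion MPT driver version the running kernel loaded, and of rebuilding an out-of-tree module against the running kernel's headers, might look like this (the module name, the second version string, and the vendor source layout are assumptions for illustration, not details from the thread):

```shell
# Sketch: report the Fusion MPT driver version the running kernel loaded.
# The module may not be present on every box, hence the fallback.
modinfo mptspi 2>/dev/null | grep -i '^version' || true

# GNU sort -V orders version strings numerically, so the newest release
# sorts last -- handy for comparing the in-kernel 3.04.x against a vendor
# download (the 3.04.19 here is an illustrative placeholder):
printf '3.04.06\n3.04.19\n' | sort -V | tail -n 1   # prints 3.04.19

# Typical out-of-tree module rebuild against the running kernel's headers
# (Debian: apt-get install linux-headers-$(uname -r)); the source path is
# an assumed layout for the vendor tarball:
#   cd mptlinux/drivers/message/fusion
#   make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
#   sudo make -C /lib/modules/$(uname -r)/build M=$(pwd) modules_install
#   sudo depmod -a
```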

-- Pasi

>    On Sat, Jan 29, 2011 at 7:25 PM, Pasi Kärkkäinen <[1]pasik@xxxxxx> wrote:
>    > On Sat, Jan 29, 2011 at 07:03:16PM +0100, Jordan Pittier wrote:
>    >> Thanks for your reply. LSI does indeed have a newer driver for the
>    >> controller, but I can't "build" it; there's an error when I try to
>    >> compile it [see attachment]. I will give it another try in the next
>    >> few days.
>    >>
>    >> What is puzzling is that the IO errors only occur with the Xen HV.
>    >> I am 100% willing to accept that the problem is the driver, but how
>    >> could the exact same kernel (the xenified one) work fine without Xen
>    >> loaded? I am almost a noob at kernel/driver stuff, but I thought
>    >> drivers were entirely in the kernel.
>    >>
>    >
>    > Yep, the driver is entirely in the kernel, but that's not the whole
>    > story.
>    >
>    > The Xen dom0 kernel does its irq handling through the Xen hypervisor,
>    > so that might make some drivers behave differently on baremetal vs.
>    > dom0.
>    >
>    > Also remember dom0 is a *vm*, so some timing stuff might happen
>    > differently on baremetal vs. dom0.
>    >
>    >> I will try with the latest kernel in a few days.
>    >>
>    >> SLES11SP1 ships mptfusion 4.22
>    >>
>    
> ([2]http://www.novell.com/linux/releasenotes/x86_64/SUSE-SLES/11-SP1/#driver-updates-storage)
>    >> I don't know about RHEL
>    >>
>    >
>    > What driver version does the squeeze kernel have?
>    >
>    >
>    > -- Pasi
>    >
>    >
>    >> On Sat, Jan 29, 2011 at 6:02 PM, Pasi Kärkkäinen <[3]pasik@xxxxxx>
>    wrote:
>    >> > On Sat, Jan 29, 2011 at 04:27:25PM +0100, Jordan Pittier wrote:
>    >> >>   Hi,
>    >> >>   I have been tracking a bug affecting all my servers running
>    >> >>   Debian Squeeze for more than a month now, and I desperately
>    >> >>   need your help :)
>    >> >>   I have 10 Sun v20z servers (2 x 66GB SCSI disks in RAID 1 ==
>    >> >>   mirror). 4 of them are running Debian Squeeze with the latest
>    >> >>   Xen Debian kernel (2.6.32-5-xen-amd64 == 2.6.32-29). The rest
>    >> >>   are running Debian Lenny (2.6.26-2-xen-amd64 == 2.6.26-26lenny1).
>    >> >>   On a Squeeze box, under very high IO (such as running an IO
>    >> >>   stress test, i.e. bonnie++), the server starts behaving weirdly
>    >> >>   and I see messages like these in kernel.log: [see attachment].
>    >> >>   Then the server becomes totally unresponsive (but doesn't
>    >> >>   "freeze") and commands such as "ls" or "reboot" don't work
>    >> >>   anymore. I have to do a hard reboot. After the server has
>    >> >>   rebooted, the RAID array appears degraded (I am using the
>    >> >>   mpt-status command) and starts rebuilding. After several hours,
>    >> >>   the RAID array is "fine" ("clean"). The RAID controller is an
>    >> >>   "LSI53C1030" U320, with driver "Fusion MPT SPI Host driver
>    >> >>   3.04.06". I have attached the result of "lsmod".
>    >> >>   None of my Lenny boxes are affected by this issue; all of my
>    >> >>   Squeeze boxes are.
>    >> >>   What does it have to do with Xen? When I boot my Squeeze boxes
>    >> >>   without the Xen hypervisor but with the same Xen kernel,
>    >> >>   bonnie++ runs absolutely fine. The issue appears only with the
>    >> >>   Xen hypervisor loaded.
>    >> >>   There is a Debian bug report for this:
>    >> >>   [1][4]http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727
>    >> >>   Any suggestions?
>    >> >
>    >> > Did you check if LSI has newer driver version available?
>    >> >
>    >> > You might also check which driver version, for example, RHEL6
>    >> > or SLES11SP1 ships with... both of those distros have 2.6.32
>    >> > kernels too.
>    >> >
>    >> > On one of my testboxes I needed to upgrade the LSI driver to a
>    >> > newer version to make it work. That one is SAS-based LSI, though.
>    >> >
>    >> > Can you try using another disk controller?
>    >> >
>    >> > Also: Did you try using the latest kernel (-30) ?
>    >> >
>    >> > -- Pasi
>    >> >
>    >> >
>    >
> 
> References
> 
>    Visible links
>    1. mailto:pasik@xxxxxx
>    2. 
> http://www.novell.com/linux/releasenotes/x86_64/SUSE-SLES/11-SP1/#driver-updates-storage
>    3. mailto:pasik@xxxxxx
>    4. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

