
Re: [Xen-users] Possible bug with scsi disk and Xen



On Wed, Feb 02, 2011 at 11:43:29PM +0100, Jordan Pittier wrote:
>    Hi,
>    I finally managed to compile the LSI MPT Fusion driver 4.22. I took the
>    source from kernel 2.6.34 shipped with SLES, then slightly changed the
>    driver sources to "backport" it to a Debian 2.6.32 kernel.
>    Now my servers seem 100% stable, so I am very happy :) Thanks for your
>    big hint toward a possibly outdated driver.
>

Good to hear it helped!

-- Pasi

>    Jordan
> 
>    On Sat, Jan 29, 2011 at 7:49 PM, Pasi Kärkkäinen <[1]pasik@xxxxxx> wrote:
> 
>      On Sat, Jan 29, 2011 at 07:32:31PM +0100, Jordan Pittier wrote:
>      >    >Xen dom0 kernel does irq handling through Xen hypervisor,
>      >    >so that might make some drivers behave in a different way
>      >    >baremetal vs. dom0.
>      >    Ok, so the driver is a likely culprit for this SCSI craziness.
> 
>      I'm not sure if it is, but it *could* be.
> 
>      >    >What driver version does the squeeze kernel have?
>      >    3.04, which seems to be several years old. There are lots of users
>      >    complaining about LSI drivers all over the Internet.
>      >    I will keep you posted as soon as I manage to build the latest
>      >    driver.
> 
>      See here for tips on how to build an updated megaraid_sas driver:
>      [2]http://lists.xensource.com/archives/html/xen-devel/2010-11/msg00250.html
> 
>      Maybe it also helps with your driver.
> 
>      -- Pasi
>      >    On Sat, Jan 29, 2011 at 7:25 PM, Pasi Kärkkäinen <[1][3]pasik@xxxxxx> wrote:
>      >    > On Sat, Jan 29, 2011 at 07:03:16PM +0100, Jordan Pittier wrote:
>      >    >> Thanks for your reply. LSI does indeed have a newer driver for
>      >    >> the controller, but I can't build it: there's an error when I
>      >    >> try to compile it [see attachment]. I will give it another try
>      >    >> in the next few days.
>      >    >>
>      >    >> What is puzzling is that the IO errors only occur with the Xen
>      >    >> hypervisor. I am 100% willing to accept that the problem is the
>      >    >> driver, but how come the exact same kernel (the xenified one)
>      >    >> works fine without Xen loaded? I am almost a noob in
>      >    >> kernel/driver stuff, but I thought the drivers were entirely in
>      >    >> the kernel.
>      >    >>
>      >    >
>      >    > Yep, the driver is entirely in the kernel, but that's not the
>      >    > whole story.
>      >    >
>      >    > Xen dom0 kernel does irq handling through Xen hypervisor,
>      >    > so that might make some drivers behave in a different way
>      >    > baremetal vs. dom0.
>      >    >
>      >    > Also remember dom0 is a *vm*, so some timing stuff might happen
>      >    > differently on baremetal vs. dom0.
>      >    >
>      >    >> I will try with the latest kernel in a few days.
>      >    >>
>      >    >> SLES11SP1 ships mptfusion 4.22
>      >    >> ([2][4]http://www.novell.com/linux/releasenotes/x86_64/SUSE-SLES/11-SP1/#driver-updates-storage)
>      >    >> I don't know about RHEL.
>      >    >>
>      >    >
>      >    > What driver version does the squeeze kernel have?
>      >    >
>      >    >
>      >    > -- Pasi
>      >    >
>      >    >
>      >    >> On Sat, Jan 29, 2011 at 6:02 PM, Pasi Kärkkäinen <[3][5]pasik@xxxxxx> wrote:
>      >    >> > On Sat, Jan 29, 2011 at 04:27:25PM +0100, Jordan Pittier wrote:
>      >    >> >>    Hi,
>      >    >> >>    I have been tracking a bug affecting all my servers running
>      >    >> >>    Debian Squeeze for more than a month now, and I desperately
>      >    >> >>    need your help :)
>      >    >> >>    I have 10 Sun v20z servers (2*66GB SCSI disks in RAID 1 ==
>      >    >> >>    mirror). 4 of them are running Debian Squeeze with the
>      >    >> >>    latest Xen Debian kernel (2.6.32-5-xen-amd64 == 2.6.32-29).
>      >    >> >>    The rest are running Debian Lenny (2.6.26-2-xen-amd64 ==
>      >    >> >>    2.6.26-26lenny1).
>      >    >> >>    On a Squeeze box, under very high IO (such as running an IO
>      >    >> >>    stress test, e.g. bonnie++), the server starts behaving
>      >    >> >>    weirdly and I see messages like these in kernel.log: [see
>      >    >> >>    attachment]. Then the server becomes totally unresponsive
>      >    >> >>    (but doesn't "freeze") and commands such as "ls" or "reboot"
>      >    >> >>    don't work anymore. I have to do a hard reboot. After the
>      >    >> >>    server has rebooted, the RAID array shows as degraded (I am
>      >    >> >>    using the mpt-status command) and starts rebuilding. After
>      >    >> >>    several hours, the RAID array is "fine" ("clean"). The RAID
>      >    >> >>    controller is "LSI53C1030" U320, with driver "Fusion MPT SPI
>      >    >> >>    Host driver 3.04.06". I have attached the output of "lsmod".
>      >    >> >>    None of my Lenny boxes are affected by this issue; all of my
>      >    >> >>    Squeeze boxes are.
>      >    >> >>    What does it have to do with Xen? When I boot my Squeeze
>      >    >> >>    boxes without the Xen hypervisor but with the same Xen
>      >    >> >>    kernel, bonnie++ runs absolutely fine. The issue appears
>      >    >> >>    only with the Xen hypervisor loaded.
>      >    >> >>    There is a Debian bug report for this:
>      >    >> >>    [1][4][6]http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727
>      >    >> >>    Any suggestions?
>      >    >> >
>      >    >> > Did you check if LSI has newer driver version available?
>      >    >> >
>      >    >> > Also you might check which driver version for example RHEL6
>      >    >> > or SLES11SP1 ships with.. both of those distros have 2.6.32
>      >    >> > kernels too.
>      >    >> >
>      >    >> > On one of my testboxes I need to upgrade the LSI driver
>      >    >> > to a newer version to make it work. This is SAS based LSI
>      >    >> > though.
>      >    >> >
>      >    >> > Can you try using another disk controller?
>      >    >> >
>      >    >> > Also: Did you try using the latest kernel (-30) ?
>      >    >> >
>      >    >> > -- Pasi
>      >    >> >
>      >    >> >
>      >    >
>      >
>      > References
>      >
>      >    Visible links
>      >    1. mailto:[7]pasik@xxxxxx
>      >    2. [8]http://www.novell.com/linux/releasenotes/x86_64/SUSE-SLES/11-SP1/#driver-updates-storage
>      >    3. mailto:[9]pasik@xxxxxx
>      >    4. [10]http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727
> 
> References
> 
>    Visible links
>    1. mailto:pasik@xxxxxx
>    2. http://lists.xensource.com/archives/html/xen-devel/2010-11/msg00250.html
>    3. mailto:pasik@xxxxxx
>    4. http://www.novell.com/linux/releasenotes/x86_64/SUSE-SLES/11-SP1/#driver-updates-storage
>    5. mailto:pasik@xxxxxx
>    6. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727
>    7. mailto:pasik@xxxxxx
>    8. http://www.novell.com/linux/releasenotes/x86_64/SUSE-SLES/11-SP1/#driver-updates-storage
>    9. mailto:pasik@xxxxxx
>   10. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727
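
For anyone landing on this thread with the same "which driver version
am I actually running?" question, here is a minimal sketch in Python.
It is not something posted by either participant. It assumes the
module exports a version under /sys/module/<name>/version (only
modules that declare one do), and it assumes the module names "mptspi"
and "mptbase", which are guesses based on the "Fusion MPT SPI Host
driver" string quoted above. The helper names module_version and
version_tuple are made up for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def module_version(name: str, sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the version string exported by a loaded module, or None.

    None means the module is not loaded (or is built in without a
    version file), or it does not export a version at all.
    """
    version_file = Path(sysfs_root) / name / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        return None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '3.04.06' into (3, 4, 6) for comparison."""
    return tuple(int(number) for number in re.findall(r"\d+", version))


if __name__ == "__main__":
    # Module names are assumptions; adjust them to whatever lsmod shows.
    for name in ("mptspi", "mptbase"):
        found = module_version(name)
        print(name, found if found else "not loaded or no version exported")
```

Comparing version_tuple("3.04.06") with version_tuple("4.22") is then
a plain tuple comparison, which is enough to tell whether a rebuilt
driver is really the one that got loaded after a reboot.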

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

