Re: [Xen-users] 3w-xxxx / 3ware 8006-2LP corruption issues using Xen kernel



Holm Kapschitzki wrote:
Bas Verhoeven wrote:
Hi Holm,

Look at my earlier post; I describe the same problem there.

http://www.nabble.com/dom0---tar:-Skipping-to-next-header-td16558409.html

I get the error with Etch 32-bit and 64-bit, with Xen 3.1 and 3.2, with the 2.6.18 Xen kernel and with the Gentoo 2.6.20r6 Xen kernel. It does not happen all the time, but I have to reboot the machine to get rid of it for a while. I tested this on about 5 machines, set up in different ways with different kernels.
In a way I'm happy I'm not the only one experiencing this problem. Are you using the exact same controller as I am?

I did experience some issues when I removed most of the memory, leaving the system with 1GB. At that point, running my script caused several errors and ended with the partition becoming read-only:

   PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:02:01.0
   3w-xxxx: tw_map_scsi_sg_data(): pci_map_sg() failed.
   PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:02:01.0
   3w-xxxx: tw_map_scsi_sg_data(): pci_map_sg() failed.
   ...
   sd 0:0:0:0: SCSI error: return code = 0x00070000
   end_request: I/O error, dev sda, sector 3068774
   Buffer I/O error on device dm-0, logical block 321289
   lost page write due to I/O error on dm-0
   Buffer I/O error on device dm-0, logical block 321290
   lost page write due to I/O error on dm-0
   ...
   end_request: I/O error, dev sda, sector 11794406
   Aborting journal on device dm-0.
   ext3_abort called.
   EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
   Remounting filesystem read-only
   __journal_remove_journal_head: freeing b_frozen_data
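
To get an overview of a longer log like the one above, a small script can count the SW-IOMMU failures per device and collect the sectors that failed. This is only a rough sketch: the function name `summarize_kernel_log` is made up for illustration, and the patterns assume the exact message wording shown in the excerpt above.

```python
import re
from collections import Counter

# Patterns assume the message wording seen in the log excerpt above.
IOMMU_RE = re.compile(r"Out of SW-IOMMU space for (\d+) bytes at device (\S+)")
IOERR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_kernel_log(lines):
    """Return (SW-IOMMU failure count per PCI device, list of (dev, sector))."""
    iommu_failures = Counter()
    failed_sectors = []
    for line in lines:
        match = IOMMU_RE.search(line)
        if match:
            iommu_failures[match.group(2)] += 1
            continue
        match = IOERR_RE.search(line)
        if match:
            failed_sectors.append((match.group(1), int(match.group(2))))
    return iommu_failures, failed_sectors
```

Feeding it the lines of a saved kernel log (for example, `open(path).read().splitlines()`) should show whether the I/O errors always follow SW-IOMMU exhaustion on the same device.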


This problem seems to be unrelated, though. Some googling pointed me to a 'swiotlb' kernel parameter (https://bugzilla.novell.com/show_bug.cgi?id=299641), which I set to 32M, and the system seems to run OK for now.
Data is still being written to disk corrupted, though.
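
For anyone who wants to check for this kind of corruption themselves, here is a minimal sketch of a write-and-verify test. It is not the script I mentioned above; the function name `write_and_verify` and the default sizes are assumptions chosen for illustration. It writes random data, hashes it on the way out, reads the file back and compares the hashes.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, size=1 << 20, chunk=65536):
    """Write `size` random bytes to `path`, read them back, compare SHA-1."""
    written = hashlib.sha1()
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            written.update(block)
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the page cache to the device

    read_back = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            read_back.update(block)
    return written.hexdigest() == read_back.hexdigest()


if __name__ == "__main__":
    # Demo on a temporary file; point `path` at the affected filesystem instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_and_verify(os.path.join(tmp, "verify.bin")))
```

Note that the read-back may be served from the page cache rather than from the disk, so a clean result right after writing does not prove much. Re-reading and re-hashing the file after a remount or reboot is the more meaningful check.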

I think it could be a kernel compile parameter? Or via chipset in relation to the 3ware RAID controller?
Well, I highly doubt it's something in the hardware itself. That would not explain why everything works fine under a non-Xen kernel. All kernels I tried have the 3ware driver loaded as a module. The driver sources in both kernel trees are identical:

   p-dom0:/usr/src/xen-3.2.0/linux-2.6.18-xen.hg# sha1sum drivers/scsi/3w-x*
   d9da8960f6e98b783b4893cde51a303d97ce98d8  drivers/scsi/3w-xxxx.c
   2610261f86b4eb05a5d08c1f90f09410f1eb7c98  drivers/scsi/3w-xxxx.h

   p-dom0:/usr/src/linux-2.6.18.8# sha1sum drivers/scsi/3w-x*
   d9da8960f6e98b783b4893cde51a303d97ce98d8  drivers/scsi/3w-xxxx.c
   2610261f86b4eb05a5d08c1f90f09410f1eb7c98  drivers/scsi/3w-xxxx.h
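
The same comparison can be scripted for more than two files. The sketch below is only an illustration; `sha1_of` and `compare_trees` are hypothetical helper names, and the tree paths and glob pattern are whatever you pass in.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk=65536):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def compare_trees(tree_a, tree_b, pattern):
    """Map each file matching `pattern` in tree_a to True if tree_b has an
    identical copy at the same relative path, else False."""
    root_a, root_b = Path(tree_a), Path(tree_b)
    result = {}
    for file_a in sorted(root_a.glob(pattern)):
        rel = file_a.relative_to(root_a)
        file_b = root_b / rel
        result[rel.as_posix()] = (
            file_b.is_file() and sha1_of(file_a) == sha1_of(file_b)
        )
    return result
```

Called as `compare_trees("/usr/src/xen-3.2.0/linux-2.6.18-xen.hg", "/usr/src/linux-2.6.18.8", "drivers/scsi/3w-x*")`, it would cover the two files checked above. Keep in mind that identical driver sources only rule out the driver files themselves, not the code the driver calls into, such as the DMA mapping path behind the pci_map_sg() failure.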

So whatever is breaking things must be something in the Xen code? I'm going to build a kernel with the 3w-xxxx driver compiled in, but I doubt that will help.

Is there anyone who uses the same controller and has no problems at all?

Cheers,

Bas Verhoeven


Greetings, Holm


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

