
Re: [Xen-users] Xen and iSCSI

On Friday, 3 February 2006 11:05, Marcin Owsiany wrote:
> On Tue, Jan 31, 2006 at 03:08:35PM +0100, Molle Bestefich wrote:
> > After having used both, I trust MD over hardware controllers any day.
> The only thing that worries me about MD in 2.6.12 is that if you
> simulate a total disk failure (unplug the power cord in a running disk),
> MD just freezes the system for an hour (actually not the whole system,
> but any process which tries to access the filesystem on the array hangs
> until MD has done 100 retries, one every 30 seconds) as the disk appears
> "Busy", instead of just using the other disk.
> This _seems_ fixed in 2.6.14 (the code has changed, so I suppose it's
> better now, but I haven't actually tested this yet - maybe someone here
> has?).

As far as I know, 2.6.12 doesn't support hotplugging at all, and that seems
to be the reason: if a device is no longer accessible, the kernel has a
serious problem with that.

I don't know whether 2.6.14 is the first kernel in which this problem is
gone, but I know for sure that it is gone in 2.6.15.

More or less all servers I have seen in the past that had a disk failure
while using mdadm on a kernel older than 2.6.15 did not work normally after
the hard disk died. All of them needed a reboot.

Servers running 2.6.15 or later don't seem to have this problem anymore. Here
is a log from a server that "lost" a disk about one month ago (on 2.6.15):

hdi: dma_intr: status=0x53 { DriveReady SeekComplete Index Error }
hdi: dma_intr: error=0x86 { DriveStatusError BadCRC TrackZeroNotFound }
ide: failed opcode was: unknown
PDC202XX: Primary channel reset.
ide4: reset: master: sector buffer error
end_request: I/O error, dev hdi, sector 95980216
raid1: Disk failure on hdi, disabling device.
        Operation continuing on 1 devices
raid1: hdi: rescheduling sector 95980216
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:hde
 disk 1, wo:1, o:0, dev:hdi
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:hde
raid1: hde: redirecting sector 95980216 to another mirror

After that, the system kept running without any problems.
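
For anyone who wants to catch this kind of event automatically, here is a
minimal sketch in Python that scans kernel log text for the raid1 failure
line shown in the log above. The function name find_failed_disks and the
regular expression are my own assumptions, derived only from this one log
excerpt; other kernel versions or RAID levels may word the message
differently, so treat it as a starting point rather than a tested monitor.

```python
import re

# Assumed pattern, derived only from the log line quoted in this mail:
#   raid1: Disk failure on hdi, disabling device.
FAILURE_RE = re.compile(
    r"^(raid\d+): Disk failure on (\w+), disabling device",
    re.MULTILINE,
)


def find_failed_disks(log_text):
    """Return (personality, device) pairs for each MD disk-failure line."""
    return FAILURE_RE.findall(log_text)


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
end_request: I/O error, dev hdi, sector 95980216
raid1: Disk failure on hdi, disabling device.
        Operation continuing on 1 devices
raid1: hdi: rescheduling sector 95980216
"""

if __name__ == "__main__":
    for personality, device in find_failed_disks(SAMPLE):
        print(f"{personality}: {device} failed")
```

In practice you would feed it the text of your kernel log instead of SAMPLE
and send a mail or alert when the returned list is not empty.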


> Otherwise typical disk failures get managed by MD just fine.
> Marcin

Xen-users mailing list


