[KLUG Members] RE: RAID 5 with hot spare

Bob Kanaley members@kalamazoolinux.org
Mon, 18 Nov 2002 16:02:30 -0500


I lost two drives in a SCSI RAID 5 array on a samba file server about two
months ago. The end users never knew anything had happened. About a year
ago, I had the foresight to put a hot spare into the array. The hot spare
probably kept me from losing data or having the server crash, but I should
mention that it was a bearcat to stitch the array back together.

The physical configuration of the array turned into a mess because the hot
spare had rotated into the array when the first drive failed, but when the
second drive failed there was a dead hot spare. The array started kicking
out errors to the console and log files. I announced a planned file server outage
over an extended lunchtime and put in a call to DPT.

Figuring out which physical drives on the chain had failed was the tricky
part. Once that was sorted out, it turned out that only one drive had
actually died. Thus, it was possible to
physically remove the dead drive from the chain, reconfigure the second
failed drive to the first drive's SCSI ID, put it back onto the chain in the
first drive's location and rebuild the array. (To minimize my Rolaids bill,
I am trying to come up with a scheme to snapshot the drive locations on the
chain, the SCSI ID, and the logical ID in the RAID 5 array.)
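
One way such a snapshot scheme might look is sketched below in Python. This
is only an illustration under assumptions: the drive details are typed in by
hand (nothing here queries the DPT controller), and the field names "bay",
"scsi_id", "logical_id" and "serial" as well as the function names are
made up for the example. The idea is to save a dated record of where each
drive sits and to compare two records after a failure.

```python
import json
from datetime import datetime, timezone


def make_snapshot(drives, taken_at=None):
    """Build a snapshot from hand-entered drive records.

    Each record is a dict with the hypothetical keys
    'bay', 'scsi_id', 'logical_id' and 'serial'.
    """
    if taken_at is None:
        taken_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Sort by physical bay so saved files compare cleanly.
    return {"taken_at": taken_at, "drives": sorted(drives, key=lambda d: d["bay"])}


def save_snapshot(snapshot, path):
    """Write the snapshot to a JSON file for safekeeping."""
    with open(path, "w") as fh:
        json.dump(snapshot, fh, indent=2)


def diff_snapshots(old, new):
    """List what changed between two snapshots, keyed by drive serial."""
    old_by_serial = {d["serial"]: d for d in old["drives"]}
    new_by_serial = {d["serial"]: d for d in new["drives"]}
    changes = []
    for serial in sorted(old_by_serial.keys() | new_by_serial.keys()):
        before = old_by_serial.get(serial)
        after = new_by_serial.get(serial)
        if before is None:
            changes.append(f"{serial}: added at bay {after['bay']}")
        elif after is None:
            changes.append(
                f"{serial}: missing (was bay {before['bay']}, "
                f"SCSI ID {before['scsi_id']})"
            )
        else:
            for field in ("bay", "scsi_id", "logical_id"):
                if before[field] != after[field]:
                    changes.append(
                        f"{serial}: {field} {before[field]} -> {after[field]}"
                    )
    return changes
```

Keying the comparison on the serial number rather than the SCSI ID is a
deliberate choice here: the serial stays with the physical drive even when
a drive is re-jumpered to a different ID or moved along the chain.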

The manual for my old DPT controller says it is possible to add multiple hot
spares to a RAID 5 array, so if that controller is typical, other
controllers should allow it as well.

If the goal is redundant servers with auto fail-over, there are a couple of
alternatives. The most visible open-source approach is probably the High
Availability Linux project (www.linux-ha.org). The December issue of Linux
Journal has an article about highly available LDAP using the heartbeat
program.

For LAN mirroring, there is an additional program whose name escapes me
right now. Essentially it monitors a file system for changes that can then
be rsynced to the backup server to keep the backup server in sync. If the
heartbeat between mirrored servers dies, within seconds the backup server
takes over. Sometime next year I hope to have an auto fail-over file server
system like this set up.
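
The general idea of watching for changes and rsyncing them can be sketched
roughly as follows in Python. This is not the program whose name I could
not recall, and it is not how heartbeat itself works; it is a minimal
illustration under assumptions. The function names, the 10 second timeout
and the example paths are all hypothetical. The only external tool assumed
is rsync with its standard -a and --delete options, and the command is
built here rather than run.

```python
import os


def changed_since(root, since):
    """Return relative paths of files under root modified after `since`.

    `since` is a timestamp in seconds since the epoch.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(full) > since:
                    changed.append(os.path.relpath(full, root))
            except OSError:
                # The file disappeared between the listing and the stat.
                continue
    return sorted(changed)


def rsync_command(src_dir, dest):
    """Build (but do not run) an rsync command mirroring src_dir to dest.

    The trailing slash makes rsync copy the directory's contents.
    --delete also removes files on the backup that are gone on the primary.
    """
    return ["rsync", "-a", "--delete", src_dir.rstrip("/") + "/", dest]


def peer_is_dead(last_beat, now, timeout=10.0):
    """Treat the peer as dead if no heartbeat arrived within `timeout` seconds."""
    return (now - last_beat) > timeout
```

A primary could call changed_since() on a timer and hand the result of
rsync_command() to subprocess.run() whenever the list is non-empty, while
the backup checks peer_is_dead() to decide when to take over. Polling
modification times is the simplest approach to show, but it is slow on
large trees and will not notice deletions on its own.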


Bob

Robert V. Kanaley
Manager Information Systems
Agdia, Inc.
rvk@agdia.com
http://www.agdia.com

P.S.

>Date: Mon, 18 Nov 2002 00:19:31 -0500
>From: Mike Williams <knightperson@zuzax.com>
>To: members@kalamazoolinux.org
>Subject: [KLUG Members] Re:  RAID
>
>How about RAID 5 with a hot spare?  That will protect you from
>sequential failures, and it will be less costly (and, I think, less
>messy) than trying to software mirror two arrays.