[ILUG] Why RAID
Rick Moen
rick at linuxmafia.com
Tue Jul 13 16:45:39 IST 2004
Quoting Timothy Murphy (tim at birdsnest.maths.tcd.ie):
> Personally I just rsync to an ancient (PII) machine with a large disk.
That's backup, not redundancy. More on that below.
> The chances of a total disk failure are negligible in my experience
> (especially with SCSI disks).
> I'd actually be more worried about 2 disks on the same machine
> being struck by lightning, pee-ed on by the cat, etc.
See, people are getting confused between three different but similar
concepts:
o redundancy
o backup
o archival storage
These protect against different threat models. Hasn't ILUG had this
discussion before?
Here's a relevant anecdote, cross-posted from a similar discussion
elsewhere:
From rick Tue Jul 6 12:43:27 2004
Date: Tue, 6 Jul 2004 12:43:27 -0700
To: luv-main at luv.asn.au
Subject: Re: A small workgroup server
X-Mas: Bah humbug.
User-Agent: Mutt/1.5.5.1+cvs20040105i
[Skipping most of this discussion. There are too many points that would
need to be covered, to do it right.]
Quoting Russell Coker (russell at coker.com.au):
> Anyone who doesn't want a single disk failure to lose all their data needs
> RAID or a good backup.
Er...
Here's a story for you: I was sent out to a network-consulting client,
an architecture firm in San Francisco. Customer had delayed acting on
my urgent advice about installing proper ventilation into an area that
was being converted into a network closet, and relied on a sign on the
door saying to never close it.
On a Friday, someone closed that door, shutting off all ventilation for
the impromptu server room. Monday afternoon, customer realised he had a
degraded RAID1 pair, and called in the firm I was working for, to deal
with it.
First thing upon assessing server condition, I checked on the condition
of Friday backup tapes (seemed OK), and then fetched my own personal
spare hard drive to remirror the remaining drive onto it, for safety's
sake. As I feared would happen, the customer drive failed completely
during the remirror operation. I was obliged to do a fresh OS install
and restore the Friday backup, as the next best thing. Customer CEO was
extremely upset about losing an entire day's work, and complained to my
firm. I replied that he was damned lucky to lose only that much, and
should thank me for ensuring that he had a well-tested backup regimen as
a safety-net protecting the firm against fallout from management's bad
IT decisions.
Moral: The threat model that takes out one drive of your RAID set might
very well take out the other drives, at the same time: heat buildup,
power spikes, catastrophically failing disk controllers, PDUs committing
seppuku, fire/smoke damage, etc. Therefore, _never ever_ rely only on
RAID to protect data sets.
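
To put rough numbers on that moral, here is a back-of-envelope sketch.
The probabilities are made-up illustrative values, not measurements, and
the function name and the simple two-disk model are assumptions for the
sake of the example. It shows how even a small chance of a shared-cause
event (heat, power spike, bad controller) swamps the "both disks fail
independently" figure that people tend to have in mind for a mirror.

```python
def p_lose_mirror(p_disk, p_common):
    """Chance of losing both halves of a two-disk mirror in some period.

    p_disk:   chance a single disk fails on its own (assumed independent)
    p_common: chance of one shared event that kills both disks at once
    """
    both_independent = p_disk * p_disk
    # Either the shared event happens, or it doesn't and both disks
    # happen to fail independently.
    return p_common + (1.0 - p_common) * both_independent


# Hypothetical figures, chosen only to illustrate the arithmetic.
independent_only = p_lose_mirror(p_disk=0.03, p_common=0.0)
with_shared_risk = p_lose_mirror(p_disk=0.03, p_common=0.01)

print(f"independent failures only: {independent_only:.4%}")
print(f"with 1% shared-cause risk: {with_shared_risk:.4%}")
```

Under these assumed numbers the shared-cause term dominates, which is
the point of keeping a tested backup alongside the RAID set.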
> Which brings me to one of the biggest problems with SCSI, almost no-one
> terminates it properly!
Well, I've seldom heard of such an easily fixable problem. Yes, dimwit
VAR hardware installers are very common, doubly so among white-box
vendors. But it's really pretty easy to study how to do termination
right, and then check their work.
For the same reason, when I have to work with ethernet cabling
contractors, I quality-check every single run, and make them do again
the ones that they stuff up, as many times as required until they get it
right.
The ex-telco guys are the worst, because they're convinced they learned
everything worth knowing, thirty years ago.
--
Cheers,
Rick Moen
rick at linuxmafia.com

"Transported to a surreal landscape, a young girl kills the first
woman she meets, and then teams up with three complete strangers
to kill again." -- Rick Polito's That TV Guy column,
describing the movie _The Wizard of Oz_