[ILUG] Why RAID
Tony Bolger
tony at palamon.ie
Fri Jul 16 22:10:17 IST 2004
Paul Jakma wrote:
> On Thu, 15 Jul 2004, Colm Buckley wrote:
>
>> When you add the new (C) drive, all of its data are zeroes, which
>> doesn't change the parity data (N XOR 0 = N). Fiendishly clever.
>
> Hmm, surely you'd have to zero the new disk first though? I don't think
> you can rely on a new disk being all zero. ;)
Yep, you have to zero them, but the nice thing is that you can zero a disk
in advance and then add it to any of several RAID groups quickly at a
later point.
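
A minimal sketch of why the pre-zeroed disk is free to add, assuming plain
byte-wise XOR parity over one stripe. The block size and the names
(xor_blocks, disk_a and so on) are made up for illustration and are not
taken from any real RAID implementation:

```python
import os
from functools import reduce

BLOCK_SIZE = 16  # bytes per block, kept tiny for illustration


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across two existing data disks, with arbitrary contents.
disk_a = os.urandom(BLOCK_SIZE)
disk_b = os.urandom(BLOCK_SIZE)
parity = xor_blocks([disk_a, disk_b])

# A pre-zeroed disk joins the group: N XOR 0 = N, so parity is unchanged.
zeroed = bytes(BLOCK_SIZE)
parity_after_zeroed = xor_blocks([disk_a, disk_b, zeroed])

# A disk that was not zeroed would force the parity to be recalculated.
dirty = bytes([0xFF]) * BLOCK_SIZE
parity_after_dirty = xor_blocks([disk_a, disk_b, dirty])

print(parity_after_zeroed == parity)  # the existing parity is still valid
print(parity_after_dirty == parity)   # the existing parity is now stale
```
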
> Also, it doesn't matter: surely you have to know which block is the new
> one to be able to read the stripe correctly? (Otherwise data of xy turns
> into xy(0).) If you know that, you can also ignore that new block when
> calculating the parity, and hence you don't need to care what value the
> new block has.
NetApps address their disks a bit differently from most RAID approaches.
In a 'normal' RAID 4 or 5 setup, the RAID layer presents the same numbered
blocks from each disk in turn (skipping the parity one) to the fs level,
more or less as a virtual disk, with a predictable mapping between physical
disk number / block and virtual disk block numbers.
As you read the virtual blocks, on a 3 + 1 RAID 5 setup, you get:
A0,B0,C0,A1,B1,D1,A2,C2,D2,B3,C3,D3,....
(Letters are disks, numbers are blocks.)
If you add a new disk, you mess up the mappings, because the sequence becomes:
A0,B0,C0,D0,A1,B1,C1,E1 ....
Thus your FS sees the new disk and old parity disk interleaved with the data,
and it's not likely to be happy about it.
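
To make the remapping concrete, here is a rough sketch that reproduces the
two sequences above and counts how many virtual blocks end up somewhere
else. It assumes one parity block per stripe, rotating from the last disk
towards the first, which is what the sequences imply; real controllers and
md layouts differ in detail, and locate() and layout() are hypothetical
helper names:

```python
from string import ascii_uppercase


def locate(virtual_block, n_disks):
    """Map a virtual block number to (disk index, block number on disk).

    Assumes one parity block per stripe, with the parity position
    rotating from the last disk towards the first.
    """
    stripe, offset = divmod(virtual_block, n_disks - 1)
    parity_disk = (n_disks - 1 - stripe) % n_disks
    data_disks = [d for d in range(n_disks) if d != parity_disk]
    return data_disks[offset], stripe


def layout(n_blocks, n_disks):
    """Return labels such as 'A0' for the first n_blocks virtual blocks."""
    labels = []
    for v in range(n_blocks):
        disk, block = locate(v, n_disks)
        labels.append(f"{ascii_uppercase[disk]}{block}")
    return labels


before = layout(12, 4)  # the 3 + 1 array
after = layout(12, 5)   # same virtual blocks after adding disk E
moved = sum(1 for old, new in zip(before, after) if old != new)

print(",".join(before))
print(",".join(after))
print(f"{moved} of {len(before)} virtual blocks now map elsewhere")
```

Under these assumptions, only the first three virtual blocks keep their
place; the other nine of the first twelve land on a different disk or
block, which is why a conventional array has to be restriped when it grows.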
The NetApp approach is to make the FS aware of the physical disks, and let it
worry about spreading the content around. So if you have a file on B0,C0,D0
(calling A the parity disk), after you add disk E, you still have the same file
on B0,C0,D0, with new blank blocks on E0,E1,E2....
After a while, when lots of stuff is read / written, the FS will have made sure
that there is a roughly equal load on each of the disks.
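
As a toy illustration of that balancing (emphatically not WAFL's actual
allocation algorithm, which I don't know in detail), assume a policy that
sends each new write to the data disk with the lowest fill ratio. The disk
sizes, fill levels and the place_block() helper are all invented for the
example:

```python
def place_block(used, capacity):
    """Pick the data disk with the lowest fill ratio (hypothetical policy)."""
    return min(used, key=lambda disk: used[disk] / capacity[disk])


# Three data disks, each 60% full (A is the dedicated parity disk).
capacity = {"B": 100, "C": 100, "D": 100}
used = {"B": 60, "C": 60, "D": 60}

# An existing file; its blocks stay where they are when the array grows.
file_blocks = [("B", 0), ("C", 0), ("D", 0)]

# Add the pre-zeroed disk E. Nothing is moved, E is simply empty.
capacity["E"] = 100
used["E"] = 0

# Write 80 new blocks and record where each one lands.
placements = []
for _ in range(80):
    disk = place_block(used, capacity)
    used[disk] += 1
    placements.append(disk)

print(used)
```

With these numbers the first 60 writes all go to E, and after that the
writes rotate across all four data disks, so every disk ends up holding 65
blocks while the old file never moves.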
BTW, the RAID layer makes sure that if any of disks B to E then dies, the FS
still thinks it's there until a hot spare is rebuilt into the array.
I'm sure a similar effect could be achieved using LVM and some _cunning_
algorithms, but you'd want to check out just how well patented WAFL is first.
It's also possible that other people have done something similar with hardware
RAID block mapping, but I'm not aware of anyone who has.
Tony.