[ILUG] Random disk errors on HP Netserver/NetRaid
lhecking at nmrc.ie
Thu Jul 22 13:38:08 IST 2004
RedHat 9 with latest updates
HP Netserver LP1000R
HP NetRaid 1M with 3 Seagate 73GB disks
I installed the machine about two weeks ago, and since yesterday we have
been seeing random disk and SCSI errors. The 3 disks are concatenated into
one 210GB logical disk, and the partitioning is as follows:
Filesystem  Type   1K-blocks     Used  Available  Use%  Mounted on
/dev/sda3   ext3    10317860  5385960    4407780   55%  /
/dev/sda1   ext3      124427    10024     107979    9%  /boot
/dev/sda8   ext3   175428196  4347920  162169024    3%  /home
/dev/sda2   ext3    10317860  2513312    7280428   26%  /opt
none        tmpfs    1032200        0    1032200    0%  /dev/shm
/dev/sda6   ext3     5162796   712876    4187664   15%  /tmp
/dev/sda5   ext3     5162796  1626308    3274232   34%  /usr/local
Symptoms: the machine either crashes silently, or reports tons of SCSI
errors when trying to read (e.g. tar) or write files on /home or /usr/local.
When I go into the NetRaid's menu after a reboot, it sometimes reports
that a random combination of 1, 2 or 3 disks is offline. Bringing the
disks back online works fine, and there seems to be no data loss, but
it is virtually impossible to use the machine.
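One way to exercise the read path in a controlled manner, rather than
waiting for tar to trip over it, might look like the sketch below. It
assumes a Python 3 interpreter is available on the machine; the function
name read_tree and the 1 MB chunk size are made up for illustration and
do not come from any existing tool. It only reads files and notes where
the OS reports an error, so it shows which paths fail, not why.

```python
import os
import sys


def read_tree(root, chunk_size=1 << 20):
    """Read every regular file under root.

    Returns (files_read, errors), where errors is a list of
    (path, message) pairs for anything the OS refused to read.
    """
    files_read = 0
    errors = []

    def note_walk_error(exc):
        # Called by os.walk when a directory itself cannot be listed.
        errors.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=note_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    # Read in chunks and discard the data; only errors matter.
                    while fh.read(chunk_size):
                        pass
                files_read += 1
            except OSError as exc:
                errors.append((path, str(exc)))
    return files_read, errors


if __name__ == "__main__":
    # Each command-line argument is treated as a directory to check.
    for root in sys.argv[1:]:
        count, errs = read_tree(root)
        print("%s: %d files read, %d errors" % (root, count, len(errs)))
        for path, message in errs:
            print("  %s: %s" % (path, message))
```

Run against the affected mount points, for example with /home and
/usr/local as arguments, and compare the failing paths across several
runs to see whether the errors follow particular files or move around.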
I do believe this is a NetRaid-related issue, not an OS issue - we have
exactly the same software setup on a Dell PE1750, no problems.
It could be a disk problem, but all disks and controller are new, and I
had no problems whatsoever partitioning and installing the OS.
Any ideas how to diagnose this properly? The problem is that the NetRaid
is not under maintenance, so we can't call HP.