[ILUG] Random disk errors on HP Netserver/NetRaid

Lars Hecking lhecking at nmrc.ie
Thu Jul 22 13:38:08 IST 2004


 RedHat 9 with latest updates
 HP Netserver LP1000R
 HP NetRaid 1M with 3 Seagate 73GB disks

 I installed the machine about two weeks ago, and since yesterday we have
 been seeing random disk and SCSI errors. The 3 disks are concatenated into
 one 210GB logical disk, and the partitioning is as follows:

Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda3     ext3    10317860   5385960   4407780  55% /
/dev/sda1     ext3      124427     10024    107979   9% /boot
/dev/sda8     ext3   175428196   4347920 162169024   3% /home
/dev/sda2     ext3    10317860   2513312   7280428  26% /opt
none         tmpfs     1032200         0   1032200   0% /dev/shm
/dev/sda6     ext3     5162796    712876   4187664  15% /tmp
/dev/sda5     ext3     5162796   1626308   3274232  34% /usr/local
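
 As a rough cross-check of the listing above, the 1K-block figures can be
 summed and compared against the 210GB logical disk. This is only a sketch:
 the numbers are copied from the df output, tmpfs is left out because it is
 memory-backed, and anything df does not list (such as swap) is not counted,
 so the total is a lower bound rather than the exact size of the array.

```python
# 1K-block sizes copied from the df listing above (tmpfs excluded,
# since /dev/shm is memory-backed and not part of the logical disk).
partitions_1k = {
    "/dev/sda1": 124427,
    "/dev/sda2": 10317860,
    "/dev/sda3": 10317860,
    "/dev/sda5": 5162796,
    "/dev/sda6": 5162796,
    "/dev/sda8": 175428196,
}


def total_bytes(sizes_1k):
    """Sum the sizes (given in 1 KiB blocks) and return the total in bytes."""
    return sum(sizes_1k.values()) * 1024


total = total_bytes(partitions_1k)
# Report in both decimal gigabytes and binary gibibytes, since disk
# vendors and RAID menus do not always use the same unit.
print(f"{total / 1e9:.1f} GB (decimal), {total / 2**30:.1f} GiB")
```

 The listed filesystems come to roughly 211 GB in decimal units, which is in
 line with one logical disk built from three 73GB drives.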

 Symptoms: the machine either crashes silently, or reports tons of SCSI
 errors when trying to read (e.g. tar) or write files on /home or /usr/local.
 When I go into the NetRaid's menu after a reboot, it sometimes reports
 that a random combination of 1, 2 or 3 disks is offline. Bringing the
 disks back online works fine, and there seems to be no data loss, but
 it is virtually impossible to use the machine.

 I do believe this is a NetRaid-related issue, not an OS issue - we have
 exactly the same software setup on a Dell PE1750, with no problems.
 It could be a disk problem, but all disks and the controller are new, and I
 had no problems whatsoever partitioning and installing the OS.

 Any ideas how to diagnose this properly? The problem is that the NetRaid
 is not under maintenance, so we can't call HP.
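
 One possible starting point, sketched here under assumptions, is to tally
 the error lines in the kernel log per device, to see whether the errors
 cluster on one partition or are spread evenly. The keywords, the device-name
 pattern and the function name count_disk_errors are all hypothetical and not
 taken from the actual log output, so they would need adjusting to whatever
 the kernel really prints.

```python
import re
from collections import Counter

# Hypothetical keywords; adjust to whatever the kernel actually logs.
KEYWORDS = ("scsi", "i/o error")
# Matches block device names such as sda or sda8 (an assumed naming scheme).
DEVICE_RE = re.compile(r"\bsd[a-z]+\d*\b")


def count_disk_errors(lines, keywords=KEYWORDS):
    """Count suspicious log lines per device name.

    Lines that match a keyword but name no sdX device are counted
    under "unknown".
    """
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        if not any(k in lowered for k in keywords):
            continue  # not an error line we care about
        match = DEVICE_RE.search(lowered)
        counts[match.group(0) if match else "unknown"] += 1
    return counts


# Example use (the log path is an assumption):
# with open("/var/log/messages", errors="replace") as log:
#     for device, n in count_disk_errors(log).most_common():
#         print(f"{device}: {n}")
```

 If nearly everything lands under one device or under "unknown", that at
 least narrows down whether a single filesystem or the controller as a whole
 is involved.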

 Thanks!



