[ILUG] Corrupt files
davisc at skynet.ie
Tue Nov 7 11:18:17 GMT 2006
Just a bit more info on this.
Sample NFS line in /etc/fstab
node01:/home/cian /home/cian nfs nolock,hard,intr,rsize=8192,wsize=8192,timeo=20 1 2
Right. Now here's something *really* weird. I noticed this problem a day
or two ago with one particular run of files - none of them would read.
However, I just remembered - I back up to an external disk here. My /home
on the cluster is Samba-mounted on my Windows box. I use a Cygwin
build of rsync to back up the files. I backed up last Tuesday evening
(31st October). I haven't touched any of my files since, on either the
cluster or the external disk. I've just tried, and the backups on
the external disk work perfectly - even if I copy them back to the
cluster and read them in from there. Just as a BTW, the rsync would have
yanked the files over NFS, and when I copied the backup set to the
cluster a few minutes ago, it would have been sent over NFS to my home
dir. They read fine.
Investigations: Plonk corrupt set and working set into directory.
-backup is the working set and -corrupt is, well, the corrupt set. I've
taken the smallest corrupt set.
cian at master:~/corrupt$ stat -t *
10580022 20704 81b4 1000 100 e 2448004 1 0 0 1153945019 1153945019
28584075 55896 81b4 1000 100 e 2448003 1 0 0 1153945025 1153945025
10580022 20704 81b4 1000 100 e 2448006 1 0 0 1153948619 1153948619
28584075 55896 81b4 1000 100 e 2448005 1 0 0 1153948625 1153948625
cian at master:~/corrupt$ md5sum *
cian at master:~/corrupt$ gunzip
invalid compressed data--crc error
invalid compressed data--length error
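The epoch values in the stat -t output above are hard to compare by eye. Below is a minimal Python sketch that turns them into readable UTC times. The field layout is an assumption: the rows as pasted carry no file names, so the sketch treats the first field as the size in bytes, the seventh as the inode, and the last two as epoch-second timestamps. The helper name decode is made up for illustration.

```python
from datetime import datetime, timezone

# Rows copied verbatim from the stat -t output above (no file names shown there).
ROWS = [
    "10580022 20704 81b4 1000 100 e 2448004 1 0 0 1153945019 1153945019",
    "28584075 55896 81b4 1000 100 e 2448003 1 0 0 1153945025 1153945025",
    "10580022 20704 81b4 1000 100 e 2448006 1 0 0 1153948619 1153948619",
    "28584075 55896 81b4 1000 100 e 2448005 1 0 0 1153948625 1153948625",
]


def decode(row):
    """Return (size_bytes, inode, [UTC datetimes]) for one row.

    Assumed layout: size is field 1, inode is field 7, and the last two
    fields are timestamps in seconds since the epoch.
    """
    fields = row.split()
    size = int(fields[0])
    inode = int(fields[6])
    stamps = [datetime.fromtimestamp(int(f), tz=timezone.utc) for f in fields[-2:]]
    return size, inode, stamps


for row in ROWS:
    size, inode, stamps = decode(row)
    print(size, inode, *(s.isoformat() for s in stamps))
```

Lining up the decoded times for rows of equal size is a quick way to check whether the two sets really carry the same timestamps.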
I'm going to keep jabbing at them and see what caused the corrupt set to
change. According to stat, the modification dates are the same.
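One way to keep jabbing at the files is to compare a corrupt file with its working backup block by block and see where they diverge. The following is only a sketch under stated assumptions: the 8192-byte block size is borrowed from the rsize/wsize values in the fstab line, on the guess that damage might line up with NFS transfer units, and the function name differing_blocks is made up for illustration.

```python
import sys

BLOCK = 8192  # assumption: same as rsize/wsize in the fstab line


def differing_blocks(path_a, path_b, block=BLOCK):
    """Yield (block_index, byte_offset_of_first_mismatch) for each differing block."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Find the first mismatching byte inside this block; if one
                # chunk is a prefix of the other, the mismatch is where the
                # shorter one ends.
                n = min(len(a), len(b))
                first = next((i for i in range(n) if a[i] != b[i]), n)
                yield index, index * block + first
            index += 1


if __name__ == "__main__":
    # Usage: python blockdiff.py <corrupt-file> <backup-file>
    if len(sys.argv) == 3:
        for idx, off in differing_blocks(sys.argv[1], sys.argv[2]):
            print(f"block {idx} differs, first mismatch at byte {off}")
```

If the mismatches always start on multiples of 8192, that would point towards whole transfer units being damaged; scattered single-byte differences would suggest something else.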
It's been suggested that I install munin to keep an eye on stuff so I'll
Thanks for everyone's help.
Cian Davis wrote:
> I have a weird, frustrating problem and would appreciate the insights
> of anyone on this list. Please bear with me, it's a long mail but the
> problem needs to be described.
> Most of us use software called Fluent and one person in the group uses
> CFX. All our desktop machines are Windows and we use the Windows
> version but we have a cluster of 9 Fujitsu-Siemens dual processor Xeons.
> When the cluster was initially delivered, it was running RedHat 6.
> After a few months, some of the Fluent users found that their files
> wouldn't read because they were corrupted.