[ILUG] Corrupt files
Cian Davis
davisc at skynet.ie
Tue Nov 7 11:18:17 GMT 2006
Just a bit more info on this.
Sample NFS line in /etc/fstab
node01:/home/cian /home/cian nfs nolock,hard,intr,rsize=8192,wsize=8192,timeo=20 1 2
Right. Now here's something *really* weird. I noticed this problem a day
or two ago with one particular run of files - none of them would read.
However, I just remembered - I back up to an external disk here. My /home
on the cluster is Samba-mounted on my Windows box, and I use a Cygwin
build of rsync to back up the files. I backed up last Tuesday evening
(31st October) and haven't touched any of my files since, on either the
cluster or the external disk. I've just tried them, and the backups on
the external disk work perfectly - even if I copy them back to the
cluster and read them in from there. Just as a BTW, the rsync would have
yanked the files over NFS, and when I copied the backup set to the
cluster a few minutes ago, it would have been sent over NFS to my home
dir. They read fine.
Investigations: Plonk the corrupt set and the working set into one
directory. -backup is the working set and -corrupt is, well, the corrupt
set. I've taken the smallest corrupt set.
cian@master:~/corrupt$ stat -t *
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.cas.gz 10580022 20704 81b4 1000 100 e 2448004 1 0 0 1153945019 1153945019 1162895571 4096
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.dat.gz 28584075 55896 81b4 1000 100 e 2448003 1 0 0 1153945025 1153945025 1162895576 4096
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz 10580022 20704 81b4 1000 100 e 2448006 1 0 0 1153948619 1153948619 1162895621 4096
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.dat.gz 28584075 55896 81b4 1000 100 e 2448005 1 0 0 1153948625 1153948625 1162895613 4096
cian@master:~/corrupt$ md5sum *
624afff87d49b32ed699aaa476d87fbf  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.cas.gz
ae06e44a97dd849bdd2fb85ab7625f36  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.dat.gz
b8e4c40b29af3a9c9bca5d58e0da4659  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz
1fe725b04fc01cfea19141c43e0dbe3f  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.dat.gz
cian@master:~/corrupt$ gunzip Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz
gunzip: Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz: invalid compressed data--crc error
gunzip: Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz: invalid compressed data--length error
I'm going to keep jabbing at them and see what caused the corrupt set to
change. According to stat, the modification dates are the same.
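
One way to keep jabbing at them: stat reports identical sizes for the
-backup and -corrupt copies while the md5sums differ, so the damage looks
like bytes changed in place rather than a truncation. A minimal Python
sketch for listing where two copies differ, and whether the differences
cluster in particular blocks, might look like the following. The function
names and the default 8192-byte block size (picked to match the
rsize/wsize in the fstab line) are assumptions for illustration only.

```python
from collections import Counter


def differing_offsets(path_a, path_b, chunk_size=1 << 20):
    """Yield every byte offset at which the two files differ.

    If one file is longer, the extra tail bytes count as differences.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        base = 0  # offset of the current chunk within the files
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                return
            if a != b:
                common = min(len(a), len(b))
                for i in range(common):
                    if a[i] != b[i]:
                        yield base + i
                # Bytes present in only one of the files.
                for i in range(common, max(len(a), len(b))):
                    yield base + i
            base += max(len(a), len(b))


def damaged_blocks(offsets, block_size=8192):
    """Count differing bytes per block.

    block_size=8192 is an assumption (the NFS rsize/wsize); 4096 would
    match the block size that stat reports.
    """
    return Counter(offset // block_size for offset in offsets)


# Hypothetical usage, with the file names shortened:
#   blocks = damaged_blocks(differing_offsets("x-backup.cas.gz",
#                                             "x-corrupt.cas.gz"))
#   for block, count in sorted(blocks.items()):
#       print(block * 8192, count)
```

If the differences turn out to sit in a few runs aligned on 4096 or 8192
bytes, that would hint at the disk or transfer layer rather than the
application, though this is only a heuristic. From the shell, cmp -l
prints the same byte-level differences.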
It's been suggested that I install munin to keep an eye on stuff so I'll
do that.
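
Alongside general monitoring, a periodic checksum pass could help narrow
down when a file changes. The sketch below is only an illustration under
assumptions: the function names and the snapshot layout are made up here,
and it flags files whose MD5 changed between two snapshots while size and
modification time stayed the same.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory):
    """Map each regular file in directory to its md5, mtime and size."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            st = os.stat(path)
            result[name] = {
                "md5": md5_of(path),
                "mtime": int(st.st_mtime),
                "size": st.st_size,
            }
    return result


def silently_changed(old, new):
    """Names whose content changed while mtime and size did not."""
    return [
        name
        for name in old
        if name in new
        and old[name]["md5"] != new[name]["md5"]
        and old[name]["mtime"] == new[name]["mtime"]
        and old[name]["size"] == new[name]["size"]
    ]
```

Taking a snapshot from cron, say nightly, and comparing it with the
previous one would bound the time window in which a file went bad.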
Thanks for everyone's help.
Regards,
Cian
Cian Davis wrote:
>
> Hi,
> I have a weird, frustrating problem and would appreciate the insights
> of anyone on this list. Please bear with me, it's a long mail but the
> problem needs to be described.
>
> Most of us use software called Fluent and one person in the group uses
> CFX. All our desktop machines are Windows and we use the Windows
> version but we have a cluster of 9 Fujitsu-Siemens dual processor Xeons.
>
> When the cluster was initially delivered, it was running RedHat 6.
> After a few months, some of the Fluent users found that their files
> wouldn't read because they were corrupted.
>
> Regards,
> Cian