[ILUG] tip of the day (find duplicate files)
Brady, Padraig
Padraig.Brady at compaq.com
Tue Sep 5 15:38:09 IST 2000
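I've just written the following script, which finds duplicate files
in the specified directories and their subdirectories.
It's very fast: only files whose size matches another file's are
checksummed (with md5sum), and hard links are counted once.
Each output line lists one set of identical files, prefixed by
their size in bytes, largest files first.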
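Most of the speed comes from the first half of the pipeline, which filters on size and inode before any file is read. The sketch below runs just those two steps on made-up "path inode size" lines shaped like the script's find -printf output (the sample data is hypothetical). It spells the script's sort keys "+2nr +1 -u" in the POSIX -k form, "-k3nr -k2 -u".

```shell
#!/bin/sh
# Made-up "path inode size" lines in the shape the script builds with
# find -printf; a and b are hypothetical hard links sharing inode 101.
sample='a 101 5
b 101 5
c 202 5
d 303 9
e 404 7'
# sort: largest size first; -u keeps one line per inode and size, so only
# one of a and b survives.
# uniq: skip the path and inode fields (-f2) and print every line whose
# size repeats (-D); d and e have unique sizes and would never be checksummed.
candidates=$(printf '%s\n' "$sample" | sort -k3nr -k2 -u | uniq -f2 -D)
printf '%s\n' "$candidates"
```

Only two of the five files are left to checksum here, which is where the time saving comes from on a real tree.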
Usage:
  finddupe [dir1] [dir2] ...
(with no arguments it searches the current directory)
Examples:
  cd dir1; finddupe
  finddupe /usr/bin /bin /sbin /usr/sbin
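With no arguments the script falls back to the current directory. A minimal sketch of that defaulting, using a hypothetical helper named dirs_to_search (not part of the script), that passes the arguments on as "$@" so a directory name containing spaces stays a single argument:

```shell
#!/bin/sh
# Hypothetical helper: print, one per line, the directories the script
# would search for a given argument list.
dirs_to_search() {
  # No arguments: fall back to the current directory.
  [ $# -eq 0 ] && set -- .
  # "$@" keeps each argument whole, even one containing spaces.
  printf '%s\n' "$@"
}

dirs_to_search
dirs_to_search 'My Music' /usr/bin
```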
Note that it requires uniq from GNU textutils 2.0 or later, for
its -D option. It also relies on GNU find (-printf) and GNU du (-b).
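The difference -D makes: plain -d prints only one line per repeated group, whereas -D prints every line of the group, so every duplicate file name survives to the next stage. A small comparison on made-up "name size" lines, with -f1 telling uniq to skip the name field:

```shell
#!/bin/sh
# Made-up "name size" lines, already sorted; -f1 makes uniq skip the name
# field and compare only the size.
list='a.txt 10
b.txt 10
c.txt 42'
one_per_group=$(printf '%s\n' "$list" | uniq -f1 -d)   # first line of each repeated group
all_members=$(printf '%s\n' "$list" | uniq -f1 -D)     # every line of each repeated group
printf '%s\n---\n%s\n' "$one_per_group" "$all_members"
```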
Padraig.
#!/bin/sh
# September 2000 * Padraig at Brady001.iol.ie
#
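# Search the given directories, or the current directory if none are given.
[ $# -eq 0 ] && set -- .

# One "path inode size" line per non-empty regular file.
find "$@" -xdev -size +0c -type f -printf "%p\0%i\0%s\n" |
# Encode spaces and tabs in paths as \0 and \1, and turn the NUL
# separators into spaces, so every line has exactly three fields.
tr ' \t\0' '\0\1 ' |
# Largest files first; -u keeps one line per inode and size (drops hard links).
sort -k3nr -k2 -u |
# Keep only files whose size matches another file's size.
uniq -f2 -D |
cut -f1 -d' ' |
sort |
# Decode the paths and NUL-terminate them for xargs -0.
tr '\0\1\n' ' \t\0' |
xargs -0 md5sum |
# Group identical checksums together.
sort -k1,1 |
# Swap each line to "path checksum" (blanks escaped) so uniq can skip the
# path, keep only repeated checksums, then swap back and unescape.
tr ' \t' '\1\2' |
sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/' |
uniq -f1 -D |
sed -e 's/\(^.*\) \(.*\)/\2 \1/' |
tr '\1\2' ' \t' |
# Print each set of identical files on one line, prefixed by its size (du -b).
(
psum='no match'
line=''
while read -r sum file; do
  if [ "$sum" != "$psum" ]; then
    if [ -n "$line" ]; then
      printf '%s\n' "$line"
    fi
    line="`du -b "$file"`"
    psum="$sum"
  else
    line="$line $file"
  fi
done
if [ -n "$line" ]; then
  printf '%s\n' "$line"
fi
) |
# Largest duplicates first.
sort -k1,1bnr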
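The script moves each md5sum line around with tr and sed so that uniq can skip the path field. Current GNU coreutils uniq also has -w N (--check-chars), which compares only the first N characters; since an MD5 checksum is 32 hex digits, uniq -w32 can group md5sum output in place. A sketch of that alternative (not what the script itself does), on a hypothetical scratch directory:

```shell
#!/bin/sh
# Hypothetical scratch directory: a and b have identical contents, c differs.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a"
printf 'hello\n' > "$dir/b"
printf 'world\n' > "$dir/c"
# md5sum prints "<32 hex digits>  <name>". After sorting on the checksum,
# -w32 makes uniq compare only those 32 characters, and -D prints every
# line of each repeated group, so the file names never need to be moved.
dupes=$(cd "$dir" && md5sum a b c | sort -k1,1 | uniq -w32 -D)
rm -rf "$dir"
printf '%s\n' "$dupes"
```

This drops the tr/sed round trip, but -w is a GNU extension, so it is no more portable than the -D option the script already needs.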