[ILUG] tip of the day (find duplicate files)

Brady, Padraig Padraig.Brady at compaq.com
Tue Sep 5 15:38:09 IST 2000


I just wrote the following script, which finds duplicate files
in the specified directories and their subdirectories.
It's very fast, since only files that share a size with
another file are ever checksummed.

usage:
	finddupe [dir1] [dir2] ...
examples:
	cd dir1;finddupe
	finddupe /usr/bin /bin /sbin /usr/sbin

Note that it requires version 2.0 of uniq, which is part of
GNU textutils. It also relies on the -printf option of GNU find.
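
For anyone who hasn't used it: the uniq feature in question is
presumably -D (print every line of each repeated group), which the
script uses twice together with field skipping. Here is a minimal
sketch on made-up "name size" data. The function name and the sample
lines are purely illustrative, and -f1 is the POSIX spelling of the
field skip that older syntax writes as -1.

```shell
# Made-up sample: "name size" pairs, already sorted by size.
show_same_size() {
  printf '%s\n' 'a.txt 10' 'b.txt 10' 'c.txt 25' 'd.txt 40' 'e.txt 40' |
  # Skip field 1 (the name) when comparing, and print all lines of
  # every group that has more than one member.
  uniq -f1 -D
}

show_same_size
# prints:
#   a.txt 10
#   b.txt 10
#   d.txt 40
#   e.txt 40
```

c.txt is dropped because no other line has its size, which is how
most files get ruled out before any checksum is computed.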

Padraig.

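#!/bin/sh
# September 2000 * Padraig at Brady001.iol.ie
#
# Find duplicate files under the given directories (default: .).
# 1. List non-empty regular files as "path inode size".
# 2. Drop extra hard links, keep only files that share a size.
# 3. md5sum those candidates and keep only repeated checksums.
# 4. Print each group on one line, largest files first.
#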
find ${*-.} -xdev -size +0c -type f -printf "%p\0%i\0%s\n" |
tr ' \t\0' '\0\1 '                                         |
sort -k3nr -k2 -u                                          |
uniq -f2 -D                                                |
cut -f1 -d' '                                              |
sort                                                       |
tr '\0\1\n' ' \t\0'                                        |
xargs -0 md5sum                                            |
sort -k1,1                                                 |
tr ' \t' '\1\2'                                            |
sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'                     |
uniq -D -f1                                                |
sed -e 's/\(^.*\) \(.*\)/\2 \1/'                           |
tr '\1\2' ' \t'                                            |
(
psum='no match'
line=''
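while read sum file; do
  if [ "$sum" != "$psum" ]; then
    # New checksum group: flush the previous group, if any.
    if [ -n "$line" ]; then
      printf '%s\n' "$line"
    fi
    # Start the group with "size<TAB>name" so the final sort can order by size.
    line="$(du -b "$file")"
    psum="$sum"
  else
    line="$line $file"
  fi
done

# Flush the last group.
if [ -n "$line" ]; then
  printf '%s\n' "$line"
fi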
)                                                          |
sort -k1,1 -brn
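
The tr calls in the pipeline are there so that file names containing
spaces or tabs survive the field-based sort, uniq and cut steps: the
whitespace is swapped for bytes that should never appear in a name and
swapped back afterwards. A minimal sketch of that round trip follows,
assuming GNU tr and cut (which pass NUL bytes through). The record and
the function name are made up for illustration.

```shell
# Hypothetical record: a path containing a space, then inode and size,
# NUL-separated in the same layout the script's find -printf emits.
swap_demo() {
  printf '%s\0%s\0%s\n' 'my file.txt' 1234 42 |
  tr ' \t\0' '\0\1 ' |  # space->NUL, tab->\1, NUL->space: fields split on spaces
  cut -f1 -d' '      |  # the path is field 1 even though it contained a space
  tr '\0\1' ' \t'       # swap back to recover the original name
}

swap_demo
# prints: my file.txt
```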


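To cross-check the output, a much shorter (and slower) variant can be
sketched that simply checksums every file and groups on the first 32
characters of each md5sum line. This is not the method used above: it
skips the size pre-filter and the hard link handling, and it assumes
GNU xargs (-r) and GNU uniq (-w). The name dupes_by_md5 is made up.

```shell
# Hypothetical helper: list files under "$1" whose MD5 checksum is
# shared with at least one other file.
dupes_by_md5() {
  find "$1" -type f -size +0c -print0 |
  xargs -0 -r md5sum |   # checksum every non-empty file
  sort |                 # bring identical checksums together
  uniq -w32 -D           # compare only the 32-character checksum
}
```

For example, `dupes_by_md5 /usr/bin` prints one "checksum  path" line
for every file that has a twin, grouped by checksum.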

