[ILUG] Deleting duplicate photos
Keith Gaughan
kmgaughan at eircom.net
Tue Sep 30 06:00:22 IST 2008
Timothy Murphy wrote:
> What is the best way of eliminating duplicate photos
> on a number of machines, all running Linux (Fedora or CentOS)?
>
> I suppose one could ask the same question about files generally;
> how to tag or delete duplicates.
>
> Any suggestions gratefully received.
Here's what I use myself:
http://talideon.com/weblog/2008/02/find-duplicates.cfm
It was written partly because I needed (and I really do mean *needed*)
something like this, and partly because I wanted a decent demonstration
of how to use generators in Python for the next time I was asked.
It does the sorting in three phases, each one slower than the last: first by
size (which catches an awful lot generally), then by passing the contents of
each file through either zlib.crc32 (CRC-32) or hashlib.md5 (zlib.crc32 is
much faster than hashlib.md5 and, though it's more prone to collisions,
gives a significant net speed-up generally), and finally by comparing the
files in each remaining group directly with one another.
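
For anyone who'd rather see the shape of it than read the description, here's
a minimal sketch of that three-phase idea using generators. It is not the
script at the link above: the function names (group_by, crc32_of,
group_by_content, walk_files, find_duplicates) and the 64 KiB chunk size are
made up for illustration, and it only does the CRC-32 variant, not the MD5
one.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def group_by(paths, key):
    """Yield lists of paths that share a key; groups of one are dropped."""
    groups = defaultdict(list)
    for path in paths:
        groups[key(path)].append(path)
    for group in groups.values():
        if len(group) > 1:
            yield group


def crc32_of(path, chunk_size=65536):
    """Return the CRC-32 of a file's contents, read in chunks.

    The chunk size is an arbitrary choice for this sketch.
    """
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def group_by_content(paths):
    """Split a candidate group into lists of byte-for-byte identical files."""
    remaining = list(paths)
    while remaining:
        first = remaining.pop(0)
        same, different = [first], []
        for other in remaining:
            # shallow=False forces a real comparison of the contents.
            if filecmp.cmp(first, other, shallow=False):
                same.append(other)
            else:
                different.append(other)
        remaining = different
        if len(same) > 1:
            yield same


def walk_files(root):
    """Yield every regular, non-symlink file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def find_duplicates(root):
    """Yield lists of identical files: size first, then CRC-32, then bytes."""
    for same_size in group_by(walk_files(root), os.path.getsize):
        for same_crc in group_by(same_size, crc32_of):
            for identical in group_by_content(same_crc):
                yield identical
```

Looping over find_duplicates("/some/photos") gives one list of paths per set
of identical files. Each phase only ever sees the groups that survived the
cheaper phase before it, which is where the speed-up comes from.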
I've been meaning to extend it so that it treats certain kinds of file
differently, such as ignoring ID3 and EXIF data, but I've never had the time
or need.
K.