I have a directory with many subdirectories and about 7,000 files in total. I need to find all duplicates of all files. For any given file, its duplicates might be scattered around various subdirectories and may or may not have the same file name. A duplicate is a file for which the `diff` command returns an exit code of 0.
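For example, with two identical throwaway files `a` and `b` (hypothetical names, just to illustrate the definition):

```
% echo "one" > a; cp a b
% diff a b > /dev/null; echo $?
0
```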
The simplest thing to do is to run a double loop over all the files in the directory tree. But that is roughly 7000^2 sequential diffs and not very efficient:
```
for f in `find /path/to/root/folder -type f`
do
    for g in `find /path/to/root/folder -type f`
    do
        if [ "$f" = "$g" ]
        then
            continue
        fi
        diff "$f" "$g" > /dev/null
        if [ $? -eq 0 ]
        then
            echo "$f" MATCHES "$g"
        fi
    done
done
```
Is there a more efficient way to do it?
Answer
On Debian 11:
```
% mkdir files; (cd files; echo "one" > 1; echo "two" > 2a; cp 2a 2b)
% find files/ -type f -print0 | xargs -0 md5sum | tee listing.txt | awk '{print $1}' | sort | uniq -c | awk '$1>1 {print $2}' > dups.txt
% grep -f dups.txt listing.txt
c193497a1a06b2c72230e6146ff47080  files/2a
c193497a1a06b2c72230e6146ff47080  files/2b
```
- Find and print all files, null terminated (`-print0`).
- Use `xargs` to `md5sum` them.
- Save a copy of the sums and filenames in the “listing.txt” file.
- Grab the sums and pass them to `sort`, then to `uniq -c` to count occurrences of each sum.
- Use `awk` to list the sums that occur more than once, saving them into the “dups.txt” file.
- Use `grep` to find each duplicated sum, with its filenames, in “listing.txt”.
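If you would rather see the duplicate groups printed together instead of matching sums one at a time with `grep`, a small `awk` pass over the sorted listing can do it. This is only a sketch building on the `listing.txt` produced above, and like the pipeline itself it assumes file names contain no newlines:

```
# Print each group of files sharing a checksum, groups separated by blank lines.
sort listing.txt | awk '
    $1 == prev { if (!shown) print prevline; print; shown = 1 }
    $1 != prev { if (shown) print ""; shown = 0 }
    { prev = $1; prevline = $0 }
'
```

On the three-file example above, this prints just the `files/2a` and `files/2b` pair; the unique file `files/1` produces no output.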