Finds all duplicate files


 
# 1  
Old 03-03-2016
Hi,
How would you write a bash script that takes a directory as an argument, finds all duplicate files in it (files with the same contents, using a bytewise comparison), and prints their names?
# 2  
Old 03-03-2016
Is this homework?
# 3  
Old 03-03-2016
Homework? Hahaha.
I'm 28 years old. I finished school a long, long time ago :)
# 4  
Old 03-03-2016
crude, but try:
Code:
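[[ -d "$1" ]] || { echo "$1 not a directory" >&2; exit 1; }
for f in "${1%/}"/*
do
   [[ -f "$f" ]] || continue      # skip directories and other non-regular files
   for g in "${1%/}"/*
   do
     # compare each pair only once, and never a file with itself
     [[ -f "$g" && "$f" < "$g" ]] || continue
     cmp -s "$f" "$g" && echo "$f"="$g"   # cmp -s is silent and exits 0 when the contents are identical
   done
done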

# 5  
Old 03-04-2016
How about
Code:
md5sum * 2>/dev/null | sort | uniq -Dw32 | cut -c35-

? It reads every file only once instead of once per pair. It does NOT do a bytewise comparison: files are reported as duplicates when their MD5 checksums match.
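If a bytewise comparison is really required, the checksum pass can still be used to narrow down the candidates, with cmp confirming each match. A minimal sketch, assuming GNU md5sum and file names without newlines or backslashes; confirm_dups is a made-up name, not something from the posts above:

```shell
# confirm_dups DIR: print pairs of regular files in DIR whose contents are
# byte-for-byte identical. (confirm_dups is a hypothetical helper name.)
confirm_dups() {
    local dir="${1%/}"
    [ -d "$dir" ] || { echo "$1: not a directory" >&2; return 1; }

    prev_sum='' prev_file=''
    # md5sum prints "<checksum><2 separator chars><name>"; sort puts equal
    # checksums on adjacent lines. Errors for subdirectories are discarded.
    md5sum -- "$dir"/* 2>/dev/null | sort |
    while IFS= read -r line; do
        sum="${line%% *}"          # everything before the first space
        rest="${line#"$sum"}"      # separators plus file name
        file="${rest#??}"          # drop the two separator characters
        # An equal checksum is only a hint; cmp -s does the real comparison.
        if [ "$sum" = "$prev_sum" ] && cmp -s -- "$prev_file" "$file"; then
            printf '%s = %s\n' "$prev_file" "$file"
        else
            prev_sum="$sum" prev_file="$file"
        fi
    done
    return 0
}
```

Called as confirm_dups /some/dir, it prints one "first = other" line per confirmed duplicate. Each file is compared against the first file seen with the same checksum, so cmp only runs on likely matches.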
# 6  
Old 03-04-2016
Hi.

If there already exists a solution, then I usually make use of it:
Code:
Find duplicate, similar files

        1) fdupes, duff, rdfind

        2) simhash, find similarity of files

        3) textanalyze (local), find words that are characteristic for a file

Best wishes ... cheers, drl
# 7  
Old 03-05-2016
Another approach using checksum hashes (again, not a bytewise comparison). Use your favorite checksum utility; this example uses shasum:

Code:
find /some/dir -type f -exec shasum {} + |
awk '{i=$1; $1=x; C[i]++; A[i]=A[i] $0 FS} END{for(i in C) if(C[i]>1) print A[i]}'

Because it uses find, this also looks for duplicates in subdirectories.
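For file names that contain runs of spaces, a variant that slices each line by position rather than by awk fields may be safer. This is only a sketch: dup_groups is a made-up name, sha256sum is my stand-in for "your favorite checksum utility", and names containing newlines or backslashes are assumed not to occur:

```shell
# dup_groups DIR [TOOL]: print groups of files under DIR (recursively) that
# share a checksum, one name per line, with a blank line between groups.
# TOOL defaults to sha256sum; both names here are assumptions of this sketch.
dup_groups() {
    local tool="${2:-sha256sum}"
    find "$1" -type f -exec "$tool" {} + | sort |
    awk '{
        i    = index($0, " ")        # checksum ends at the first space
        sum  = substr($0, 1, i - 1)
        name = substr($0, i + 2)     # skip the two separator characters
        if (sum == prev) {
            if (!shown) {            # first repeat: start a new group
                if (groups++) print ""
                print first
                shown = 1
            }
            print name
        } else {
            prev = sum; first = name; shown = 0
        }
    }'
}
```

For example, dup_groups /some/dir shasum would use the same utility as the post above, since the code only relies on the "checksum, two separator characters, name" line layout.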