Find Duplicate files, not by name

Shell Programming and Scripting
#1  03-20-2009, Ikon (Forum Advisor)

I have a directory with images:

Code:
-rw-r--r-- 1 root root 26216 Mar 19 21:00 020109.210001.jpg
-rw-r--r-- 1 root root 21760 Mar 19 21:15 020109.211502.jpg
-rw-r--r-- 1 root root 23144 Mar 19 21:30 020109.213002.jpg
-rw-r--r-- 1 root root 31350 Mar 20 00:45 020109.004501.jpg
-rw-r--r-- 1 root root 31350 Mar 20 01:00 020109.010002.jpg
-rw-r--r-- 1 root root 31350 Mar 20 01:15 020109.011501.jpg
-rw-r--r-- 1 root root  8060 Mar 20 01:30 020109.013002.jpg
-rw-r--r-- 1 root root  8062 Mar 20 01:45 020109.014501.jpg

Some images are identical, but file names are different.

How can I write a script to find and delete the duplicates?

Is there a better way than this?


Code:
#!/bin/bash
DIR="/path/to/images"
echo "Starting:"
for file1 in "${DIR}"/*.jpg; do
        # skip files already removed in an earlier pass
        [ -f "$file1" ] || continue
        for file2 in "${DIR}"/*.jpg; do
                [ -f "$file2" ] || continue
                if [ "$file1" != "$file2" ]; then
                        # diff -q prints "Files ... differ" only when the files differ
                        DIFF=`diff -q "$file1" "$file2"`
                        if [ "${DIFF%% *}" != "Files" ]; then
                                echo "Same: $file1 $file2"
                                echo "Remove: $file2"
                                rm "$file2"
                        fi
                fi
        done
done
echo "Done."

#2  03-20-2009, jim mcnamara (Forum Staff)
Use md5 or another checksum or hash; I just used cksum:

Code:
cksum  *.jpg | sort -n > filelist

Change the sort command if you use md5: its output is a hex digest rather than a decimal number, so a plain lexical sort will group identical digests.
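
For example, with md5sum (assuming GNU coreutils; the post does not name a specific md5 tool) the equivalent would be:

Code:
md5sum *.jpg | sort > filelist

Note that md5sum prints only a digest and a filename per line, so the part-2 loop below would read two fields instead of three.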

Files with identical checksums are almost certainly identical (cksum is a CRC, so collisions are possible but rare). Read the file over before you go on to part 2 below:

Code:
old=""
# cksum output is: checksum  size  filename
while read -r sum size filename
do
      # keep the first file in each checksum group
      if [[ "$sum" != "$old" ]] ; then
            old="$sum"
            continue
      fi
      # every later file with the same checksum is a duplicate
      rm -f "$filename"
done < filelist
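
The grouping and removal can also be done in one pipeline; a sketch, assuming cksum's usual "checksum size filename" output and filenames without embedded newlines:

Code:
# seen[$1]++ is false the first time a checksum appears and true afterwards,
# so only duplicates are printed; sub() strips the "checksum size " prefix
# to preserve filenames that contain spaces
cksum *.jpg | sort -n \
  | awk 'seen[$1]++ { sub(/^[0-9]+ [0-9]+ /, ""); print }' \
  | while IFS= read -r dup; do rm -f -- "$dup"; done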

#3  03-20-2009, Ikon (Forum Advisor)
Nice.. That's a lot faster; the directory has hundreds of files.

Thanks a bunch...