

Shell Programming and Scripting: KSH, CSH, SH, BASH, Perl, PHP, sed, awk, and other shell scripting languages.

Find Duplicate files, not by name



Closed Question
 
    #1  03-20-2009
Ikon, Forum Advisor (Computer Geek)
 
Join Date: Jul 2008
Last Activity: 15 January 2015, 10:57 AM EST
Location: Frederick, MD
Posts: 748
Thanks: 4
Thanked 11 Times in 10 Posts

I have a directory with images:

Code:
-rw-r--r-- 1 root root 26216 Mar 19 21:00 020109.210001.jpg
-rw-r--r-- 1 root root 21760 Mar 19 21:15 020109.211502.jpg
-rw-r--r-- 1 root root 23144 Mar 19 21:30 020109.213002.jpg
-rw-r--r-- 1 root root 31350 Mar 20 00:45 020109.004501.jpg
-rw-r--r-- 1 root root 31350 Mar 20 01:00 020109.010002.jpg
-rw-r--r-- 1 root root 31350 Mar 20 01:15 020109.011501.jpg
-rw-r--r-- 1 root root  8060 Mar 20 01:30 020109.013002.jpg
-rw-r--r-- 1 root root  8062 Mar 20 01:45 020109.014501.jpg

Some images are identical, but file names are different.

How can I write a script to find and delete the duplicates?

Is there a better way than this?


Code:
#!/bin/bash
DIR="/path/to/images"
echo "Starting:"
for file1 in "${DIR}"/*.jpg; do
        [ -e "$file1" ] || continue          # skip files already removed
        for file2 in "${DIR}"/*.jpg; do
                if [ "$file1" != "$file2" ] && [ -e "$file2" ]; then
                        # diff -q exits 0 when the files are identical
                        if diff -q "$file1" "$file2" >/dev/null; then
                                echo "Same: $file1 $file2"
                                echo "Remove: $file2"
                                rm "$file2"
                        fi
                fi
        done
done
echo "Done."
echo "Done."

    #2  03-20-2009
jim mcnamara, Forum Staff
 
Join Date: Feb 2004
Last Activity: 27 May 2015, 4:58 PM EDT
Location: NM
Posts: 10,454
Thanks: 337
Thanked 860 Times in 799 Posts
Use md5 or another checksum/hash; I just used cksum:

Code:
cksum  *.jpg | sort -n > filelist

Change the sort command if you use md5; its hashes are hex strings, so a plain sort (without -n) groups them correctly.
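For reference, cksum prints a CRC, a byte count, and the filename, so identical files land on adjacent lines after the sort. A quick sketch with hypothetical scratch files:

```shell
# Sketch with throwaway files; cksum output is: <CRC> <bytes> <name>
cd "$(mktemp -d)"
printf 'hello\n' > a.jpg
cp a.jpg b.jpg              # identical content, different name
printf 'other\n' > c.jpg
cksum *.jpg | sort -n       # duplicates end up on adjacent lines
```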

The files with identical checksums are identical. Read the file over before you go on to part 2 below:

Code:
old=""
while read sum size filename
do
      if [[ "$sum" != "$old" ]] ; then
            old="$sum"            # first file with this checksum: keep it
            continue
      fi
      rm -f "$filename"           # later file with the same checksum: remove
done < filelist
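If md5 is available, the two steps above can be collapsed into a one-pass sketch (assuming GNU coreutils' md5sum and filenames without spaces or newlines): sort groups identical hashes, and awk prints every file after the first in each group.

```shell
# Sketch assuming GNU md5sum/xargs and plain filenames (no spaces/newlines);
# the scratch files here are hypothetical
cd "$(mktemp -d)"
printf 'A' > 1.jpg; printf 'A' > 2.jpg; printf 'B' > 3.jpg
# keep the first file of each duplicate group, remove the rest
md5sum *.jpg | sort | awk 'seen[$1]++ { print $2 }' | xargs -r rm --
```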

    #3  03-20-2009
Ikon, Forum Advisor
Nice.. That's a lot faster; the directory has hundreds of files.

Thanks a bunch...



