Find Duplicate files, not by name


# 1  
Find Duplicate files, not by name

I have a directory with images:
Code:
-rw-r--r-- 1 root root 26216 Mar 19 21:00 020109.210001.jpg
-rw-r--r-- 1 root root 21760 Mar 19 21:15 020109.211502.jpg
-rw-r--r-- 1 root root 23144 Mar 19 21:30 020109.213002.jpg
-rw-r--r-- 1 root root 31350 Mar 20 00:45 020109.004501.jpg
-rw-r--r-- 1 root root 31350 Mar 20 01:00 020109.010002.jpg
-rw-r--r-- 1 root root 31350 Mar 20 01:15 020109.011501.jpg
-rw-r--r-- 1 root root  8060 Mar 20 01:30 020109.013002.jpg
-rw-r--r-- 1 root root  8062 Mar 20 01:45 020109.014501.jpg

Some images are identical, but file names are different.

How can I write a script to find and delete the duplicates?

Is there a better way than this?

Code:
#!/bin/bash
DIR="/path/to/images"
echo "Starting:"
for file1 in "${DIR}"/*.jpg; do
        [ -f "$file1" ] || continue
        for file2 in "${DIR}"/*.jpg; do
                [ -f "$file2" ] || continue
                if [ "$file1" != "$file2" ]; then
                        # diff -q exits 0 only when the two files are identical
                        if diff -q "$file1" "$file2" >/dev/null 2>&1; then
                                echo "Same: $file1 $file2"
                                echo "Remove: $file2"
                                rm "$file2"
                        fi
                fi
        done
done
echo "Done."

# 2  
Use md5 or another checksum or hash; I just used cksum:
Code:
cksum  *.jpg | sort -n > filelist

Change the sort command if you use md5 instead (an md5 digest is hex, not numeric, so a plain sort is enough to group identical hashes).

Files with identical checksums are almost certainly identical (cksum is only a 32-bit CRC, so run cmp on any pair you want to be sure about). Read the list over before you go on to part 2 below:
Code:
old=""
# part 2: walk the sorted list and remove every file whose checksum
# matches the previous line; the first file in each group is kept
while read sum size filename
do
      if [[ "$sum" != "$old" ]] ; then
            old="$sum"
            continue
      fi
      rm -f "$filename"
done < filelist

# 3  
Nice... That's a lot faster; the directory has hundreds of files.

Thanks a bunch...