Find duplicate files by file size


 
# 8  
Old 04-02-2011
Quote:
Originally Posted by tene
Why don't you try this?
Go to your Downloads dir and run this.
Code:
ls -l | awk '$1!~/^d/{if(size[$5]!=""){ print}size[$5]=$8}'

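$1 !~ /^d/ is an error-prone approach: it excludes directories but still lets through symlinks, device files, pipes, and the "total" line that ls -l prints. Better to simply use /^-/, which matches only regular files.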

Regards,
Alister
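
To make the difference between the two filters concrete, here is a minimal sketch that runs both on a scratch directory. The function names count_not_dir and count_regular and the scratch layout are made up for illustration; they are not from the posts above.

```shell
#!/bin/sh
# Hypothetical helpers: count the "ls -l" lines each awk filter lets through.
# LC_ALL=C keeps the "total" header line in a predictable form.
count_not_dir() { LC_ALL=C ls -l "$1" | awk '$1 !~ /^d/ {n++} END {print n+0}'; }
count_regular() { LC_ALL=C ls -l "$1" | awk '$1 ~ /^-/ {n++} END {print n+0}'; }

# Scratch directory with one regular file, one directory and one symlink.
demo=$(mktemp -d)
echo data > "$demo/file.txt"
mkdir "$demo/subdir"
ln -s file.txt "$demo/link"

echo "lines passing \$1 !~ /^d/ : $(count_not_dir "$demo")"
echo "lines passing \$1 ~ /^-/  : $(count_regular "$demo")"

rm -rf "$demo"
```

The first filter should also count the "total" line and the symlink, while the second should count only file.txt.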
# 9  
Old 04-03-2011
@alister
In the first post he was trying to list everything except directories, so I did the same.
# 10  
Old 04-03-2011
Here is a solution that uses cmp -s. It's a utility designed to compare binary files, so it will probably be much quicker than cksum and the like. Again, only files of identical byte size are compared.

Code:
if [ $# -ne 1 ] || [ ! -d "$1" ]
then
    echo "usage: $0 <directory>"
    exit 1
fi
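# The awk field numbers assume a find -ls layout with the size in field 8 and
# the path in field 12. Check your own output: GNU find typically puts them in
# fields 7 and 11.
find "$1" -type f -ls | awk '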
  $8 > 0 {
     gsub("\\\\ ", SUBSEP); F=$12; gsub(SUBSEP, " ", F); # Deal with space(s) in filename
     if($8 in sizes) {
         sizes[$8]=sizes[$8] SUBSEP F;
         dup[$8]++
     } else sizes[$8]=F
  }
  END {for(i in dup) print sizes[i] }' | while read
do
   # SUBSEP (34 Octal) between each filename that has same size
   # Change IFS to Load Array F with a group of 2 (or more) files 
   OIFS="$IFS"
   IFS=$(printf \\034)
   F=( $REPLY )
   IFS="$OIFS"
   i=0
   while [ $i -lt ${#F[@]} ]
   do
       let j=i+1
       while [ $j -lt ${#F[@]} ]
       do
           cmp -s "${F[i]}" "${F[j]}" &&
               echo "\"${F[i]}\"" and "\"${F[j]}\"" are identical
           let j=j+1
       done
       let i=i+1
   done
done
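
For comparison, here is a rough sketch of the same idea (only cmp -s files whose sizes match) that does not parse find -ls output at all. It is not the script above, just one possible variant. It assumes bash plus a find that supports -print0 (GNU and BSD do), and the function name find_identical is made up for illustration.

```shell
#!/bin/bash
# Sketch only: take sizes with wc -c, then confirm equal-size pairs with cmp -s.
# Assumes bash (arrays, read -d '') and a find that supports -print0.
find_identical() {
    local dir=$1 f size i j
    local -a files=() sizes=()

    # NUL-delimited names survive spaces and other odd characters.
    while IFS= read -r -d '' f; do
        size=$(wc -c < "$f") || continue
        size=$((size))                  # strip any padding wc adds
        [ "$size" -gt 0 ] || continue   # skip empty files, as the original does
        files+=("$f")
        sizes+=("$size")
    done < <(find "$dir" -type f -print0)

    # Pairwise pass: cmp -s runs only when the two sizes match.
    for ((i = 0; i < ${#files[@]}; i++)); do
        for ((j = i + 1; j < ${#files[@]}; j++)); do
            [ "${sizes[i]}" -eq "${sizes[j]}" ] || continue
            cmp -s "${files[i]}" "${files[j]}" &&
                printf '"%s" and "%s" are identical\n' "${files[i]}" "${files[j]}"
        done
    done
    return 0
}

# Usage: find_identical ~/Downloads
```

The pairwise loop checks every pair of sizes, so it is quadratic in the number of files. That is fine for a Downloads directory, but for very large trees the awk grouping used in the script above scales better.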


Last edited by Chubler_XL; 04-03-2011 at 06:57 PM..