Sponsored Content
Top Forums Shell Programming and Scripting Search compare and determine duplicate files Post 302508689 by Chubler_XL on Monday 28th of March 2011 05:50:43 PM
Old 03-28-2011
I'm not aware of any packages that do it, how about this script to find identical files:

Code:
if [ $# -ne 1 ]
then
    echo "usage: $0 <directory>"
    exit 1
fi
find $1 -type f -ls | awk '$7 > 0 { if($7 in sizes) { sizes[$7]=sizes[$7] SUBSEP $11; dup[$7]++} else sizes[$7]=$11 } END {for(i in dup) print sizes[i] }' | while read
do
   OIFS="$IFS"
   IFS=$(printf \\034)
   F=( $REPLY )
   IFS="$OIFS"
   i=0
   while [ $i -lt ${#F[@]} ]
   do
       let j=i+1
       while [ $j -lt ${#F[@]} ]
       do
            if cmp -s "${F[i]}" "${F[j]}"
            then
                echo ${F[i]} and ${F[j]} are identical
            fi
            let j=j+1
       done
       let i=i+1
   done
done

It does a find dir -type f -ls and stores filename in an awk array with size as the index. If a file with the same size is found the size is written into the dup array. At the end all duplicate sizes are output in a SUBSEP list.

This list is read into the F array and cmp with the -s flag (ie no output, just exit status) is used to compare files - this command will stop comparing as soon as the first difference is found which is better than calculating CRCs for each file.

Note: if files A B C are identical you will get output
A is identical to B
A is identical to C
B is identical to C
This can be fixed with more logic, but I don't consider it an issue.

Last edited by Chubler_XL; 03-28-2011 at 07:20 PM..
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Compare and Remove duplicate lines from txt

Hello, I am a total linux newbie and I can't seem to find a solution to this little problem. I have two text files with a huge list of URLS. Let's call them file1.txt and file2.txt What I want to do is grab an URL from file2.txt, search file1.txt for the URL and if found, delete it from... (11 Replies)
Discussion started by: rmarcano
11 Replies

2. Shell Programming and Scripting

compare fields in a file with duplicate records

Hi: I've been searching the net but didnt find a clue. I have a file in which, for some records, some fields coincide. I want to compare one (or more) of the dissimilar fields and retain the one record that fulfills a certain condition. For example, on this file: 99 TR 1991 5 06 ... (1 Reply)
Discussion started by: rleal
1 Replies

3. Shell Programming and Scripting

How to search & compare paragraphs between two files

Hello Guys, Greetings to All. I am stuck in my work here today while trying to comapre paragraphs between two files, I need your help on urgent basis, without your inputs I can not proceed. Kindly find some time to answer my question, I'll be grateful to you for ever. My detailed issue is as... (10 Replies)
Discussion started by: NARESH1302
10 Replies

4. Shell Programming and Scripting

compare two files and search keyword and print output

You have two files to compare by searching keyword from one file into another file File A 23 >pp_ANSWER 24 >aa hello 25 >jau head wear 66 >jss oops 872 >aqq olps ploww oww sss 722 >GG_KILLER ..... large files File B Beta done KILLER John Mayor calix meyers ... (5 Replies)
Discussion started by: cdfd123
5 Replies

5. Shell Programming and Scripting

Search duplicate field and replace one of them with new value

Dear All, I have file with 4 columns: 1 AA 0 21 2 BB 0 31 3 AA 0 21 4 CC 0 41 I would like to find the duplicate record based on column 2 and replace the 4th column of the duplicate by a new value. So, the output will be: 1 AA 0 21 2 BB 0 31 3 AA 0 -21 4 CC 0 41 Any suggestions... (3 Replies)
Discussion started by: ezhil01
3 Replies

6. Shell Programming and Scripting

Search and compare files from two paths

Hi All, I have a 2 path, one with oldfile path in which has several sub folders,each sub folders contains a config file(basically text file), likewise there will be another newfile path which will have sub folders, each sub folders contains a config file. Need to read files from oldfile... (6 Replies)
Discussion started by: Optimus81
6 Replies

7. UNIX for Dummies Questions & Answers

Search for string in a file then compare it with excel files entry

All, i have a file text.log: cover6 cover3 cover2 cover4 other file is abc.log as : 0 0 1 0 Then I have a excel file result.xls that contains: Name Path Pass cover2 cover3 cover6 cover4 (1 Reply)
Discussion started by: Anamika08
1 Replies

8. Shell Programming and Scripting

Search pattern on logfile and search for day/dates and skip duplicate lines if any

Hi, I've written a script to search for an Oracle ORA- error on a log file, print that line and the .trc file associated with it as well as the dateline of when I assumed the error occured. In most it is the first dateline previous to the error. Unfortunately, this is not a fool proof script.... (2 Replies)
Discussion started by: newbie_01
2 Replies

9. Shell Programming and Scripting

To search duplicate sequence in file

Hi, I want to search only duplicate sequence number in file e.g 4757610 4757610 should display only duplicate sequence number in file. file contain is: 4757610 6zE:EXPNL ORDER_PRIORITY='30600022004757610' ORDER_IDENTIFIER='4257771056' MM_ASK_VOLUME='273' MM_ASK_PRICE='1033.0000' m='GBX'... (5 Replies)
Discussion started by: ashfaque
5 Replies

10. Shell Programming and Scripting

Shell script to compare two files for duplicate..??

Hi , I had a requirement to compare two files whether the two files are same or different .... like(files contaisn of two columns each) file1.txt 121343432213 1234 64564564646 2345 343423424234 2456 file2.txt 121343432213 1234 64564564646 2345 31231313123 3455 how to... (2 Replies)
Discussion started by: hemanthsaikumar
2 Replies
CMP(1)							    BSD General Commands Manual 						    CMP(1)

NAME
cmp -- compare two files SYNOPSIS
cmp [-l | -s | -x] [-hz] file1 file2 [skip1 [skip2]] DESCRIPTION
The cmp utility compares two files of any type and writes the results to the standard output. By default, cmp is silent if the files are the same; if they differ, the byte and line number at which the first difference occurred is reported. Bytes and lines are numbered beginning with one. The following options are available: -h Do not follow symbolic links. -l Print the byte number (decimal) and the differing byte values (octal) for each difference. -s Print nothing for differing files; return exit status only. -x Like -l but prints in hexadecimal and using zero as index for the first byte in the files. -z For regular files compare file sizes first, and fail the comparison if they are not equal. The optional arguments skip1 and skip2 are the byte offsets from the beginning of file1 and file2, respectively, where the comparison will begin. The offset is decimal by default, but may be expressed as a hexadecimal or octal value by preceding it with a leading ``0x'' or ``0''. EXIT STATUS
The cmp utility exits with one of the following values: 0 The files are identical. 1 The files are different; this includes the case where one file is identical to the first part of the other. In the latter case, if the -s option has not been specified, cmp writes to standard error that EOF was reached in the shorter file (before any differences were found). >1 An error occurred. SEE ALSO
diff(1), diff3(1) STANDARDS
The cmp utility is expected to be IEEE Std 1003.2 (``POSIX.2'') compatible. The -h, -x, and -z options are extensions to the standard. HISTORY
A cmp command appeared in Version 1 AT&T UNIX. BSD
November 18, 2013 BSD
All times are GMT -4. The time now is 04:34 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy