08-15-2017
Compare two big files for differences using Linux
Hello everybody,
I am looking for help comparing two files in Linux (the files are big, 800 MB each).
Example:
File1 has the data below
$ cat file1
5,6,3
2.1.4
1,1,1
8,9,1
File2 has the data below
$ cat file2
5,6,3
8,9,8
1,2,1
2,1,4
I need the output below
8,9,8
1,2,1
1,1,1
8,9,1
I tried the awk command below, but it gives the following output, which is not correct
$ awk 'NR==FNR{a[$0]++;next} !a[$0]' file2 file1
2.1.4
1,1,1
8,9,1
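The one-liner above loads file2 into the array and then prints only the lines of file1 that are missing from it, so the lines unique to file2 never appear. A minimal sketch of a two-way variant, assuming whole lines are compared exactly and that the first file fits in memory as an awk array; `sym_diff` is a made-up helper name, not something from this thread:

```shell
# sym_diff A B: print the lines that occur in only one of the two files.
# Hypothetical helper; assumes file A is not empty and fits in memory.
sym_diff() {
    awk '
        NR == FNR  { seen[$0] = 1; next }     # first file: remember every line
        $0 in seen { common[$0] = 1; next }   # second file: line exists in both
                   { print }                  # second file: line only here
        END {
            # lines of the first file that never showed up in the second
            for (line in seen)
                if (!(line in common)) print line
        }
    ' "$1" "$2"
}
```

Called as `sym_diff file1 file2`, it prints the lines found only in file2 in their original order, followed by the lines found only in file1 in whatever order awk iterates the array.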
$ cat vlookup.awk
FNR==NR{
a[$1]=$2
next
}
{ if ($1 in a) {print $1, a[$1]} else {print $1, "NA"} }
$ awk -f vlookup.awk file2 file1 | column -t
5,6,3
2.1.4 NA
1,1,1 NA
8,9,1 NA
I also tried the while loop below with the grep command, but it takes a lot of time.
$ cat scp.sh
rm -f newfile.txt
# Append every line of CUDB_REF that grep does not find in file1 to newfile.txt
while IFS= read -r line
do
    if ! grep -qi -e "${line}" file1 ; then
        echo "$line" >> newfile.txt
    fi
done <CUDB_REF
$ ./scp.sh
8,9,8
1,2,1
This output is correct, but the script takes a very long time on big files.
Please suggest a better, faster way.
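For files of this size, one commonly suggested alternative is to let sort do the heavy lifting, since implementations such as GNU sort can spill to temporary files instead of holding everything in memory. A rough sketch, assuming exact whole-line comparison is wanted and that the order of the result does not matter; `only_in_one` is a hypothetical name:

```shell
# only_in_one A B: lines present in exactly one of the two files, sorted.
# Hypothetical helper; the body runs in a subshell so LC_ALL stays local.
only_in_one() (
    export LC_ALL=C    # plain byte-wise comparison, usually faster as well
    # sort -u drops repeats inside each file, so after the combined sort
    # any line that still appears twice must exist in both files;
    # uniq -u keeps only the lines that appear once.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
)
```

Running `only_in_one file1 file2` lists the differing lines from both files in sorted order rather than in the order they had in the input.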