08-15-2017
Compare two big files for differences using Linux
Hello everybody,
I'm looking for help comparing two files in Linux (the files are big, about 800 MB each).
Example:
file1 contains the following data:
$ cat file1
5,6,3
2.1.4
1,1,1
8,9,1
file2 contains the following data:
$ cat file2
5,6,3
8,9,8
1,2,1
2,1,4
I need the following output:
8,9,8
1,2,1
1,1,1
8,9,1
I tried the awk command below, but it gives the following output, which is not correct:
$ awk 'NR==FNR{a[$0]++;next} !a[$0]' file2 file1
2.1.4
1,1,1
8,9,1
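The command above only reports lines of file1 that are missing from file2; the other direction would need the file arguments swapped. Here is a minimal sketch of a single awk pass that reports both directions in the order requested: first the lines that appear only in file2, then the lines that appear only in file1. It compares whole lines, and `symdiff_lines` is a hypothetical helper name. It keeps all of file1 in memory twice, once in a lookup table and once in an array that preserves the original order. For an 800 MB file that may need several times as much RAM. If we assume file1's `2.1.4` is meant to be `2,1,4`, as the expected output suggests, the sample data produces exactly the four requested lines.

```shell
#!/bin/sh
# Hypothetical helper: print lines only in the second file, then lines only
# in the first file, each group in its original order.
symdiff_lines() {
    awk '
        NR == FNR { in1[$0] = 1; order[++n] = $0; next }  # first file: remember every line
        ($0 in in1) { in_both[$0] = 1; next }             # second file: line is in both files
        { print }                                         # second file: line only in this file
        END {
            for (i = 1; i <= n; i++)                      # replay first file in original order
                if (!(order[i] in in_both)) print order[i]
        }
    ' "$1" "$2"
}
```

Usage: `symdiff_lines file1 file2 > differences.txt`.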
I also tried a vlookup-style awk script:
$ cat vlookup.awk
FNR==NR{
a[$1]=$2
next
}
{ if ($1 in a) {print $1, a[$1]} else {print $1, "NA"} }
$ awk -f vlookup.awk file2 file1 | column -t
5,6,3
2.1.4 NA
1,1,1 NA
8,9,1 NA
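The vlookup script prints `NA` for most lines because awk splits records on whitespace by default. A line such as `5,6,3` contains no blanks, so it is a single field: `$1` is the whole line and `$2` is empty. The script therefore only checks whole lines of file1 against file2, in one direction. The quick check below shows this default splitting and what `-F,` would change:

```shell
#!/bin/sh
# Default field separator (whitespace): the line is one field, $1 is the whole line.
echo '5,6,3' | awk '{ print NF, $1 }'
# With a comma as the field separator, the same line has three fields.
echo '5,6,3' | awk -F, '{ print NF, $1 }'
```

The first command prints `1 5,6,3` and the second prints `3 5`.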
I also tried the while loop below with grep, but it takes a lot of time:
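$ cat scp.sh
rm -f newfile.txt
# For each line of CUDB_REF, scan the whole of file1 with a separate grep;
# lines that are not found are collected in newfile.txt.
while IFS= read -r line
do
    if ! grep -qi -e "${line}" file1 ; then
        echo "$line" >> newfile.txt
    fi
done < CUDB_REF
$ ./scp.sh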
8,9,8
1,2,1
This gives the correct result, but it takes a very long time on big files.
Please suggest a faster way.
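A common alternative for files this size is to sort both files and compare them with `comm`, which reads its two sorted inputs in a single merge pass. GNU `sort` spills to temporary files on disk when the data does not fit in memory. The loop above runs one grep over file1 for every input line. This approach reads each file a fixed number of times instead. Here is a minimal sketch, assuming `mktemp -d` is available (as on GNU/Linux); `sorted_symdiff` is a hypothetical name. The output keeps the requested grouping, but lines within each group come out sorted rather than in their original order.

```shell
#!/bin/sh
# Hypothetical helper: print lines only in the second file, then lines only
# in the first file. comm requires sorted input, so both files are sorted
# once into a temporary directory. LC_ALL=C makes sort and comm agree on
# plain byte order, which is also usually faster than locale-aware sorting.
sorted_symdiff() {
    sd_dir=$(mktemp -d) || return 1
    LC_ALL=C sort "$1" > "$sd_dir/a"
    LC_ALL=C sort "$2" > "$sd_dir/b"
    LC_ALL=C comm -13 "$sd_dir/a" "$sd_dir/b"   # lines only in the second file
    LC_ALL=C comm -23 "$sd_dir/a" "$sd_dir/b"   # lines only in the first file
    rm -rf "$sd_dir"
}
```

Usage: `sorted_symdiff file1 file2 > differences.txt`. The temporary directory needs roughly as much free space as the two inputs combined.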
OD(1) FSF OD(1)
NAME
od - dump files in octal and other formats
SYNOPSIS
od [OPTION]... [FILE]...
od --traditional [FILE] [[+]OFFSET [[+]LABEL]]
DESCRIPTION
Write an unambiguous representation, octal bytes by default, of FILE to standard output. With more than one FILE argument, concatenate
them in the listed order to form the input. With no FILE, or when FILE is -, read standard input.
All arguments to long options are mandatory for short options.
-A, --address-radix=RADIX
decide how file offsets are printed
-j, --skip-bytes=BYTES
skip BYTES input bytes first
-N, --read-bytes=BYTES
limit dump to BYTES input bytes
-s, --strings[=BYTES]
output strings of at least BYTES graphic chars
-t, --format=TYPE
select output format or formats
-v, --output-duplicates
do not use * to mark line suppression
-w, --width[=BYTES]
output BYTES bytes per output line
--traditional
accept arguments in traditional form
--help display this help and exit
--version
output version information and exit
Traditional format specifications may be intermixed; they accumulate:
-a same as -t a, select named characters
-b same as -t oC, select octal bytes
-c same as -t c, select ASCII characters or backslash escapes
-d same as -t u2, select unsigned decimal shorts
-f same as -t fF, select floats
-h same as -t x2, select hexadecimal shorts
-i same as -t d2, select decimal shorts
-l same as -t d4, select decimal longs
-o same as -t o2, select octal shorts
-x same as -t x2, select hexadecimal shorts
For older syntax (second call format), OFFSET means -j OFFSET. LABEL is the pseudo-address at first byte printed, incremented when dump is
progressing. For OFFSET and LABEL, a 0x or 0X prefix indicates hexadecimal, suffixes may be . for octal and b for multiply by 512.
TYPE is made up of one or more of these specifications:
a named character
c ASCII character or backslash escape
d[SIZE]
signed decimal, SIZE bytes per integer
f[SIZE]
floating point, SIZE bytes per integer
o[SIZE]
octal, SIZE bytes per integer
u[SIZE]
unsigned decimal, SIZE bytes per integer
x[SIZE]
hexadecimal, SIZE bytes per integer
SIZE is a number. For TYPE in doux, SIZE may also be C for sizeof(char), S for sizeof(short), I for sizeof(int) or L for sizeof(long). If
TYPE is f, SIZE may also be F for sizeof(float), D for sizeof(double) or L for sizeof(long double).
RADIX is d for decimal, o for octal, x for hexadecimal or n for none. BYTES is hexadecimal with 0x or 0X prefix, it is multiplied by 512
with b suffix, by 1024 with k and by 1048576 with m. Adding a z suffix to any type adds a display of printable characters to the end of
each line of output. --string without a number implies 3. --width without a number implies 32. By default, od uses -A o -t d2 -w 16.
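The following invocation combines several of the options above. The input bytes are an arbitrary example, and the `z` suffix and exact column layout assume GNU od. `-A x` prints hexadecimal offsets, `-j` and `-N` select a byte range, and `-t x1z` dumps single bytes in hex with the printable characters appended to each line.

```shell
#!/bin/sh
# Skip the first 2 input bytes, dump the next 3 as one-byte hex values,
# show offsets in hexadecimal and append the printable characters.
printf 'ABCDEF' | od -A x -j 2 -N 3 -t x1z
```

With GNU od this shows the bytes `43 44 45` (C, D, E) at offset `000002`, followed by `>CDE<`, and then a final line with the end offset `000005`.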
AUTHOR
Written by Jim Meyering.
REPORTING BUGS
Report bugs to <bug-coreutils@gnu.org>.
COPYRIGHT
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
SEE ALSO
The full documentation for od is maintained as a Texinfo manual. If the info and od programs are properly installed at your site, the command
info od
should give you access to the complete manual.
od (coreutils) 4.5.3 February 2003 OD(1)