Simple awk command to compare two files and print first difference


# 1  

Hello,

I have two text files, each with a single column,
file 1:
Code:
124152970
123899868
123476854
54258288
123117283

file 2:
Code:
124152970
123899868
54258288
123117283
122108330

I am trying to identify 123476854, the first value in file 1 that doesn't match the corresponding line in file 2. I need to print that value and exit.

At first I tried diff,

diff file1 file2 | head -n 2

This gives what I want, but it produces several lines of output, so it takes extra steps to get the value into a bash variable, which is what I need.
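A sketch of one way to get a single value out of diff directly: keep only the first line that diff's default output marks with "< " (a line from file1 that is missing from, or changed in, file2). The function name first_removed is made up for illustration. Keep in mind that diff aligns the files by content, not by line number, so if the first difference is an extra line in file2, the first "< " line may belong to a later hunk, or there may be none at all.

```shell
# Hypothetical helper: print the first line that diff's default output marks
# with "< " (a line of $1 that is missing from or changed in $2), then stop.
first_removed() {
    diff "$1" "$2" | awk '/^< / { print substr($0, 3); exit }'
}
```

Used as error_record=$(first_removed file1 file2), this should give 123476854 for the sample files above.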

I then tried awk,

awk ' NR==FNR { a[NR]=$0; next } !($0 in a){ print $1; exit } ' file2 file1

Note that the order of the input files is reversed because I want the first line of file1 that does not match file2. This just prints the first line of file1. Even if it did work, I think it would only tell me whether the value is in the file, not whether the lines match.
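Why this attempt prints line 1: awk's in operator tests whether a value is an index of the array, not whether it is one of the stored values. With a[NR]=$0 the indices are the line numbers 1, 2, 3, ..., so ($0 in a) is false for every line of file1 (unless a value happened to equal a line number). The helper below, check_in, is a hypothetical name; it loads the first file exactly as that attempt does and reports the result of the test for each line of the second file.

```shell
# Hypothetical helper: store the first file as a[NR]=$0 (like the first
# attempt), then report for each line of the second file whether that value
# is an index of a.
check_in() {
    awk 'NR == FNR { a[NR] = $0; next }
         { s = ($0 in a) ? "is" : "is not"; print $0, s, "an index of a" }' "$1" "$2"
}
```

Running check_in file2 file1 on the sample files should report "is not an index of a" for all five values.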

Code:
awk ' NR==FNR { a[NR]=$0; next } $0 != a[FNR] { print a[FNR]; exit } file1 file2

I am sure I could do a loop with read, but that would be slow.
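For comparison, the read loop might look something like this sketch (the function name first_mismatch_sh is hypothetical). It reads file1 and file2 in step and prints the first line of file1 that differs from the line at the same position in file2, or that has no partner because file2 ended. It assumes both files end with a newline, since read reports failure on an unterminated last line. On large files a shell loop like this is generally much slower than a single awk process.

```shell
# Hypothetical helper: compare $1 and $2 line by line using only the shell.
first_mismatch_sh() {
    # file1 is opened on descriptor 3 and file2 on descriptor 4 (see "done").
    while IFS= read -r line1 <&3; do
        # A failed read means file2 has run out; otherwise compare the lines.
        if ! IFS= read -r line2 <&4 || [ "$line1" != "$line2" ]; then
            printf '%s\n' "$line1"
            break
        fi
    done 3<"$1" 4<"$2"
}
```

Usage: error_record=$(first_mismatch_sh file1 file2)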

This seems like a very simple task. Are there any suggestions?

LMHmedchem

Last edited by Scrutinizer; 04-26-2017 at 07:17 PM..
# 2  
How about
Code:
diff -y -b --suppress-common-lines file1 file2 | cut -f1 | head -1
123476854


Or, slightly adapting your own awk proposal:
Code:
awk ' NR==FNR { a[$0]; next } !($0 in a){ print $1; exit } ' file2 file1
123476854

# 3  
Quote:
Originally Posted by RudiC
Or, slightly adapting your own awk proposal:
Code:
awk ' NR==FNR { a[$0]; next } !($0 in a){ print $1; exit } ' file2 file1
123476854

It seems something like this would be correct,

awk ' NR==FNR { a[$0]; next } $0 != a[FNR] { print a[FNR]; exit } file1 file2'

but that doesn't do anything at all. Am I right that evaluating !($0 in a) looks for $0 anywhere in a[]? I am checking that the files match, so it matters that the value appears on the same line in both files, not that it appears anywhere.
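On the question above: yes. With a[$0] the values themselves are the indices, so !($0 in a) is a set-membership test and line order plays no part. Comparing by position means indexing by line number and comparing the stored value with the current line. The sketch below puts the two tests side by side; in_anywhere and by_position are hypothetical names, and both take the reference file first, matching the file2 file1 order used above.

```shell
# Membership test (as in the adapted proposal): print the first line of $2
# whose value does not occur anywhere in $1.
in_anywhere() {
    awk 'NR == FNR { a[$0]; next } !($0 in a) { print; exit }' "$1" "$2"
}

# Positional test: print the first line of $2 that differs from the line
# with the same line number in $1.
by_position() {
    awk 'NR == FNR { a[FNR] = $0; next } $0 != a[FNR] { print; exit }' "$1" "$2"
}
```

With a reference file containing a, b, c and a second file containing c, b, a, in_anywhere prints nothing while by_position prints c. On the sample data, by_position file2 file1 should print 123476854.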

LMHmedchem
# 4  
Code:
paste -d" " file1 file2 | awk '$1 != $2 {print $1; exit;}'
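paste joins the two files line by line so awk can compare $1 with $2. With -d" " that only works while the values contain no spaces. A variant that keeps paste's default tab delimiter and tells awk to split on tabs only (a sketch; first_mismatch_paste is a hypothetical name) keeps each whole line as one field, as long as the lines themselves contain no tabs. If file1 has extra lines, paste leaves the file2 side empty, so the first extra line is reported; if file2 has extra lines, an empty line is printed.

```shell
# Hypothetical helper: pair the lines with paste (tab-separated by default)
# and print the first line of $1 that differs from its partner in $2.
first_mismatch_paste() {
    # -F'\t' makes tab the only separator, so spaces inside a line are kept.
    paste "$1" "$2" | awk -F'\t' '$1 != $2 { print $1; exit }'
}
```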

# 5  
@OP, your second suggestion seems to work all right, but you forgot the closing quote:
Code:
awk ' NR==FNR { a[NR]=$0; next } $0 != a[FNR] { print a[FNR]; exit }' file1 file2

However, it reads the whole of file1 into memory first.

Another approach you could try:
Code:
awk '{getline s<f} $0!=s{print; exit}' f=file2 file1
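Because this reads file2 one line at a time alongside file1, nothing is held in memory. One edge case: once file2 runs out, getline s<f returns 0 and leaves s unchanged, so extra lines at the end of file1 are compared with the last line of file2 and are missed if they happen to equal it. A sketch that checks getline's return value (1 for a line read, 0 at end of file, -1 on error, for example if file2 cannot be opened) could look like this; first_mismatch_getline is a hypothetical name. Like the original, it does not report extra lines at the end of file2.

```shell
# Hypothetical helper: stream $1 and read one line of $2 for each of its lines.
first_mismatch_getline() {
    awk '
        {
            # getline returns 1 when it reads a line, 0 at end of file and
            # -1 on error (for example if file2 cannot be opened).
            if ((getline s < f) <= 0) { print; exit }
            # Otherwise compare the two lines at the same position.
            if ($0 != s) { print; exit }
        }' f="$2" "$1"
}
```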

# 6  
In the end, I did this based on the code posted by Scrutinizer,

error_record=$(awk '{getline s<f} $0!=s{print; exit}' f=file2 file1)

It seems like it will work well enough and was the fastest of the methods that worked.
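Since the awk program prints nothing when no mismatch is found, the variable is empty in that case and a script can branch on it. A sketch, wrapped in a hypothetical function check_files so the file names are arguments; the messages and return codes are placeholders. Checking the exit status of the command substitution keeps an unreadable file1 from being mistaken for a match, since awk exits with a non-zero status when it cannot open an input file. Note that an empty line in file1 that mismatches would also leave the variable empty.

```shell
# Hypothetical wrapper: report the first line of $1 that differs from the
# line at the same position in $2.
check_files() {
    # The exit status of the command substitution is awk's, so a failure
    # to read the input is handled separately from "no mismatch".
    if ! error_record=$(awk '{getline s<f} $0!=s{print; exit}' f="$2" "$1"); then
        printf 'awk failed\n' >&2
        return 2
    fi
    if [ -n "$error_record" ]; then
        printf 'first mismatch: %s\n' "$error_record"
        return 1
    fi
    printf 'no mismatch found\n'
}
```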

This suggestion of RudiC also worked but was marginally slower.

error_record=$(diff -y -b --suppress-common-lines file1 file2 | cut -f1 | head -1)

By slower I mean 0m0.391s, as opposed to 0m0.156s with the first method. That's not enough of a difference to bother about, but I guess you need some reason to pick a method.

The method suggested by rdrtx1 also worked but again was a bit slower,

error_record=$(paste -d" " file1 file2 | awk '$1 != $2 {print $1; exit;}')

My guess is that the difference comes from the two slower methods each running more than one program.

I was not able to get any output from this, even though it looks correct,

awk ' NR==FNR { a[NR]=$0; next } $0 != a[FNR] { print a[FNR]; exit }' file1 file2

Don't know what the issue is there.
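The thread does not settle why this printed nothing with the real data; with the sample files from the first post it should print 123476854. One situation that does produce no output is a file2 that is shorter than file1 but matches it line for line as far as it goes: the comparison only runs while file2 is being read, so the extra lines of file1 are never checked. A sketch that adds an END check for that case (first_mismatch_array is a hypothetical name; like the original it assumes file1 is not empty, because NR==FNR would otherwise also be true while reading file2):

```shell
# Hypothetical helper: the corrected two-file program from this thread, plus
# an END check for a file2 that ends before file1 does.
first_mismatch_array() {
    awk '
        NR == FNR { a[NR] = $0; n = NR; next }   # load file1 and count its lines
        { m = FNR }                               # lines of file2 read so far
        $0 != a[FNR] { print a[FNR]; found = 1; exit }
        END {
            # exit in a main rule still runs END, so skip if already reported;
            # otherwise report the first file1 line that file2 never reached.
            if (!found && m < n) print a[m + 1]
        }' "$1" "$2"
}
```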

LMHmedchem

Last edited by LMHmedchem; 04-26-2017 at 10:15 PM..
