awk compare files


 
# 1  
Old 01-20-2015

I have the requirement below and am trying to compare the files using awk.



File 1 - already stored on a previous day
Code:
 id   | text       | email id 
---------------------------------
89564|this is line 1 | xyz@sample.txt
985384|this is line 2 | abc@sample.txt
657342|this is line 3 | def@sample.txt

File 2 - Arrived today
Code:
 id   | text       | email id 
---------------------------------
89564|this is line 4 | xyz@sample.txt
657342|this is line 3 | def@sample.txt
985384|this is line 2 | abc@sample.txt

Requested output

Code:
id   | text        | email id       | operation 
-------------------------------------------------
89564|this is line 4  | xyz@sample.txt | modified 
985384|this is line 2 | abc@sample.txt | deleted
657342|this is line 3 | def@sample.txt | inserted

We are using Hadoop, but Hive (its pseudo-database) has no insert or update operation, so we are trying to generate the operation code in a shell script.

I tried the code below, but it does not give me the desired output:

Code:
 
 awk -F"|" 'NR==FNR{a[$1]=$2;next}{if (a[$1])print a[$1],$0;else print "inserted", $0;}' OFS="|" file1 file2
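
For reference, a minimal sketch of what the one-liner above does with the File 1 / File 2 rows from this post. It keeps only the text field of file1 (`a[$1]=$2`) and prints that stored text in front of every file2 record whose id it has seen, so no record is ever labelled modified or deleted. The header lines are left out here, and the temporary directory and the variable name `attempt` are assumptions made only for this illustration.

```shell
#!/bin/sh
# Reproduces the one-liner on the sample rows from this post (headers omitted).
# The temp directory and the variable name "attempt" are illustrative only.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
89564|this is line 1 | xyz@sample.txt
985384|this is line 2 | abc@sample.txt
657342|this is line 3 | def@sample.txt
EOF
cat > "$tmp/file2" <<'EOF'
89564|this is line 4 | xyz@sample.txt
657342|this is line 3 | def@sample.txt
985384|this is line 2 | abc@sample.txt
EOF

# Same program as above, only the file names point into the temp directory.
attempt=$(awk -F"|" 'NR==FNR{a[$1]=$2;next}{if (a[$1])print a[$1],$0;else print "inserted", $0;}' OFS="|" "$tmp/file1" "$tmp/file2")

printf '%s\n' "$attempt"
# First line printed: this is line 1 |89564|this is line 4 | xyz@sample.txt
rm -rf "$tmp"
```

Every id in file2 also exists in file1, so the `else` branch never runs, and the changed text of id 89564 is not detected because the stored value is never compared with the new record.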


Last edited by Don Cragun; 01-20-2015 at 09:35 PM.. Reason: Fix CODE tags.
# 2  
Old 01-21-2015
I don't understand the logic behind your requested output.

For id 89564, the output line doesn't match either input line. (The text field in the output has an extra space that does not appear in File 2.)

For id 657342, both entries in the input files are identical. So why do you want to say it should be inserted?

For id 985384, both entries in the input files are identical. So why do you want to say it should be deleted?
# 3  
Old 01-21-2015
I am sorry for not stating my requirement properly. We are trying to accomplish an incremental load of data by comparing two files.

File 1
Code:
 id   | text       | email id 
---------------------------------
89564|this is line 1 | xyz@sample.txt
985384|this is line 14 | abc@sample.txt
657342|this is line 3 | def@sample.txt

File 2

Code:
 id   | text       | email id 
---------------------------------
89564|this is line 1 | xyz@sample.txt
985384|this is line 2 | abc@sample.txt


I would like the output of comparing the two files above to display the results below.
Desired output:
Code:
 id   | text       | email id 
---------------------------------
89564|this is line 1 | xyz@sample.txt
985384|this is line 2 | abc@sample.txt | modified (Note :  second column got modified ) 
657342|this is line 3 | def@sample.txt | deleted

Here we are comparing two files. If a record from file 1 is not found in file 2, based on column 1 (the primary key), we treat that entry as deleted and write it to a new file (file 3) with its operation code (in this case, deleted). If a record is found in both file 1 and file 2, we still keep it in file 3, and if any of its columns differs in file 2 we mark the entry as modified.

Last edited by Don Cragun; 01-21-2015 at 04:02 PM.. Reason: Add CODE tags.
# 4  
Old 01-21-2015
Please use code tags as required by forum rules!

Try
Code:
awk     'NR==FNR        {T[$1]=$0; next}
                        {printf "%s", $0}
         $1 in T        {if ($0 != T[$1]) printf "|modified."  
                         delete T[$1]
                         printf "\n"
                         next
                        }
                        {print  "|inserted"}  
         END            {for (t in T) print T[t], "deleted."} 
        ' FS="|" OFS="|" file1 file2
id | text | email id
---------------------------------
89564|this is line 1 | xyz@sample.txt
985384|this is line 2 | abc@sample.txt|modified.
885384|this is line 7 | abc@sample.txt|inserted
657342|this is line 3 | def@sample.txt|deleted.
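
A commented variant of the same idea, as a sketch rather than a definitive implementation. It assumes the rows from post #3 without the header lines, plus the extra 885384 record that shows up in the output above; the temporary directory, the array name `old`, and the variable `result` are made up for this example.

```shell
#!/bin/sh
# Sketch: tag each record of today's file against yesterday's, keyed on field 1.
# Sample rows are from post #3 (no headers); paths and names are illustrative.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
89564|this is line 1 | xyz@sample.txt
985384|this is line 14 | abc@sample.txt
657342|this is line 3 | def@sample.txt
EOF
cat > "$tmp/file2" <<'EOF'
89564|this is line 1 | xyz@sample.txt
985384|this is line 2 | abc@sample.txt
885384|this is line 7 | abc@sample.txt
EOF

result=$(awk -F'|' '
    # First file: remember every record under its id.
    NR == FNR { old[$1] = $0; next }
    # Second file, id unknown from the first file: new record.
    !($1 in old) { print $0 "|inserted"; next }
    # Second file, id known: compare the whole record, then forget the id.
    {
        if ($0 != old[$1]) print $0 "|modified"
        else               print $0
        delete old[$1]
    }
    # Ids still remembered never appeared in the second file.
    END { for (id in old) print old[id] "|deleted" }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Two caveats apply to this pattern in general: `NR == FNR` misfires when the first file is empty, because the second file's lines then satisfy it too, and the order of `for (id in old)` is unspecified, so several deleted records may come out in any order.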

This User Gave Thanks to RudiC For This Post: