Complex match of numbers between 2 files awk script


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Complex match of numbers between 2 files awk script
# 1  
Old 10-12-2012
Complex match of numbers between 2 files awk script

Hello to all,

I hope some awk guru could help me.

I have 2 input files:
File1: Is the complete database
File2: Contains some numbers which I want to compare

File1:
Code:
"NUMBERKEY","SERVICENAME","PARAMETERNAME","PARAMETERVALUE","ALTERNATENUMBERKEY"
"889780014068134","REGULAR","PROFILE","7","34952440000"
"889780009616087","REGULAR","PROFILE","5","34952440001"
"889780010060582","REGULAR","PROFILE","7","34952440003"
"889780021054589","REGULAR","PROFILE","7","34952440004"
"889780010806972","REGULAR","PROFILE","5","34952440007"
"889780022272662","REGULAR","PROFILE","7","34952440008"
"889780015883270","REGULAR","PROFILE","7","34952440009"
"889780018821619","REGULAR","PROFILE","5","34952440012"

File2:
Code:
"NUMBER1","NUMBER2","VALUE"
"34952440001","889780009616087",5
"34952440004","889780021054589",7
"34952440005","889780020514821",7
"34952440007","889780010806972",7
"34952440009","889780015883270",7

I would like:
- to know if the numbers in column1 of file2 are present in column5 of file1 and
get the value in column4 of file1 and compare it with value in column3 of file2
- If the number in file2 is present in file1, print the number, the value that has in file1 and value that has in file2
- If values are equal print "Yes, both values equal"
- If values are different print "Yes, both values different"
- If number in file2 is not found in file1 print number, value in file2 and "NOT_FOUND_in_File1"

Desired ouput:
Code:
NUMBER|VALUE_in_File1|VALUE_in_Fil2|FOUND?
34952440001|5|5|Yes, both values equal
34952440004|7|7|Yes, both values equal
34952440005||7|NOT_FOUND_in_File1
34952440007|5|7|Yes, both values different
34952440009|7|7|Yes, both values equal

PD: The real file1 has 5 million of lines and real file2 has hundred or some thousand of lines.

Thanks in advance for your help.

Last edited by Ophiuchus; 10-12-2012 at 02:13 AM..
# 2  
Old 10-12-2012
Hi

Assuming your files are file1 and file2:

Code:
$ sort -t, -k5 file1 > file11
$ mv file11 file1
$ join -t, -1 5 -2 1 -a 2 -o 2.1,1.4,2.3  file1 file2 | sed 's/"//g' | awk -F, '{if(!$2)$4="NOT_FOUND_in_File1";else $4=($2==$3 && (!$4))?"Yes, both values equal":"Yes, both values different";}1' OFS="|"
34952440001|5|5|Yes, both values equal
34952440004|7|7|Yes, both values equal
34952440005||7|NOT_FOUND_in_File1
34952440007|5|7|Yes, both values different
34952440009|7|7|Yes, both values equal

Guru.
# 3  
Old 10-12-2012
Code:
awk -F, 'FNR==NR{gsub("\"","",$0);a[$5]=$4;next}{gsub("\"","",$0);
if(a[$1]){if(a[$1] == $3){print $1,a[$1],$3,"Yes, both values equal"}
else{print $1,a[$1],$3,"Yes, both values different"}}
else{print $1,a[$1],$3,"NOT_FOUND_in_File1"}}' OFS="|" file1 file2

# 4  
Old 10-12-2012
@pamu: This approach of loading the entire file in memory might cause some performance issue in this scenario because the OP states that file1 has 5 million records.

Guru.
# 5  
Old 10-12-2012
Code:
awk -F, 'BEGIN{OFS="|"
print "NUMBER","VALUE_in_File1","VALUE_in_Fil2","FOUND?"}
FNR==NR{if(FNR>1){gsub(/"/,"");f2val[$1]=$3};next}
FNR>1{gsub(/"/,"")
if($5 in f2val)
{
 f1val[$5]=$4
 mess[$5]=(f1val[$5]==f2val[$5])?"Yes, both values equal":"Yes, both values different"
}}
END{
for(i in f2val)
{
 print i,f1val[i],f2val[i],mess[i]?mess[i]:"NOT_FOUND_in_File1"
}}' file2 file1


Last edited by elixir_sinari; 10-12-2012 at 03:14 AM..
# 6  
Old 10-12-2012
Hello,

Thank you Pamu and guruprasadpr.

I think I can try to understand the Pamu's script and the guruprasadpr´s script in the sed and awk part, but I don't understand this part:

Code:
sort -t, -k5 File_1.csv > File_11.csv
mv File_11.csv File_1.csv
join -t, -1 5 -2 1 -a 2 -o 2.1,1.4,2.3  File_1.csv File_2.csv

May you explain me please what does it mean the sort with -t -k5 parameters and the
join command with -t, -1 5 -2 1 -a 2 -o 2.1,1.4,2.3 parameters.

Thanks both again for the help.
# 7  
Old 10-12-2012
For the join command to work, the file has to be sorted based on the key on which you are trying to join. Since the key to join is 5th field of file1, it is -k5. t is for delimiter. For more on the join command options, refer to the example 4 here.

Guru.
This User Gave Thanks to guruprasadpr For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to match field between two files and use conditions on match

I am trying to look for $2 of file1 (skipping the header) in $2 of file2 (skipping the header) and if they match and the value in $10 is > 30 and $11 is > 49, then print the line from file1 to a output file. If no match is foung the line is not printed. Both the input and output are tab-delimited.... (3 Replies)
Discussion started by: cmccabe
3 Replies

2. Shell Programming and Scripting

How to use awk shell script to compare and match two files?

Basically, I have two files dupestest.txt 152,153 192,193,194 215,216 290,291 2279,2280 2282,2283haftest.txt 152,ABBOTS ROAD 153,ABBOTS ROAD 154,ABBOTS ROAD 155,ABBOTS ROAD 156,ABBOTS ROAD 157,ABBOTS ROADI want to find the numbers in dupestest.txt in haftest.txt... (4 Replies)
Discussion started by: amyc92
4 Replies

3. Shell Programming and Scripting

awk : match only the pattern string , not letters or numbers after that.

Hi Experts, I am finding difficulty to get exact match: file OPERATING_SYSTEM=HP-UX LOOPBACK_ADDRESS=127.0.0.1 INTERFACE_NAME="lan3" IP_ADDRESS="10.53.52.241" SUBNET_MASK="255.255.255.192" BROADCAST_ADDRESS="" INTERFACE_STATE="" DHCP_ENABLE=0 INTERFACE_NAME="lan3:1"... (6 Replies)
Discussion started by: rveri
6 Replies

4. Shell Programming and Scripting

Complex transpose awk script

Hello to all in forum, Maybe an awk expert could help me with this complex task for me. I have the input shown below and I would like to get the output as follow: - I would like the output separated by commas. - The header is fixed and will be the same always. - For the lines containing... (22 Replies)
Discussion started by: Ophiuchus
22 Replies

5. Shell Programming and Scripting

awk script (complex)

picked this up from another thread. echo 1st_file.csv; nawk -F, 'NR==FNR{a++;next} a{b++} END{for(i in b){if(b-1&&a!=b){print i";\t\t"b}else{print "NEW:"i";\t\t"b} } }' OFS=, 1st_file.csv *.csv | sort -r i need to use the above but with a slight modification.. 1.compare against 3 month... (25 Replies)
Discussion started by: slashbash
25 Replies

6. Shell Programming and Scripting

Complex overlap and naming of 2 input files - Awk

for every specific $1,$2 check the values ($2,$3) of their E ot I of input1 and overlap with input2. Specify names based on output. ####### if middle value is missing name them "SE" if first value is missing name them "AFE" if last value is missing name them "ALE" if 2 middle values are... (1 Reply)
Discussion started by: ruby_sgp
1 Replies

7. Shell Programming and Scripting

Match real numbers in AWK

I am looking for a better way to match real numbers within a specified tolerance range. My current code is as follows: if ($1 !~ /^CASE/) for(i=1;i in G;i++) if (G >= $5-1 && G <= $5+1) { print $1,$4,$5,J,G } else { print $1,"NO MATCH" } where $5 and G are... (3 Replies)
Discussion started by: cold_Que
3 Replies

8. Shell Programming and Scripting

awk merging files based on 2 complex conditions

1. if the 1st row IDs of input1 (ID1/ID2.....) is equal to any IDNames of input2 print all relevant values together as defined in the output. 2. A bit tricky part is IDno in the output. All we need to do is numbering same kind of letters as 1 (aa of ID1) and different letters as 2 (ab... (4 Replies)
Discussion started by: ruby_sgp
4 Replies

9. Shell Programming and Scripting

match range of different numbers by AWK

if the column1 and 2 in both files has same key (for example "a" and "a1") compare each first key value(a1 of a) of input2 (for example 1-4 or 65-69 not 70-100 or 44-40 etc) with all the values in input1. if the range of first key value in input2 is outof range in input1 values named it as out... (54 Replies)
Discussion started by: repinementer
54 Replies

10. Shell Programming and Scripting

match numbers (awk)

i would like to enter (user input) a bunch of numbers seperated by space: 10 15 20 25 and use awk to print out any lines in a file that have matching numbers so output is: 22 44 66 55 (10) 77 (20) (numbers 10 and 20 matched for example) is this possible in awk . im using gawk for... (5 Replies)
Discussion started by: tanku
5 Replies
Login or Register to Ask a Question