Need help in comparing multiple columns from two files.


 
Thread Tools Search this Thread
Top Forums UNIX for Advanced & Expert Users Need help in comparing multiple columns from two files.
# 1  
Old 10-23-2014
Linux Need help in comparing multiple columns from two files.

Hi all,

I have two files as below. I need to compare field 2 of file 1 against field 1 of file 2 and field 5 of file 1 against filed 2 of file 2. If both matches , then create a result file 1 with first file data and if not matches , then create file with first fie data. Please help me in achieving this.
Code:
 awk 'NR=FNR{a[NR]=$0;next} { split ( a[$1],x, " ") if ( ( $2 ~ x[1] ) and ( $5 ~ x[2] ) ) { print $0 > "yes.txt"; } else { print $0 > " no.txt" } ; } file2 file1 2>> log.txt

File1 :
Code:
1234 chennai kovai 1256 blr
2331 Madras Coimbatore 3234 Bangalore
5467 Chennai city 2456 ooty

File 2:
Code:
Madras Bangalore
Chennai blr

Result file 1 :- yes.txt
Code:
1234 chennai kovai 1256 blr

Result file 2:- no.txt
Code:
2331 Madras Coimbatore  3234 Bangalore 
5467 Chennai city 2456 ooty

Moderator's Comments:
Mod Comment Please use CODE tags for code, input, and output samples.

Last edited by Don Cragun; 10-23-2014 at 08:00 AM.. Reason: Fix tags.
# 2  
Old 10-23-2014
Please use code tags - not ICODE tags - for code AND DATA as required by forum rules!

I'm a bit surprised by your result files - Madras Bangalore should show up in yes.txt as it perfectly fits, and Chennai blr should not as the upper case C does not exist in file1.

Trying to adapt your code snippet as far as possible and correcting your input files, I came up with
Code:
awk     'NR==FNR                {a[$1,$2]; next}
         ($2 SUBSEP $5 in a)    {print $0 > "yes.txt"; next}
                                {print $0 > "no.txt" }
        ' file2 file1
cf *.txt
no.txt:
5467 Chennai city 2456 ooty
yes.txt:
1234 Chennai kovai 1256 blr
2331 Madras Coimbatore 3234 Bangalore

Pls check if you can live with that.
This User Gave Thanks to RudiC For This Post:
# 3  
Old 10-23-2014
Quote:
Originally Posted by sivarajb
Hi all,

I have two files as below. I need to compare field 2 of file 1 against field 1 of file 2 and field 5 of file 1 against filed 2 of file 2. If both matches , then create a result file 1 with first file data and if not matches , then create file with first fie data. Please help me in achieving this. awk 'NR=FNR{a[NR]=$0;next} { split ( a[$1],x, " ") if ( ( $2 ~ x[1] ) and ( $5 ~ x[2] ) ) { print $0 > "yes.txt"; } else { print $0 > " no.txt" } ; } file2 file1 2>> log.txt

File1 :
1234 chennai kovai 1256 blr
2331 Madras Coimbatore 3234 Bangalore
5467 Chennai city 2456 ooty

File 2:
Madras Bangalore
Chennai blr

Result file 1 :- yes.txt
1234 chennai kovai 1256 blr

Result file 2:- no.txt
2331 Madras Coimbatore 3234 Bangalore
5467 Chennai city 2456 ooty
why 2331 Madras Coimbatore 3234 Bangalore is not in yes.txt file ? What kind of comparison you are doing case sensitive or insensitive?

If its typo try this similar to the one RudiC suggusted

Code:
akshay@nio:/tmp$ cat f1
1234 chennai kovai 1256 blr
2331 Madras Coimbatore 3234 Bangalore
5467 Chennai city 2456 ooty

Code:
akshay@nio:/tmp$ cat f2
Madras Bangalore
Chennai blr

Code:
akshay@nio:/tmp$ awk 'FNR==NR{A[tolower($1 FS $2)];next}{print >sprintf("%s.txt",(tolower($2 FS $5) in A ? "yes" : "no"))}' f2 f1

Code:
akshay@nio:/tmp$ cat yes.txt 
1234 chennai kovai 1256 blr
2331 Madras Coimbatore 3234 Bangalore

Code:
akshay@nio:/tmp$ cat no.txt 
5467 Chennai city 2456 ooty

This User Gave Thanks to Akshay Hegde For This Post:
# 4  
Old 10-23-2014
Untested, but seems close to what you're trying to do:
Code:
awk '
NR == FNR {
        a[tolower($1)] = tolower($2)
        next
}
tolower($2) in a && a[tolower($2)] == tolower($5) {
        print > "yes.txt"
        next
}
{       print > "no.txt"
}' file2 file1 2>> log.txt

Moderator's Comments:
Mod Comment This was moved here from a similar thread started by the same OP with slightly different input and output samples.

Last edited by Don Cragun; 10-23-2014 at 08:09 AM.. Reason: Add note.
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 10-23-2014
Thank you rudiC and Aksay for such a quick reply..
Yes, the files are case sensitive and I did a typo mistake..

Now the actual issue over here is file 1 will have 20 million records and file 2 will have 30 thousand records... So will it cause performance issue if we go with this comparison?

Earlier I have achieved this with two while loops however it took hell lot of time. So now came to awk part.. Actually the if condition will be checking like below..
Code:
    if ( ( $2 ~ x[1] ) and ( $5 ~ x[2] ) )  or (( $2 ~ x[2] ) and ( $5 ~ x[1] )) { print $0 > "yes.txt"; } else { print $0 > " no.txt" } ; } file2 file1 2>> log.txt

Will this can be implemented there

---------- Post updated at 04:46 PM ---------- Previous update was at 04:33 PM ----------

Hi don,
Thank you...

Will this gives me a performance issues with changed if condition and huge volume of files.

Last edited by sivarajb; 10-23-2014 at 08:10 AM..
# 6  
Old 10-23-2014
The suggestion I provided was first posted in your other thread where the example you provided seemed to require case insensitive matching.

Now that you have clearly specified that you want case sensitive matching, RudiC's suggestion seems to be what you need (assuming you don't really want a space in the filename no.txt). Although the sample code you provided used ~ for matching instead of testing for equality; so I'm not sure what you mean by match. If file2 contains the line:
Code:
York  oo

should that match the line:
Code:
123 Yorkshire x y boot

in file1 in addition to matching the line:
Code:
456 York a b oo

? If so, RudiC's suggestion won't work, and anything we come up with will be significantly slower.

You say that you have code that works, but runs too slow; but you haven't shown us any code that would work at all. So, it is hard to guess at how many orders of magnitude faster RudiC's suggestion will run than your current code. If you have a shell script with a nested loop that is executing awk 600 billion times (30,000 * 20,000,000) and you want exact matches on fields (rather than substring matches), I wouldn't be surprised if RudiC's suggestion will run faster by a factor of well over a billion.

Why haven't you tried his code to see how well it works for you???
This User Gave Thanks to Don Cragun For This Post:
# 7  
Old 10-23-2014
  1. The elegant solution is to use tools that make delimited files act like SQL tables and do an SQL join.
  2. You can scan one file putting the concatenated keys into an associative array (awk, bash, ksh, etc.), then process the second file looking up the keys in the array. Associative arrays are hash lookup tables, keyed with strings.
  3. A more UNIX flavored solution is to cut out the columns to compare into two files or bash/ksh pipe streams '<(...)', sort them unique and match them with 'comm'. Now you have the key values to filter the input file. Sort the input file on the same columns so matching is a simple merge.
  4. If you create modified input files with the keys copied or moved on the front and then a delimiter and sort them, you can use 'join'. Since join only handles one join column, you need a different delimiter to make it join on multiple columns. If concatenating columns, make sure each column has values formatted all to the same length.
Make sure $LC_ALL=C during sort for binary order.

Last edited by Scrutinizer; 10-24-2014 at 06:07 AM..
This User Gave Thanks to DGPickett For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Comparing multiple columns using awk

Hello All; I have two files with below conditions: 1. Entries in file A is missing in file B (primary is field 1) 2. Entries in file B is missing in file A (primary is field 1) 3. Field 1 is present in both files but Field 2 is different. Example Content: File A ... (4 Replies)
Discussion started by: mystition
4 Replies

2. Shell Programming and Scripting

awk arrays comparing multiple columns across two files.

Hi, I'm trying to use awk arrays to compare values across two files based on multiple columns. I've attempted to load file 2 into an array and compare with values in file 1, but success has been absent. If anyone has any suggestions (and I'm not even sure if my script so far is on the right lines)... (4 Replies)
Discussion started by: hubleo
4 Replies

3. Shell Programming and Scripting

Comparing two columns from two different files

Hi, I have a single-column file1 having records like: 00AB01/11 43TG22/00 78RC09/34 ...... ...... and a second file , file 2 having two columns like 78RC09/34 1 45FD11/11 2 00AB01/11 3 43TG22/00 4 ...... ...... (8 Replies)
Discussion started by: amarn
8 Replies

4. Shell Programming and Scripting

comparing two columns from two different files

Hello, I have two files as 1.txt and 2.txt with number as columns. 1.txt 0 53.7988 1 -30.0859 2 20.1632 3 14.2135 4 14.6366 5 -37.6258 . . . 31608 -8.57333 31609 -2.58554 31610 -24.2857 2.txt (1 Reply)
Discussion started by: AKD
1 Replies

5. UNIX for Dummies Questions & Answers

Comparing columns in two files

Hi, I have two files. File1.txt has 2 columns and looks like: 458739 122345 4456 122657 34200 122600 File2.txt has many columns with column 1 the same as column2 of File1.txt, but with lot more rows: 122786 abcdefg user1@email 122778 uuhjeufh user2@email... (1 Reply)
Discussion started by: ursaan
1 Replies

6. Shell Programming and Scripting

comparing the values of repeated keys in multiple columns

Hi Guyz The 1st column of the input file has repeated keys like x,y and z. The ist task is if the 1st column has unique key (say x) and then need to consider 4th column, if it is + symbol then subtract 2nd column value with 3rd column value (we will get 2(10-8)) or if it is - symbol subtract 3rd... (3 Replies)
Discussion started by: repinementer
3 Replies

7. UNIX for Dummies Questions & Answers

Comparing 2 columns from 2 files

Hi, I have two files with the same number of columns. Basically I want to print the 2 columns that match between the two files. File1 looks like this: dr12 12 6 abn dr14 12 7 abn File2 looks something like this: dr12 12 8 abn dr12 14 7 abn So basically if the first... (1 Reply)
Discussion started by: kylle345
1 Replies

8. Shell Programming and Scripting

comparing 2 columns from 2 files

Hey, I have 2 files that have a name and then a number: File 1: dog 21 dog 24 cat 33 cat 27 dog 76 cat 65 File 2: dog 109 dog 248 cat 323 cat 207 cat 66 (2 Replies)
Discussion started by: dcfargo
2 Replies

9. Shell Programming and Scripting

Comparing Columns of two FIles

Dear all, I have two files in UNIX File1 and File2 as below: File1: 1,1234,.,67.897,,0 1,4134,.,87.97,,4 0,1564,.,97.8,,1 File2: 2,8798,.,67.897,,0 2,8879,.,77.97,,4 0,1564,.,97.8,,1 I want to do the following: (1) Make sure that both the files have equal number of columns and if... (4 Replies)
Discussion started by: ggopal
4 Replies

10. UNIX for Advanced & Expert Users

Comparing Columns of two FIles

Dear all, I have two files in UNIX File1 and File2 as below: File1: 1,1234,.,67.897,,0 1,4134,.,87.97,,4 0,1564,.,97.8,,1 File2: 2,8798,.,67.897,,0 2,8879,.,77.97,,4 0,1564,.,97.8,,1 I want to do the following: (1) Make sure that both the files have equal number of columns and if... (1 Reply)
Discussion started by: ggopal
1 Replies
Login or Register to Ask a Question