How to compare 2 files on more than 5 columns?


 
# 1  
Old 03-28-2013

Hi all, I am trying to compare 2 files by their column contents using the following code:


Code:
awk '
NR==FNR {A[$1,$2,$3,$4,$5,$6,$7,$8]=$9; next}
{B=A[$1,$2,$3,$4,$5,$6,$7,$8]; print $0,B""?B:" Not -In file" }
' OFS="\t" file1 file2

If a line in file2 matches a line in file1, I want to print $9 from file1 along with $0 from file2.

If I key the array on only $1, I get a result, but I want to compare at least 5 columns out of 8.

Those who know please help
# 2  
Old 03-29-2013
As usual, you did not give a clear specification of the output you want. If this isn't what you want, maybe it will be close enough for you to figure out how to fix it to meet your unstated requirements:
Code:
awk '
FNR == NR {
        c = NR  # # of lines in 1st file
        for(i = 1; i <= 9; i++)
                a[c, i] = $i
        next
}
{       p = 0   # # of lines matched
        for(i = 1; i <= c; i++) {
                m = 5   # # of fields that must match
                for(j = 1; j <= 8 && m; j++)
                        if(a[i, j] == $j) m--
                if(m) continue
                # We matched enough fields; print this as a matched line.
                print $0, a[i, 9]
                p++
        }
        # If we did not find any matching lines, print the not found message
        if(p == 0) print $0, "Not -In file"
}' OFS="\t" file1 file2

As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

Last edited by Don Cragun; 03-30-2013 at 09:00 AM.. Reason: Remove ' in comment
# 3  
Old 03-29-2013
Sir, my problem is very simple. I have 2 files: one has, say, 10 columns and 25 rows, and the other has 10 columns and 15 rows, with some updated information in the 9th column. I usually compare them using the code posted in post #1, but I don't understand why my code is not working. Does the FS of the 2 files need to be the same, or is there some other specific reason?
# 4  
Old 03-29-2013
What is not working with your code? While your problem may be very simple, your specification is not. Wildest guessing does not lead to an acceptable proposal. Please post input samples, desired output, and an explanation on how to get from A to B.
# 5  
Old 03-29-2013
Quote:
Originally Posted by Akshay Hegde
Sir, my problem is very simple. I have 2 files: one has, say, 10 columns and 25 rows, and the other has 10 columns and 15 rows, with some updated information in the 9th column. I usually compare them using the code posted in post #1, but I don't understand why my code is not working. Does the FS of the 2 files need to be the same, or is there some other specific reason?
I have color coded the various requirements in the following description of how I interpreted what you requested. Following that is another copy of my script again with corresponding sections of the code highlighted using the same colors.

In your 1st example you had at least 9 fields in file1 (with no specified number of rows) and at least 8 fields in file2 (with no specified number of rows) and with the separator between fields unspecified. I interpreted your sample code and your statements to mean that if at least 5 fields out of the first 8 fields in a line in file2 match the corresponding fields in any line in file1, that line from file2 is to be printed followed by a tab followed by the contents of field 9 of the matched line from file1. Every line in file1 is supposed to be checked to see if there is a match for each line read from file2. If there are no matches in file1 for a line in file2, that line from file2 is to be printed followed by a tab followed by the text " Not -In file". (My awk script removed the first space in this string since I didn't see any need for a space following the tab separating this text from the original contents of the line from file2. You can easily put the space back, change the text in this message, or delete it entirely, if you want to.)

Code:
awk '
FNR == NR {
        c = NR  # # of lines in 1st file
        for(i = 1; i <= 9; i++)
                a[c, i] = $i
        next
}
{       p = 0   # # of lines matched
        for(i = 1; i <= c; i++) {
                m = 5   # # of fields that must match
                for(j = 1; j <= 8 && m; j++)
                        if(a[i, j] == $j) m--
                if(m) continue
                # We matched enough fields; print this as a matched line.
                print $0, a[i, 9]
                p++
        }
        # If we did not find any matching lines, print the not found message
        if(p == 0) print $0, "Not -In file"
}' OFS="\t" file1 file2
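
To make the interpretation above concrete, here is a small run of the same script against made-up data. The file contents, the temporary directory and the variable names tmp and result are assumptions for illustration only; the thread never showed real input.

```shell
#!/bin/sh
# Hypothetical sample data (not from the thread): 9 tab-separated fields per line.
tmp=$(mktemp -d)
printf 'a\tb\tc\td\te\tf\tg\th\tNEW1\n'  > "$tmp/file1"
printf '1\t2\t3\t4\t5\t6\t7\t8\tNEW2\n' >> "$tmp/file1"
# file2 line 1 agrees with file1 line 1 in fields 1-5 (5 of 8 match).
# file2 line 2 agrees with file1 line 2 in fields 1-4 only (4 of 8 match).
printf 'a\tb\tc\td\te\tX\tY\tZ\told\n'  > "$tmp/file2"
printf '1\t2\t3\t4\tX\tY\tZ\tW\told\n' >> "$tmp/file2"

result=$(awk '
FNR == NR {
        c = NR  # number of lines read from the 1st file
        for(i = 1; i <= 9; i++)
                a[c, i] = $i
        next
}
{       p = 0   # number of file1 lines matched by this file2 line
        for(i = 1; i <= c; i++) {
                m = 5   # number of fields that must still match
                for(j = 1; j <= 8 && m; j++)
                        if(a[i, j] == $j) m--
                if(m) continue
                print $0, a[i, 9]
                p++
        }
        if(p == 0) print $0, "Not -In file"
}' OFS="\t" "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this data the first line of file2 should come out with NEW1 appended, and the second line with the not found text, because only 4 of its first 8 fields agree with any line in file1.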

With all of the awk scripts I have provided for you in this forum in the past, and the descriptions I have provided on how they work, I am confident that you can easily modify this script to meet your changed requirements and produce the output you want. In this case you have gone from adding a field 9 from file1 to the end of matched lines in file2 to replacing field 9 in matched lines, matching at least 5 of the first 8 fields in each line to matching an unspecified number of fields in the first 10 fields, and specifying what is to happen when there is no match to saying nothing about that case. In this case you specify the number of lines in both files, but the script I provided doesn't care how many lines are in either file, although it could run out of memory if there is a HUGE amount of data in the 9 fields that have to be saved from each line in file1.
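
As a hedged illustration of the kind of modification meant here, the sketch below replaces field 9 in matched lines instead of appending to them. It assumes tab-separated input, keeps the 5 out of 8 rule, uses only the first matching line from file1, and prints unmatched lines unchanged. None of those choices were stated in the thread, and the sample data and the variable names tmp and replaced are made up.

```shell
#!/bin/sh
# Hypothetical sample data (not from the thread): 10 tab-separated fields per line.
tmp=$(mktemp -d)
printf 'a\tb\tc\td\te\tf\tg\th\tNEW1\tz\n' > "$tmp/file1"
printf 'a\tb\tc\td\te\tX\tY\tZ\told\tq\n'  > "$tmp/file2"
printf '1\t2\t3\t4\t5\t6\t7\t8\told\tq\n' >> "$tmp/file2"

replaced=$(awk '
BEGIN { FS = OFS = "\t" }       # assumption: both files are tab separated
FNR == NR {
        c = NR  # number of lines read from the 1st file
        for(i = 1; i <= 9; i++)
                a[c, i] = $i
        next
}
{       for(i = 1; i <= c; i++) {
                m = 5   # number of fields that must still match
                for(j = 1; j <= 8 && m; j++)
                        if(a[i, j] == $j) m--
                if(m) continue
                $9 = a[i, 9]    # assumption: overwrite field 9 with the file1 value
                break           # assumption: only the first matching line is used
        }
        print   # unmatched lines are printed unchanged (assumption)
}' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$replaced"
rm -rf "$tmp"
```

Setting FS and OFS to the same character matters in this variant: assigning to $9 makes awk rebuild the line using OFS, so the output keeps the input layout only if the two agree.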

For the record, what your original code did was the following: If the first 8 fields in a line in file2 match every one of the corresponding first 8 fields in a line in file1 (ignoring the field separators between those fields), print the line from file2 followed by a tab followed by field 9 from the matched line in file1. If no match is found, print the line from file2 followed by a tab followed by " Not -In file". That script made absolutely no attempt to accept a match of 5 out of 8 fields; it was only looking for an 8 out of 8 match.
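
A short sketch of why that lookup can only succeed on a full match: awk stores a comma-separated subscript list under a single string key, the subscripts joined with SUBSEP, so the key is found only when every subscript is equal. The two-field keys and the variable name subsep_demo are made up for illustration.

```shell
#!/bin/sh
subsep_demo=$(awk 'BEGIN {
        A["x", "y"] = "hit"
        # The element above lives under the single key "x" SUBSEP "y".
        key = "x" SUBSEP "y"
        print ((key in A) ? "same key" : "different key")
        # Changing any one subscript gives a different key, so nothing is found.
        print ((("x", "z") in A) ? "partial match found" : "partial match not found")
}')
printf '%s\n' "$subsep_demo"
```

This is why a partial match needs a field-by-field comparison loop rather than a single array lookup.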

IF YOU WANT ANY HELP FROM ME IN THE FUTURE, YOU NEED TO EXPLICITLY STATE YOUR REQUIREMENTS AND PROVIDE SAMPLES IN YOUR FIRST MESSAGE RATHER THAN MAKING US GUESS AT WHAT YOU WANT DONE AND CHANGING THE REQUIREMENTS WITH EACH FOLLOW ON POST. Pardon me for yelling; but you must understand by now how frustrating it is to see this pattern of behavior repeated in every one of your threads.

Last edited by Don Cragun; 03-30-2013 at 09:01 AM.. Reason: Remove ' in comment
# 6  
Old 03-30-2013
Dear Don Cragun and RudyC,

This is to convey my sincere apologies for any inconvenience you may have experienced in past and present threads.

You can expect better and more appropriate behavior from me in the future. I have learned from this experience and understand that a certain level of restraint and professionalism is expected of me in the forum.

I recognize your dedication and commitment to the forum. I hope our relationship is undamaged by my actions and that I can continue to learn and grow under your guidance.

Again, please accept my most genuine apologies and if there is anything you would like to discuss about this event, please feel free to discuss it with me.

Akshay Hegde
# 7  
Old 03-30-2013
Apologies accepted but not needed. Pls see and answer post #4.