How to compare 2 files column's more than 5?


 
# 8  
Old 03-30-2013
Thank you sir

Sir, I am attaching file1.txt and file2.txt.

file1.txt contains some information that is also available in file2.txt; columns 1 to 8 of file1.txt are the reference. Unfortunately, the code I posted in #1 gives a result only when it uses the
Code:
$1

field alone, that is:

Code:
awk  '
NR==FNR {A[$1]=$9; next}
{B=A[$1]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

This prints
Code:
$0 of file2.txt followed by a tab and $9 of file1.txt

But when I tried

Code:
awk  '
NR==FNR {A[$1,$2,$3,$4,$5,$6,$7,$8]=$9; next}
{B=A[$1,$2,$3,$4,$5,$6,$7,$8]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

This prints
Code:
$0 of file2.txt followed by the " Not in file " message

So I asked in this thread for the following:
Code:
if at least 5 of the 8 fields match, print the $9 field of file1.txt with $0 of file2.txt

Because of this, in post #3 I was worried about whether the FS of both files needs to be the same.
# 9  
Old 03-30-2013
Let me paraphrase your request:

Line(n) means line in file(n)
For every line(2) of file2.txt, print it. Then you want to compare that line(2)'s $1 to every line(1)'s $1 in file1.txt. Same for $2 ... $8. If, for the line(2) under consideration, any 5 (no matter which) of those 8 fields match in any line(1), print that line(1)'s $9 and stop the search; else print "not in file".

Is that correct? If so, why didn't you try and accept Don Cragun's fine proposal in post #2?

Last edited by RudiC; 03-30-2013 at 09:10 AM..
# 10  
Old 03-30-2013
Quote:
Originally Posted by Akshay Hegde
[post #8 quoted in full]
I see that when I added comments to my awk script, I let a single quote slip in at a place where it keeps the script from working. I have corrected that problem in messages #2 and #6.

You have now given us sample input files, but you still haven't clearly explained what the output is supposed to be. The script I provided does not find any lines that match the criteria I thought you were trying to use.

With the code that you had and the code I provided, it doesn't matter whether the field separator in your input files is a tab, one or more spaces, one or more spaces followed by a tab, or a tab followed by one or more spaces (all of which occur in your input files). And, in the statement:
Code:
print $0,B""?B:" Not in file "

setting OFS to a tab will only affect the separator that is used between $0 and the second print argument (the value of B or " Not in file "); it will not change the input field separators in $0 to tabs. If you want the output field separator to just be a tab between all output fields, you need to add that as a required transformation to be performed by your script.
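
To illustrate the OFS point, here is a minimal sketch. The two function names and the sample line are made up for illustration; the only awk behaviour relied on is that assigning to any field makes awk rebuild $0 with OFS between all fields.

```shell
# Hypothetical helpers, not part of any script posted in this thread.

# Prints $0 untouched: the original spaces/tabs inside $0 survive,
# and OFS is used only between $0 and "X".
keep_separators() { awk '{ print $0, "X" }' OFS="\t"; }

# Assigning a field to itself forces awk to rebuild $0 using OFS,
# so every field separator in the output becomes a tab.
rebuild_with_ofs() { awk '{ $1 = $1; print $0, "X" }' OFS="\t"; }

# Sample line with mixed spaces and a tab between fields.
printf 'a   b\t c\n' | keep_separators
printf 'a   b\t c\n' | rebuild_with_ofs
```

The first call keeps the mixed separators of the input line; the second prints the four values separated by single tabs.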

We understand that the script you have shown us does not do what you want it to do. We do not understand what you do want it to do. Please show us the output you want to get from some of your input lines where you expect to get a match and where you do not expect to get a match. AND explain in English exactly what criteria are to be used to determine whether a line from file1.txt matches a line from file2.txt.
# 11  
Old 03-30-2013
Sir, I tried Don Cragun's post, but I don't know why I am not getting the result.

When I tried this
Code:
awk  '
NR==FNR {A[$1]=$9; next}
{B=A[$1]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

and Don Cragun's code

Code:
awk '
FNR == NR {
        c = NR  # # of lines in 1st file
        for(i = 1; i <= 9; i++)
                a[c, i] = $i
        next
}
{       p = 0   # # of lines matched
        for(i = 1; i <= c; i++) {
                m = 1   # # of fields that must match
                for(j = 1; j <= 8 && m; j++)
                        if(a[i, j] == $j) m--
                if(m) continue
                # We matched enough fields; print this as a matched line.
                print $0, a[i, 9]
                p++
        }
        # If we did not find any matching lines, print the not found message
        if(p == 0) print $0, "Not -In file"
}' OFS="\t" file1.txt file2.txt

I got the following result, which is what I expect. But if I compare only $1, my problem is that there are some duplicates, so I need at least 5 columns to match before I am satisfied.

Code:
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    451.7    (4)    [01]    10.590    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    452.3    (4)    [01]    10.590    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    452.9    (4)    [01]    10.590    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    453.6    (4)    [01]    10.580    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    454.2    (4)    [01]    10.570    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    454.8    (4)    [01]    10.570    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    455.4    (4)    [01]    10.560    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    456.0    (4)    [01]    10.560    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    456.6    (4)    [01]    10.570    (5)    [01]    817    BAD

When I tried the code from #2 with m set to 2, I got the following result:

Code:
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    607.5    (4)    [01]    9.080    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    608.1    (4)    [01]    9.100    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    608.7    (4)    [01]    9.090    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    609.3    (4)    [01]    9.100    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    609.9    (4)    [01]    9.090    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    610.5    (4)    [01]    9.090    (4)    [01]    817    Not -In file

I really don't know why I am getting the wrong result with #2. The file2 I attached actually has all the records that are available in file1, for example:

Code:
52416    2.613    90.408    1996    10    19    10.38    817    BAD

52416 is there in file1.txt

Last edited by Akshay Hegde; 03-30-2013 at 09:44 AM..
# 12  
Old 03-30-2013
This is because e.g. "2.613" is $4 in file2 and $2 in file1, and so on.

Did you read my paraphrasing carefully? Then you should have objected and posted a corrected version. You don't want to match $1 to $1 etc., but $1 to ANY of $1 to $8, $2 to ANY of $1 to $8, and so on. And, would $8 be sufficient? The 817 ($8 in file1) is $18 in file2!
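
A quick way to see the mismatch is to let awk print each field of one line next to its index. This is only a small diagnostic sketch; the function name show_fields is made up for illustration, and it assumes the default whitespace field splitting used throughout this thread.

```shell
# Hypothetical helper: list every field of the first input line with its index.
# With no file argument it reads standard input.
show_fields() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) print "$" i " = " $i }' "$@"
}

# The start of a file2.txt line from post #11:
printf '52416     ZA  141863    2.613    90.408\n' | show_fields
```

Run as show_fields file1.txt and show_fields file2.txt. On the sample lines in post #11, 2.613 shows up as $2 in file1.txt and as $4 in file2.txt.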

So - please sit back, look at your two files, think about what your output should be, and how to achieve it, and then give us a detailed specification to work on. Don Cragun put in quite some time to provide his solution just to find it is based on wrong specifications and, as a consequence, wrong conclusions and assumptions.

Last edited by RudiC; 03-30-2013 at 09:46 AM..
# 13  
Old 03-30-2013
Sorry Sir, I really don't have much idea about how awk actually interprets columns. Until now I was thinking that it compares $1 to any of $1 to $18, $2 to any of $1 to $8, and so on.

So here are the equivalent columns; please explain how awk interprets columns while matching.

Code:
$1 of file1 ==> $1 of file2
$2 of file1 ==> $4 of file2
$3 of file1 ==> $5 of file2
$4 of file1 ==> $6 of file2
$5 of file1 ==> $7 of file2
$6 of file1 ==> $8 of file2
$7 of file1 ==> $9 of file2
$8 of file1 ==> $18 of file2

If all 8 of the above columns, or at least 5 of them, match,
I want to print $9 of that particular matching line of file1 along with $0 of file2.
# 14  
Old 03-30-2013
A specification does not depend on "how awk actually interpret columns". Actually, it is not meant to name the tool to be used. It defines input and output and the data formats, may help with the underlying logic to be applied, and MAY hint at the tool.
As an educational exercise, I propose we start over, and you give us a clean and detailed spec in plain English. (hint: look at posts #10, 11, 13)
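
For reference, if the column mapping in post #13 and the paraphrase in post #9 are taken at face value, the request could be sketched roughly as follows. This is not a confirmed solution from the thread: the function name match_fields, the threshold variable need, and the choice to stop at the first file1.txt line that qualifies are all assumptions, and fields are compared with awk's default rules (numeric-looking fields compare as numbers).

```shell
# Hypothetical sketch. Usage: match_fields file1.txt file2.txt
match_fields() {
    awk '
    BEGIN {
        # map[j] = field of file2 that corresponds to field j of file1 (post #13)
        split("1 4 5 6 7 8 9 18", map, " ")
        need = 5                      # assumed minimum number of matching fields
    }
    FNR == NR {                       # 1st file: remember fields 1..9 of each line
        n = NR
        for (j = 1; j <= 9; j++) ref[n, j] = $j
        next
    }
    {                                 # 2nd file: find the first line that qualifies
        out = "Not in file"
        for (i = 1; i <= n; i++) {
            hits = 0
            for (j = 1; j <= 8; j++)
                # skip empty reference fields so missing columns never count
                if (ref[i, j] != "" && ref[i, j] == $(map[j])) hits++
            if (hits >= need) { out = ref[i, 9]; break }
        }
        print $0, out                 # OFS (a tab) goes only between $0 and out
    }' OFS="\t" "$1" "$2"
}
```

Every line of file2.txt is compared against every stored line of file1.txt, so the run time grows with the product of the two line counts; for large files a real specification would have to say which fields are reliable enough to use as a lookup key.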