How to compare 2 files on more than 5 columns?

03-30-2013

Moderator

1,837, 668

Join Date: Nov 2012

Last Activity: 30 June 2020, 12:07 PM EDT

Posts: 1,837

Thanks Given: 180

Thanked 668 Times in 590 Posts

Thank you, sir.

I am attaching file1.txt and file2.txt.

file1.txt contains some information that is also available in file2.txt; columns 1 to 8 of file1.txt serve as the reference. Unfortunately, the code I posted in #1

gives a result when I use the

Code:

$1

field only, that is:

Code:

awk  '
NR==FNR {A[$1]=$9; next}
{B=A[$1]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

This results in:

Code:

$0 of file2.txt followed by tab $9 of file1.txt

When I tried:

Code:

awk  '
NR==FNR {A[$1,$2,$3,$4,$5,$6,$7,$8]=$9; next}
{B=A[$1,$2,$3,$4,$5,$6,$7,$8]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

This results in:

Code:

$0 of file2.txt followed by the "Not in file" message

So I asked in the thread for the following:

Code:

print $9 of file1.txt with $0 of file2.txt if at least 5 of the 8 fields match

Because of this, I asked in post #3 whether the FS of both files needs to be the same.

file1.txt (3.5 KB)

file2.txt (134.1 KB)

Akshay Hegde


03-30-2013

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Let me paraphrase your request:

Line(n) means a line in file(n).
For every line(2) of file2.txt, print it. Then you want to compare that line(2)'s $1 to every line(1)'s $1 in file1.txt. Same for $2 ... $8. If, for the line(2) under consideration, any 5 (no matter which) of those 8 fields match in any line(1), print that line(1)'s $9 and stop the search; otherwise print "not in file".

Is that correct? If so, why didn't you try and accept Don Cragun's fine proposal in post #2?


RudiC


03-30-2013

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by Akshay Hegde

Code:

$1

field only, that is:

Code:

awk  '
NR==FNR {A[$1]=$9; next}
{B=A[$1]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

This results in:

Code:

$0 of file2.txt followed by tab $9 of file1.txt

When I tried:

Code:

awk  '
NR==FNR {A[$1,$2,$3,$4,$5,$6,$7,$8]=$9; next}
{B=A[$1,$2,$3,$4,$5,$6,$7,$8]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

This results in:

Code:

$0 of file2.txt followed by the "Not in file" message

So I asked in the thread for the following:

Code:

print $9 of file1.txt with $0 of file2.txt if at least 5 of the 8 fields match

Because of this, I asked in post #3 whether the FS of both files needs to be the same.

I see that when I added comments to my awk script, I let a single quote slip into a place where it keeps the script from working. I have corrected that problem in messages #2 and #6.

You have now given us sample input files, but you still haven't clearly explained what the output is supposed to be. The script I provided does not find any lines that match the criteria I thought you were trying to use.

With the code that you had and the code I provided, it doesn't matter whether the field separator in your input files is a tab, one or more spaces, one or more spaces followed by a tab, or a tab followed by one or more spaces (all of which occur in your input files). And, in the statement:

Code:

print $0,B""?B:" Not in file "

setting OFS to a tab will only affect the separator that is used between $0 and Not in file ; it will not change the input field separators in $0 to tabs. If you want the output field separator to just be a tab between all output fields, you need to add that as a required transformation to be performed by your script.
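To illustrate the point about OFS, here is a minimal sketch. The sample line and the variable names (line, kept, rebuilt) are made up for illustration and are not taken from the attached files. It relies on standard awk behaviour: assigning to any field, even to itself, makes awk rebuild $0 using OFS, while an untouched $0 keeps its original spacing.

```shell
# Hypothetical input line mixing spaces and a tab between fields.
line=$(printf 'a  b \tc')

# Without touching any field, $0 keeps its original separators;
# OFS is used only between the two print arguments.
kept=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ print $0, "X" }')

# Assigning a field (even to itself) makes awk rebuild $0 with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $1 = $1; print $0, "X" }')

printf '%s\n%s\n' "$kept" "$rebuilt"
```

The first output line still contains the original blanks inside $0; only the second one is tab-separated throughout.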

We understand that the script you have shown us does not do what you want it to do. We do not understand what you do want it to do. Please show us the output you want to get from some of your input lines where you expect to get a match and where you do not expect to get a match. AND explain in English exactly what criteria are to be used to determine whether a line from file1.txt matches a line from file2.txt.

Don Cragun


03-30-2013


Sir, I tried the code from Don Cragun's post, but I don't know why I am not getting a result.

When I tried this

Code:

awk  '
NR==FNR {A[$1]=$9; next}
{B=A[$1]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt

and Don Cragun's code

Code:

awk '
FNR == NR {
        c = NR  # number of lines in 1st file
        for(i = 1; i <= 9; i++)
                a[c, i] = $i    # save fields 1 to 9 of each line of 1st file
        next
}
{       p = 0   # number of lines matched
        for(i = 1; i <= c; i++) {
                m = 1   # number of fields that must match
                for(j = 1; j <= 8 && m; j++)
                        if(a[i, j] == $j) m--
                if(m) continue
                # We matched enough fields; print this as a matched line.
                print $0, a[i, 9]
                p++
        }
        # If we did not find any matching lines, print the not found message
        if(p == 0) print $0, "Not -In file"
}' OFS="\t" file1.txt file2.txt

I got the following result, which is what I expect. However, if I compare only $1, my problem is that there are some duplicates, so I need at least 5 columns to match before I am satisfied.

Code:

52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    451.7    (4)    [01]    10.590    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    452.3    (4)    [01]    10.590    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    452.9    (4)    [01]    10.590    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    453.6    (4)    [01]    10.580    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    454.2    (4)    [01]    10.570    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    454.8    (4)    [01]    10.570    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    455.4    (4)    [01]    10.560    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    456.0    (4)    [01]    10.560    (5)    [01]    817    BAD
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    456.6    (4)    [01]    10.570    (5)    [01]    817    BAD

When I tried the code from #2 with m set to 2, I got the following result:

Code:

52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    607.5    (4)    [01]    9.080    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    608.1    (4)    [01]    9.100    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    608.7    (4)    [01]    9.090    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    609.3    (4)    [01]    9.100    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    609.9    (4)    [01]    9.090    (4)    [01]    817    Not -In file
52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    610.5    (4)    [01]    9.090    (4)    [01]    817    Not -In file

I really don't know why I am getting the wrong result with #2. The file2 I attached actually contains all the records that are available in file1, for example:

Code:

52416    2.613    90.408    1996    10    19    10.38    817    BAD

52416 is present in file1.txt.


Akshay Hegde


03-30-2013


This is because e.g. "2.613" is $4 in file2 and $2 in file1, and so on.

Did you read my paraphrasing carefully? Then you should have objected and posted a corrected version. You don't want to match $1 to $1 etc., but $1 to ANY of $1 to $8, $2 to ANY of $1 to $8, and so on. And, would $8 be sufficient? The 817 ($8 in file1) is $18 in file2!
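A quick way to see the field positions is to print them. This is only a sketch: the two sample records are copied from lines quoted in this thread (the file2 record is assumed to be an output line without the appended "BAD"), and the variable names are made up.

```shell
# One record of file1.txt and one of file2.txt, as quoted in the thread.
f1='52416    2.613    90.408    1996    10    19    10.38    817    BAD'
f2='52416     ZA  141863    2.613    90.408 1996 10 19 10.38  7377649 1224    451.7    (4)    [01]    10.590    (5)    [01]    817'

# The same two values sit at different positions in each file.
from1=$(printf '%s\n' "$f1" | awk '{ print $2, $8 }')
from2=$(printf '%s\n' "$f2" | awk '{ print $4, $18 }')

printf 'file1 $2,$8:  %s\nfile2 $4,$18: %s\n' "$from1" "$from2"
```

Both commands print the pair 2.613 and 817, which is why comparing $2 with $2 or $8 with $8 can never match for these records.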

So - please sit back, look at your two files, think about what your output should be, and how to achieve it, and then give us a detailed specification to work on. Don Cragun put in quite some time to provide his solution just to find it is based on wrong specifications and, as a consequence, wrong conclusions and assumptions.


RudiC


03-30-2013


Sorry, sir, I really don't have much idea about how awk actually interprets columns. Until now I was thinking that it compares $1 to any of $1 to $18, $2 to any of $1 to $8, and so on.

So here are the equivalent columns; please explain how awk interprets columns while matching:

Code:

$1 of file1 ==> $1 of file2
$2 of file1 ==> $4 of file2
$3 of file1 ==> $5 of file2
$4 of file1 ==> $6 of file2
$5 of file1 ==> $7 of file2
$6 of file1 ==> $8 of file2
$7 of file1 ==> $9 of file2
$8 of file1 ==> $18 of file2

If all 8 of the above columns, or at least 5 of them, match,
I want to print $9 of the matching file1 line together with $0 of file2.
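One possible reading of this mapping can be sketched in awk as follows. This is not code from the thread, only an illustration under stated assumptions: the first match wins, the comparison is done on strings (so 10.38 and 10.380 would not match), the names col, ref, need and label are made up, and the second file2 record is invented to show the non-matching case.

```shell
# Work in a temporary directory so no real files are touched.
tmp=$(mktemp -d)

# First record of each file is taken from the thread; the second
# file2 record is hypothetical and should not match anything.
printf '%s\n' '52416 2.613 90.408 1996 10 19 10.38 817 BAD' > "$tmp/file1.txt"
printf '%s\n' \
  '52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 451.7 (4) [01] 10.590 (5) [01] 817' \
  '99999 ZA 141863 0.000 0.000 1900 1 1 0.00 7377649 1224 451.7 (4) [01] 10.590 (5) [01] 0' \
  > "$tmp/file2.txt"

result=$(awk -v OFS='\t' -v need=5 '
BEGIN {
    # col[k] is the file2 field that corresponds to field k of file1.
    n = split("1 4 5 6 7 8 9 18", col, " ")
}
NR == FNR {                 # first file: remember fields 1 to 9 of each line
    rows = FNR
    for (k = 1; k <= 9; k++) ref[FNR, k] = $k
    next
}
{                           # second file: search for the first good match
    label = "Not in file"
    for (r = 1; r <= rows; r++) {
        hits = 0
        for (k = 1; k <= n; k++)
            if ((ref[r, k] "") == ($(col[k]) "")) hits++   # string compare
        if (hits >= need) { label = ref[r, 9]; break }
    }
    print $0, label
}' "$tmp/file1.txt" "$tmp/file2.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Changing need (for example -v need=8) tightens the match to all eight mapped columns. Like the earlier scripts, this compares every file2 line against every stored file1 line, which should be acceptable for files of the sizes attached here.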

Akshay Hegde


03-30-2013


A specification does not depend on "how awk actually interpret columns". In fact, it is not meant to name the tool to be used. It defines input and output and the data formats, may help explain the underlying logic that applies, and MAY hint at the tool.
As an educational exercise, I propose we start over, and you give us a clean and detailed spec in plain English. (Hint: look at posts #10, 11, 13.)

RudiC
