Here, line(n) means a line in file(n).
For every line(2) of file2.txt, print it. Then you want to compare that line(2)'s $1 to every line(1)'s $1 in file1.txt, and the same for $2 ... $8. If, for the line(2) under consideration, any 5 (no matter which) of those 8 fields match in some line(1), print that line(1)'s $9 and stop the search; otherwise print "not in file".
Is that correct? If so, why didn't you try and accept Don Cragun's fine proposal in post #2?
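The paraphrased rule can be sketched in awk as below. This is a minimal sketch under the paraphrase's own assumptions: whitespace-separated input, fields compared position by position ($1 with $1, ..., $8 with $8), and the search stopping at the first qualifying file1 line. The function name positional_match and its optional third argument (the minimum number of matching fields) are hypothetical.

```shell
# Hypothetical helper: positional_match FILE1 FILE2 [MIN]
# For each line of FILE2, find the FIRST line of FILE1 whose $1..$8 equal
# FILE2's $1..$8 position by position in at least MIN places (default 5).
# Print FILE2's line, a tab, and that FILE1 line's $9, or "not in file".
positional_match() {
    awk -v need="${3:-5}" '
    FNR == NR {                        # FILE1: keep fields 1..9 of every line
        n = NR
        for (k = 1; k <= 9; k++) ref[n, k] = $k
        next
    }
    {                                  # FILE2: stop at the first qualifying FILE1 line
        hit = 0
        for (r = 1; r <= n && !hit; r++) {
            same = 0
            for (k = 1; k <= 8; k++)
                if (ref[r, k] == $k) same++   # same position only
            if (same >= need) { hit = 1; val = ref[r, 9] }
        }
        print $0, (hit ? val : "not in file")
    }' OFS='\t' "$1" "$2"
}
```

For example, positional_match file1.txt file2.txt applies the 5-of-8 rule, and passing 8 as the third argument requires all eight fields to match.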
file1.txt contains some information that is also available in file2.txt, namely columns 1 to 8 of file1.txt, which I use as the reference. Unfortunately, the code I posted in #1 gives a result only when I use the $1 field, that is:
Code:
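awk '
# 1st file (file1.txt): map its $1 to its $9
NR==FNR {A[$1]=$9; next}
# 2nd file (file2.txt): append the $9 found for this $1, or a not-found message
{B=A[$1]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt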
This results in $0 of file2.txt followed by a tab and $9 of file1.txt.
When I tried
Code:
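awk '
# 1st file: key on all 8 fields ($1..$8 joined by SUBSEP), value is $9
NR==FNR {A[$1,$2,$3,$4,$5,$6,$7,$8]=$9; next}
# 2nd file: a hit requires all 8 fields to be equal, in the same positions
{B=A[$1,$2,$3,$4,$5,$6,$7,$8]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt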
This results in $0 of file2.txt followed by the "Not in file" message.
So I posted in the thread that if at least 5 of the 8 fields match, then $9 of file1.txt should be printed with $0 of file2.txt.
Because of this, in post #3 I was worried about whether the FS of both files needs to be the same or not.
I see that when I added comments to my awk script, I let a single quote slip into a place where it keeps the script from working. I have corrected that problem in messages #2 and #6.
You have now given us sample input files, but you still haven't clearly explained what the output is supposed to be. The script I provided does not find any lines that match the criteria I thought you were trying to use.
With the code that you had and the code I provided, it doesn't matter whether the field separator in your input files is a tab, one or more spaces, one or more spaces followed by a tab, or a tab followed by one or more spaces (all of which occur in your input files). And, in the statement:
Code:
print $0,B""?B:" Not in file "
setting OFS to a tab will only affect the separator that is used between $0 and the value printed after it (B or " Not in file "); it will not change the input field separators in $0 to tabs. If you want the output field separator to be just a tab between all output fields, you need to add that as a required transformation to be performed by your script.
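Both points can be checked with two tiny helpers. This is a minimal sketch: the helper names show_fields and rebuild_record and the sample input are made up for illustration. The first shows that the default FS splits on any run of spaces and/or tabs; the second shows that OFS appears only where awk writes a separator itself, and that assigning a field ($1 = $1) is one way to make awk rebuild $0 with OFS.

```shell
# show_fields LINE (hypothetical): print NF, then the fields joined by "|".
# With the default FS (a single space), awk splits on runs of blanks
# (spaces and/or tabs) and ignores leading and trailing blanks.
show_fields() {
    printf '%s\n' "$1" | awk '{
        s = $1
        for (i = 2; i <= NF; i++) s = s "|" $i
        print NF ": " s
    }'
}

# rebuild_record LINE (hypothetical): OFS is a tab, yet the first print keeps
# the original separators inside $0; only the separator between $0 and the
# message is a tab. After $1 = $1, awk rebuilds $0 joining fields with OFS.
rebuild_record() {
    printf '%s\n' "$1" | awk -v OFS='\t' '{
        print $0, "Not in file"
        $1 = $1
        print $0, "Not in file"
    }'
}
```

For instance, rebuild_record "$(printf 'x  y \tz')" prints the line with its original spaces and tab followed by a tab and "Not in file", then x, y and z separated by single tabs, followed by a tab and "Not in file".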
We understand that the script you have shown us does not do what you want it to do. We do not understand what you do want it to do. Please show us the output you want to get from some of your input lines where you expect to get a match and where you do not expect to get a match. AND explain in English exactly what criteria are to be used to determine whether a line from file1.txt matches a line from file2.txt.
Sir, I tried Don Cragun's code, but I don't know why I am not getting a result.
When I tried this:
Code:
awk '
NR==FNR {A[$1]=$9; next}
{B=A[$1]; print $0,B""?B:" Not in file " }
' OFS="\t" file1.txt file2.txt
and Don Cragun's code:
Code:
awk '
FNR == NR {
c = NR # # of lines in 1st file
for(i = 1; i <= 9; i++)
a[c, i] = $i
next
}
{ p = 0 # # of lines matched
for(i = 1; i <= c; i++) {
m = 1 # # of fields that must match
for(j = 1; j <= 8 && m; j++)
if(a[i, j] == $j) m--
if(m) continue
# We matched enough fields; print this as a matched line.
print $0, a[i, 9]
p++
}
# If we did not find any matching lines, print the not found message
if(p == 0) print $0, "Not -In file"
}' OFS="\t" file1.txt file2.txt
I got the following result, which is what I expect. But if I compare only $1, my problem is that there are some duplicates, so I need at least 5 columns to match.
Code:
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 451.7 (4) [01] 10.590 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 452.3 (4) [01] 10.590 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 452.9 (4) [01] 10.590 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 453.6 (4) [01] 10.580 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 454.2 (4) [01] 10.570 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 454.8 (4) [01] 10.570 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 455.4 (4) [01] 10.560 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 456.0 (4) [01] 10.560 (5) [01] 817 BAD
52416 ZA 141863 2.613 90.408 1996 10 19 10.38 7377649 1224 456.6 (4) [01] 10.570 (5) [01] 817 BAD
When I tried #2 with an m value of 2, I got the following result
This is because, e.g., "2.613" is $4 in file2 but $2 in file1, and so on.
Did you read my paraphrasing carefully? Then you should have objected and posted a corrected version. You don't want to match $1 to $1 etc., but $1 to ANY of $1 to $8, $2 to ANY of $1 to $8, and so on. And, would $8 be sufficient? The 817 ($8 in file1) is $18 in file2!
So - please sit back, look at your two files, think about what your output should be, and how to achieve it, and then give us a detailed specification to work on. Don Cragun put in quite some time to provide his solution just to find it is based on wrong specifications and, as a consequence, wrong conclusions and assumptions.
Sorry, Sir, I really don't have much idea about how awk actually interprets columns. Until now I was thinking that it compared $1 to any of $1 to $18, $2 to any of $1 to $8, and so on.
So here are the equivalent columns; please explain how awk interprets columns while matching:
Code:
$1 of file1 ==> $1 of file2
$2 of file1 ==> $4 of file2
$3 of file1 ==> $5 of file2
$4 of file1 ==> $6 of file2
$5 of file1 ==> $7 of file2
$6 of file1 ==> $8 of file2
$7 of file1 ==> $9 of file2
$8 of file1 ==> $18 of file2
If all 8 of the above columns, or at least 5 of them, match,
I want to print $9 of that particular matching line of file1 with $0 of file2.
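Taking the column mapping above as the specification, a minimal awk sketch could look like this. Assumptions not stated in the thread: whitespace-separated fields as in the samples, the first qualifying file1 line wins, and the function name mapped_match, its optional MIN argument and the "Not in file" text are illustrative only.

```shell
# Hypothetical helper: mapped_match FILE1 FILE2 [MIN]
# Compares FILE1 $1..$8 with FILE2 columns 1, 4, 5, 6, 7, 8, 9, 18 (the
# mapping above). If at least MIN (default 5) pairs are equal, prints
# FILE2's $0, a tab, and FILE1's $9; otherwise FILE2's $0 and "Not in file".
mapped_match() {
    awk -v need="${3:-5}" '
    BEGIN {
        # col[k] = FILE2 column that corresponds to FILE1 column k
        split("1 4 5 6 7 8 9 18", col, " ")
    }
    FNR == NR {                        # FILE1: store $1..$9 of each line
        n = NR
        for (k = 1; k <= 9; k++) ref[n, k] = $k
        next
    }
    {                                  # FILE2: first FILE1 line with enough equal pairs wins
        hit = 0
        for (r = 1; r <= n && !hit; r++) {
            same = 0
            for (k = 1; k <= 8; k++)
                if (ref[r, k] == $(col[k])) same++
            if (same >= need) { hit = 1; val = ref[r, 9] }
        }
        print $0, (hit ? val : "Not in file")
    }' OFS='\t' "$1" "$2"
}
```

Run it as mapped_match file1.txt file2.txt, or mapped_match file1.txt file2.txt 8 to require all 8 columns. Note that awk's == compares two values that both look numeric as numbers (so 10.38 and 10.380 count as equal); appending "" to both sides forces a string comparison instead.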
A specification does not depend on "how awk actually interprets columns". In fact, it is not meant to name the tool to be used. It defines input and output and data formats, may help with the underlying logic to be applied, and MAY hint at the tool.
As an educational exercise, I propose we start over, and you give us a clean and detailed spec in plain English. (hint: look at posts #10, 11, 13)