Matching by key fields


 
# 1  
Old 02-01-2009
Matching by key fields

I have a file (key.dat) that contains two columns:

AA|1234|
BB|567|
CC|8910|

I have another file (extract.dat) that contains some data:

SD|458|John|Smith|
AA|3345|Frank|Williams|
AA|1234|Bill|Garner|
BD|0098|Yu|Lin|
BB|567|Gail|Hansen|
CC|8910|Ken|Nielsen|

I want to compare the two files by the first two columns (inner join), and then print the contents of extract.dat (see below):

AA|1234|Bill|Garner|
BB|567|Gail|Hansen|
CC|8910|Ken|Nielsen|

Any help is most appreciated.
Thanks,

- CB
# 2  
Old 02-01-2009
Hi.

For the case posted, a modern fixed-string grep, fgrep, suffices:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate special case of join, grep used to extract.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) fgrep
set -o nounset
echo

FILE1=data1
FILE2=data2

echo " Data file $FILE1:"
cat "$FILE1"

echo
echo " Data file $FILE2:"
cat "$FILE2"

echo
echo " Results:"
fgrep -f "$FILE1" "$FILE2"

exit 0

Producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.11-x1, i686
Distribution        : Xandros Desktop 3.0.3 Business
GNU bash 2.05b.0
fgrep (GNU grep) 2.5.1

 Data file data1:
AA|1234|
BB|567|
CC|8910|

 Data file data2:
SD|458|John|Smith|
AA|3345|Frank|Williams|
AA|1234|Bill|Garner|
BD|0098|Yu|Lin|
BB|567|Gail|Hansen|
CC|8910|Ken|Nielsen|

 Results:
AA|1234|Bill|Garner|
BB|567|Gail|Hansen|
CC|8910|Ken|Nielsen|

See man fgrep for details ... cheers, drl
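
One caveat with the fixed-string approach: the key is matched anywhere in the line, not only in the first two columns. The sketch below uses made-up sample rows (the XX line is hypothetical) to show the difference, and then one possible way to pin each key to the start of the line by escaping it and prepending ^. The file key.re is a scratch name introduced here, and grep -F is the POSIX spelling of fgrep.

```shell
#!/bin/sh
# Hypothetical sample data; the XX row carries a key in fields 3-4 only.
tmp=$(mktemp -d)
printf '%s\n' 'AA|1234|' 'BB|567|' > "$tmp/key.dat"
printf '%s\n' 'AA|1234|Bill|Garner|' 'XX|77|AA|1234|' 'BB|567|Gail|Hansen|' > "$tmp/extract.dat"

# Fixed-string search: a key matches wherever it occurs in the line.
unanchored=$(grep -F -f "$tmp/key.dat" "$tmp/extract.dat")
printf 'grep -F (unanchored):\n%s\n\n' "$unanchored"

# Escape basic-regex metacharacters in each key, then prepend ^ so the
# key can only match at the beginning of a line.
sed 's/[][\.*^$]/\\&/g; s/^/^/' "$tmp/key.dat" > "$tmp/key.re"
anchored=$(grep -f "$tmp/key.re" "$tmp/extract.dat")
printf 'grep with anchored patterns:\n%s\n' "$anchored"

rm -rf "$tmp"
```

The unanchored run reports the XX row as well, while the anchored run does not. If your data can never contain a key later in the line, plain fgrep is simpler.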
# 3  
Old 02-01-2009
man grep
Code:
grep -f key.dat extract.dat

# 4  
Old 02-01-2009
Thanks, danmero. This works very well for a small dataset. I wonder how well it will perform with an 800-million-record extract and a 300K-record key file.

Thanks,

- CB

Last edited by ChicagoBlues; 02-01-2009 at 10:55 AM..
# 5  
Old 02-01-2009
Use GNU awk (gawk), new awk (nawk),
or POSIX awk (/usr/xpg4/bin/awk on Solaris):

Code:
awk -F\| '
NR == FNR { _[$1,$2]; next }
($1,$2) in _
' key.dat extract.dat
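
For anyone reading along, here is the same idea spelled out as a self-contained sketch on the sample data from the first post. The array name seen and the scratch directory are choices made for this example. NR == FNR is true only while awk reads the first file, so the keys are stored first; every later line is printed only if its first two fields form a stored key. Only the keys are held in memory, and the extract is streamed line by line.

```shell
#!/bin/sh
# Recreate the sample files from the first post in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'AA|1234|' 'BB|567|' 'CC|8910|' > "$tmp/key.dat"
printf '%s\n' \
  'SD|458|John|Smith|' \
  'AA|3345|Frank|Williams|' \
  'AA|1234|Bill|Garner|' \
  'BD|0098|Yu|Lin|' \
  'BB|567|Gail|Hansen|' \
  'CC|8910|Ken|Nielsen|' > "$tmp/extract.dat"

result=$(awk -F'|' '
  # First file (key.dat): remember each (field 1, field 2) pair.
  NR == FNR { seen[$1, $2]; next }
  # Second file (extract.dat): print lines whose first two fields are a known key.
  ($1, $2) in seen
' "$tmp/key.dat" "$tmp/extract.dat")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the comparison is done per field, a key that happens to appear later in a line is not a match, and regular-expression characters in the data need no escaping.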

# 6  
Old 02-01-2009
Hi, CB.

You might benchmark these approaches to see what is best for your situation.

I would start with the smallest practical set to see what the overhead might be, then perhaps 1, 2, 5, and 10% of the data to see how it is progressing. Many complex solutions tend not to scale linearly. I think both will be somewhat memory intensive if you have 300K lines in your key.dat file.
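
A rough sketch of such a stepped test is below. Everything in it is an assumption made for illustration: the data is synthetic (100 keys, 20000 extract lines), the awk join stands in for whichever command you want to measure, and date +%s only resolves whole seconds, which is too coarse for data this small but adequate for runs on the real files.

```shell
#!/bin/sh
# Synthetic stand-ins for key.dat and extract.dat (sizes are arbitrary).
tmp=$(mktemp -d)
total=20000
awk 'BEGIN { for (i = 1; i <= 100; i++) printf "K%d|%d|\n", i, i }' > "$tmp/key.dat"
awk -v n="$total" 'BEGIN { for (i = 1; i <= n; i++) printf "K%d|%d|first|last|\n", i % 400, i % 400 }' > "$tmp/extract.dat"

# Run the join on the first 1, 2, 5 and 10 percent of the extract.
for pct in 1 2 5 10; do
  n=$((total * pct / 100))
  start=$(date +%s)
  hits=$(head -n "$n" "$tmp/extract.dat" |
    awk -F'|' 'NR == FNR { seen[$1, $2]; next }
               ($1, $2) in seen { c++ }
               END { print c + 0 }' "$tmp/key.dat" -)
  end=$(date +%s)
  printf '%3d%%  %6d lines  %5d hits  %d s\n' "$pct" "$n" "$hits" "$((end - start))"
done

rm -rf "$tmp"
```

If the elapsed time grows faster than the line count from one step to the next, that is the non-linear behavior to watch for before committing to a full run.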

Keep us posted ... cheers, drl

PS: I used fgrep so that no characters in your patterns are interpreted as regular-expression operators. The sample you provided contains only "|", which egrep treats as alternation, but I don't know how representative your sample was.
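
To illustrate the point about special characters, here is a small sketch with a made-up key that contains a dot. The data is hypothetical, and grep -F is used as the POSIX spelling of fgrep. Plain grep reads the dot as "any character", so it also reports a line the key was never meant to match.

```shell
#!/bin/sh
# Hypothetical key containing a regular-expression metacharacter (the dot).
tmp=$(mktemp -d)
printf '%s\n' 'B.|567|' > "$tmp/key.dat"
printf '%s\n' 'BB|567|Gail|Hansen|' 'B.|567|Dot|Case|' > "$tmp/extract.dat"

# Basic regular expression: "." matches any single character.
as_regex=$(grep -f "$tmp/key.dat" "$tmp/extract.dat")
printf 'grep -f:\n%s\n\n' "$as_regex"

# Fixed string: "." matches only a literal dot.
as_fixed=$(grep -F -f "$tmp/key.dat" "$tmp/extract.dat")
printf 'grep -F -f:\n%s\n' "$as_fixed"

rm -rf "$tmp"
```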

Last edited by drl; 02-01-2009 at 01:57 PM..