Merging two files based on matching columns


 
# 1  
Old 06-25-2015

Hi,

I am having trouble with the task below.

We have two files, Test1.txt and Test2.txt. We need to match the 1st column of Test1.txt against the 2nd column of Test2.txt and merge the matching Test2.txt record into the Test1.txt line. The output should contain columns 1 and 2 from the 1st file and columns 1 and 3 from the 2nd file. If a line in the 1st file has no match in the 2nd file, blank fields should be appended in the output.
Code:
$ cat Test1.txt
Amar,Movies,12
Sanjay,Cricket,18
Rakesh,Football,35
Samit,Cricket,56
Sam,Songs,20
Ram,Books,40
$

Code:
$ cat Test2.txt
1,Samit,Service
2,Sam,DJ
3,Rakesh,Police
$

Desired Output:
Code:
Amar,Movies,,
Sanjay,Cricket,,
Rakesh,Football,3,Police
Samit,Cricket,1,Service
Sam,Songs,2,DJ
Ram,Books,,

I have tried to do this with awk but could not get it to work.

Also, please advise how individual fields from two files can be referenced.
For example, if I pass two files (Test1.txt and Test2.txt) to awk, how do I distinguish the 1st/2nd field of Test1.txt from the 1st/2nd field of Test2.txt?

Thanks in advance for your help in this regard.
# 2  
Old 06-25-2015
Hi, try:
Code:
awk 'NR==FNR{A[$2]=$1; B[$2]=$3; next} {$3=A[$1]; $4=B[$1]}1' FS=, OFS=, file2 file1

or
Code:
awk 'NR==FNR{A[$2]=$1 FS $3; next} {$3=$1 in A?A[$1]:FS}1' FS=, OFS=, file2 file1


Last edited by Scrutinizer; 06-25-2015 at 03:26 PM..
# 3  
Old 06-25-2015
Thanks for your reply.
Can you please explain how this code works? Sorry to trouble you, but I am really confused about which array represents which file, and which file $3, $1, etc. refer to. It would really help if you could explain the code above.

Thanks again.
# 4  
Old 06-25-2015
Reformatting Scrutinizer's first script and adding comments:
Code:
awk '				# Invoke awk with this script...
NR == FNR {			# If the number of lines read from all files
				# (NR) is equal to the number of lines read
				# from the current input file (FNR) (which in
				# this case means if we are looking at a line
				# from file2)...
	A[$2] = $1		# Create an array (A[]) indexed by field 2 ($2)
				# on the current line whose value is set to
				# field 1 ($1) on the current line.
	B[$2] = $3		# Create an array (B[]) indexed by field 2 ($2)
				# on the current line whose value is set to
				# field 3 ($3) on the current line.
	next			# Skip to next input line and restart
				# processing at the top of this script.
}
{				# If we got to here, we are looking at a line
				# from file1...
	$3 = A[$1]		# Set field 3 on this line to what was saved in
				# array A[] indexed by the contents of field 1
				# on this line (A[$1]).
	$4 = B[$1]		# Set field 4 on this line to what was saved in
				# array B[] indexed by the contents of field 1
				# on this line (B[$1]).
}
1				# Perform the default action (print the current
				# contents of the current line).
' FS=, OFS=, file2 file1	# End the script ('), set the input field
				# separator to a comma (FS=,), set the output
				# field separator to a comma (OFS=,) and name
				# the two files to be processed by this awk
				# script (file2 file1).

Hopefully, the above description will enable you to determine how the 2nd script does the same thing using one array instead of two. If you are still confused, tell us what still doesn't make sense and we'll try to explain it a different way.
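For readers who want to check their understanding of the one-array variant, here is a commented sketch of it, run against the sample data from post #1. The scratch directory and the shell variable names (tmp, merged) are only illustrative, and the parentheses around the "in" test are added for readability.

```shell
# Sample data from the original post, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Amar,Movies,12' 'Sanjay,Cricket,18' 'Rakesh,Football,35' \
    'Samit,Cricket,56' 'Sam,Songs,20' 'Ram,Books,40' > "$tmp/Test1.txt"
printf '%s\n' '1,Samit,Service' '2,Sam,DJ' '3,Rakesh,Police' > "$tmp/Test2.txt"

merged=$(awk '
NR == FNR {                       # Lines of the first file named (Test2.txt)
    A[$2] = $1 FS $3              # Key: name. Value: "number,job" as one string.
    next                          # Do not fall through to the Test1.txt block.
}
{                                 # Lines of the second file named (Test1.txt)
    $3 = ($1 in A) ? A[$1] : FS   # Match: "number,job". No match: a lone comma.
}
1                                 # Print the rebuilt line.
' FS=, OFS=, "$tmp/Test2.txt" "$tmp/Test1.txt")

printf '%s\n' "$merged"
rm -r "$tmp"
```

Because field 3 is assigned a string that itself contains a comma, the printed line ends up with four comma-separated columns even though awk only ever sets three fields. For unmatched names, the lone comma in field 3 produces the two trailing empty columns.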
These 3 Users Gave Thanks to Don Cragun For This Post:
# 5  
Old 06-25-2015
A few comments may lead you in the right direction, but they can't replace an in-depth reading of, e.g., man awk plus a lot of experimenting with samples.

Arrays don't represent files, nor do the field variables $3, $1, etc. The latter hold the individual fields of the current line read from the data stream, which in turn can consist of several files. Most arrays are user defined and created on first reference.
So the trick is to find out which file is being operated on at any given moment. There's the FILENAME system variable, which changes with the current file. And there are the NR and FNR variables. NR is the record (= line) number in the entire stream; FNR is the same but within the current file, reset to 1 when the current file changes. So NR==FNR holds only for the first file, and Scrutinizer uses this to load the A (and B) arrays. Once this is no longer true, we must have left the first file, and we can check whether, e.g., $1 (of the second file) can be found as an index in the A array and, if so, modify the current line as requested.
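A small experiment along these lines may make the NR/FNR behaviour easier to see. The two input files (first.txt and second.txt) and the variable names are made up for this sketch and are not part of the original question.

```shell
# Two throwaway input files in a scratch directory (hypothetical names).
tmp=$(mktemp -d)
printf '%s\n' 'a' 'b' > "$tmp/first.txt"
printf '%s\n' 'c' 'd' 'e' > "$tmp/second.txt"

# For every input line, print the file name, the stream-wide record number
# (NR), the per-file record number (FNR), and what NR == FNR tells us.
trace=$(cd "$tmp" && awk '
{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }
' first.txt second.txt)

printf '%s\n' "$trace"
rm -r "$tmp"
```

With these inputs the lines from first.txt show NR and FNR moving together (1 1, 2 2), while the lines from second.txt show NR continuing (3, 4, 5) and FNR starting over (1, 2, 3). One caveat worth knowing: if the first file is empty, NR==FNR is also true for the lines of the second file, so the idiom assumes the first file has at least one line.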
These 3 Users Gave Thanks to RudiC For This Post:
# 6  
Old 06-26-2015
Thanks Scrutinizer, Don Cragun and RudiC.

I am getting the desired output.
Code:
$ awk 'NR==FNR{A[$2]=$1; B[$2]=$3; next} {$3=A[$1]; $4=B[$1]}1' FS=, OFS=, Test2.txt Test1.txt
Amar,Movies,,
Sanjay,Cricket,,
Rakesh,Football,3,Police
Samit,Cricket,1,Service
Sam,Songs,2,DJ
Ram,Books,,

As suggested by RudiC, I used the FILENAME variable, which helped me understand how awk works.
Code:
$ awk 'NR==FNR{A[$2]=$1; B[$2]=$3; print FILENAME,$1,$2,$3,A[$2],B[$2]; next} {$3=A[$1];$4=B[$1];print FILENAME,$1,A[$1],B[$1]}' FS=, OFS=, Test2.txt Test1.txt
Test2.txt,1,Samit,Service,1,Service
Test2.txt,2,Sam,DJ,2,DJ
Test2.txt,3,Rakesh,Police,3,Police
Test1.txt,Amar,,
Test1.txt,Sanjay,,
Test1.txt,Rakesh,3,Police
Test1.txt,Samit,1,Service
Test1.txt,Sam,2,DJ
Test1.txt,Ram,,

Thanks a lot for your help.
 