Join 2 files based on certain column


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Join 2 files based on certain column
# 1  
Old 09-07-2014
Join 2 files based on certain column

I have file input1.txt
Code:
11103|11|OTTAWA|City|AA|CAR|0|0|1|-1|0|8526|2014-09-07 23:00:14
11103|11|OTTAWA|City|BB|TRAIN|0|0|2|-2|6|6359|2014-09-07 23:00:14
11104|11|CANADA|City|CC|CAR|0|0|2|-2|0|5947|2014-09-07 23:00:14
11104|11|CANADA|City|DD|TRAIN|0|0|2|-2|1|4523|2014-09-07 23:00:14
11105|11|CHINA|City|EE|CAR|0|0|0|0|0|8016|2014-09-07 23:00:14
11105|11|CHINA|City|FF|TRAIN|0|0|1|-1|0|6209|2014-09-07 23:00:14

File input2.txt

Code:
11122|NIGERIA|AA|1811634377|R|AX|BY|1
11103|OTTAWA|BB|28126099855|R|AX|BY|2
11105|USA|CC|4811636667|D|AX|BY|3
11104|CANADA|DD|98126099100|D|AX|BY|4

I need to join files based on column 1, file input2 put into file input1 based on column 1

Expected output

Code:
11103|11|OTTAWA|City|AA|CAR|0|0|1|-1|0|8526|2014-09-07 23:00:14|OTTAWA|BB|28126099855|R|AX|BY|2
11103|11|OTTAWA|City|BB|TRAIN|0|0|2|-2|6|6359|2014-09-07 23:00:14|OTTAWA|BB|28126099855|R|AX|BY|2
11104|11|CANADA|City|CC|CAR|0|0|2|-2|0|5947|2014-09-07 23:00:14|CANADA|DD|98126099100|D|AX|BY|4
11104|11|CANADA|City|DD|TRAIN|0|0|2|-2|1|4523|2014-09-07 23:00:14|CANADA|DD|98126099100|D|AX|BY|4
11105|11|CHINA|City|EE|CAR|0|0|0|0|0|8016|2014-09-07 23:00:14|USA|CC|4811636667|D|AX|BY|3
11105|11|CHINA|City|FF|TRAIN|0|0|1|-1|0|6209|2014-09-07 23:00:14|USA|CC|4811636667|D|AX|BY|3


I did this
Code:
awk -F"|" 'NR==FNR{v=$1;$1="";A[v]=$0;next}{print $0" "A[$2]}' input2.txt input1.txt > output.txt

please help
# 2  
Old 09-07-2014
Try this:
Code:
awk -F\| 'NR==FNR{i=$1;sub($1,x); a[i]=$0; next} $1 in a{print $0 a[$1]}' input2.txt input1.txt

This User Gave Thanks to Franklin52 For This Post:
# 3  
Old 09-07-2014
thanks franklin

Can you explain step by step about this code? what does sub mean in this code?
# 4  
Old 09-07-2014
Reformattig Franklin52's code and adding comments:
Code:
awk -F\| '		# Set input field separator to "|".
fNR==FNR{		# For all lines in the 1st file given...
	i=$1		# Set i to the contents of the 1st field.
	sub($1,x)	# Substitute the 1st occurrence in the current line of
			# the contents of the 1st field with the contents of the
			# variable x.  Since x has not been set, the replaceent
			# string is an empty string.  This is dangerous if the
			# contents of the first field could contain any ERE
			# special characters.  I would have written this as:
			#	$1=""
			# or:	$1=x
	a[i]=$0		# Set a[i] to the contents of the modified current line
	next		# Skip remaining steps for this input line; start next
			# cycle.
}
$1 in a{print $0 a[$1]	# For all lines in the second file that have a 1st field
			# that matches the original first field in the 1st file,
			# print the current input line with the contents of the
			# corresponding line (except the 1st field) from the 1st
			# file added to the end of the line.
}' input2.txt input1.txt # Specify names of files to be processed.

Does this answer your questions?
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 09-07-2014
Quote:
Originally Posted by Don Cragun
This is dangerous if the contents of the first field could contain any ERE special characters. I would have written this as:
$1="" or $1=x
The sub function replaces only the field and preserves the field separators, $1="" or $1=x doesn't.
This User Gave Thanks to Franklin52 For This Post:
# 6  
Old 09-07-2014
Quote:
Originally Posted by Franklin52
The sub function replaces only the field and preserves the field separators, $1="" or $1=x doesn't.
Yes. If there is a chance that the first field could contain any characters that are special in an extended regular expression, you would need to not only change: sub($1,x) to: $1=x, but also to change the last line to:
Code:
}' OFS=\| input2.txt input1.txt

or add a BEGIN clause to set OFS.

As long as the 1st field is always just numeric characters, your code is perfectly safe.
These 2 Users Gave Thanks to Don Cragun For This Post:
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

Hello, I have a file with 2 columns ( tableName , ColumnName) delimited by a Pipe like below . File is sorted by ColumnName. Table1|Column1 Table2|Column1 Table5|Column1 Table3|Column2 Table2|Column2 Table4|Column3 Table2|Column3 Table2|Column4 Table5|Column4 Table2|Column5 From... (6 Replies)
Discussion started by: nv186000
6 Replies

2. UNIX for Dummies Questions & Answers

Join files by second column

I have file input file1 1/1/2013 A 553.0763397 96 16582 1/1/2013 B 459.8333588 195 11992 1/2/2013 A 844.2973022 306 19555 1/2/2013 B 833.9300537 457 20165 1/3/2013 A 563.6917419 396 13879 1/3/2013 B 632.0749969 169 ... (1 Reply)
Discussion started by: radius
1 Replies

3. UNIX for Dummies Questions & Answers

How to join 2 .txt files based on a common column?

Hi all, I'm trying to join two .txt file tab delimitated based on a common column. File 1 transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct comp1000201_c0_seq1 comp1000201_c0 337 183.51 0.00 0.00 0.00 0.00 comp1000297_c0_seq1 ... (1 Reply)
Discussion started by: alisrpp
1 Replies

4. UNIX for Dummies Questions & Answers

How to use the the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies

5. Shell Programming and Scripting

join two files based on one column

Hi All, I am trying to join to files based on one common column. Cat File1 ID HID Ab_1 23 Cd 45 df 22 Vv 33 Cat File2 ID pval Ab_1 0.3 Cd 10 Vv 0.0444 (3 Replies)
Discussion started by: newpro
3 Replies

6. Shell Programming and Scripting

join rows based on the column values

Hi, Please help me to convert the input file to a new one. input file: -------- 1231231231 3 A 4561223343 0 D 1231231231 1 A 1231231231 2 A 1231231231 4 D 7654343444 2 A 4561223343 1 D 4561223343 2 D the output should be: -------------------- 1231231231 3#1#2 A 4561223343 0 D... (3 Replies)
Discussion started by: vsachan
3 Replies

7. Shell Programming and Scripting

join files based on a common field

Hi experts, Would you please help me with this? I have several files and I need to join the forth field of them based on the common first field. here's an example... first file: 280346 39.88 -75.08 547.8 280690 39.23 -74.83 538.7 280729 40.83 -75.08 499.2 280907 40.9 -74.4 507.8... (5 Replies)
Discussion started by: GoldenFire
5 Replies

8. Shell Programming and Scripting

Join multiple files based on 1 common column

I have n files (for ex:64 files) with one similar column. Is it possible to combine them all based on that column ? file1 ax100 20 30 40 ax200 22 33 44 file2 ax100 10 20 40 ax200 12 13 44 file2 ax100 0 0 4 ax200 2 3 4 (9 Replies)
Discussion started by: quincyjones
9 Replies

9. UNIX for Dummies Questions & Answers

Join 2 files using first column

Hi, I'm trying to compare the first column of two files (tab or whitespace delimited, either way's fine, I`ve got both) and print the lines that are identical for the first column of both files. Something like this: File1 AAA 26 49 7 27 36 33 46 75 73 69 AAAAA 4 10 4 7 10 18 21... (2 Replies)
Discussion started by: vanesa1230
2 Replies

10. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies
Login or Register to Ask a Question