Find matched patterns in a column of 2 files with different size and merge them


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Find matched patterns in a column of 2 files with different size and merge them
# 8  
Old 05-28-2014
ok, below is input1 & input2 that i append with new data

input1
Code:
Name	Seq_ID	NewID	Scores
PTR     GHT321  1.21.6    400
KLK     PPTh74  1.28.6.1  395.6
MT1	A0QZX3	1.65	  277.4
IVO	A0QZX3	1.65	  244.5
HPO	A0QZX3	1.65	  240.5
RgP	A0Q3PP	5.32	  241.0
GX1	LPSZ3S	96.1	  216.9
MEL	LPSS3X	4.23	  204.1
LDD     LPSS3X  4.23      100.2

input2
Code:
Fac	AddName  NewID	    Details	Type	 Cond
1	Garp	  1.65      navy1	 grt     alive
1	Garp	  1.67      admiral      ggt     alive
4	Luffy	  4.23      Pirate1      hr      survive
4       Luffy     4.25      Pirate2      hhhh    survive
9	Ace	  96.1      Brother	 hdsm    dead
1.21	Shank	  1.21.6    Hero1	 KT      alive
1.28.6	Shank	  1.28.6.1  warlord      PP      alive

The output should be:-

Code:
Name	Seq_ID	NewID	 Scores  Fac	AddName     Details	Type	 Cond
MT1	A0QZX3	1.65	  277.4  1	Garp	    navy1	 grt     alive
IVO	A0QZX3	1.65	  244.5  1	Garp	    navy1	 grt     alive
HPO	A0QZX3	1.65	  240.5  1	Garp	    navy1	 grt     alive
GX1	LPSZ3S	96.1	  216.9  9	Ace	    Brother	 hdsm    dead
MEL	LPSS3X	4.23	  204.1  4	Luffy	    Pirate1      hr      survive
LDD     LPSS3X  4.23      100.2  4	Luffy	    Pirate1      hr      survive
PTR     GHT321  1.21.6    400    1.21	Shank       Hero1        KT      alive
KLK     PPTh74  1.28.6.1  395.6  1.28.6	Shank       warlord      PP      alive

as u can see, there are two rows that has Fac values quite similar to NewID (AddName = Shank). This is the reason why the given codes not working. Thanks

Last edited by redse171; 05-28-2014 at 04:03 PM.. Reason: wrong values..
# 9  
Old 05-28-2014
Quote:
Originally Posted by redse171
ok, below is input1 & input2 that i append with new data

input1
Code:
Name	Seq_ID	NewID	Scores
PTR     GHT321  1.21.6    400
KLK     PPTh74  1.28.6.1  395.6
MT1	A0QZX3	1.65	  277.4
IVO	A0QZX3	1.65	  244.5
HPO	A0QZX3	1.65	  240.5
RgP	A0Q3PP	5.32	  241.0
GX1	LPSZ3S	96.1	  216.9
MEL	LPSS3X	4.23	  204.1
LDD     LPSS3X  4.23      100.2

input2
Code:
Fac	AddName  NewID	    Details	Type	 Cond
1	Garp	  1.65      navy1	 grt     alive
1	Garp	  1.67      admiral      ggt     alive
4	Luffy	  4.23      Pirate1      hr      survive
4       Luffy     4.25      Pirate2      hhhh    survive
9	Ace	  96.1      Brother	 hdsm    dead
1.21	Shank	  1.21.1    Hero1	 KT      alive
1.28.6	Shank	  1.28.6.1  warlord      PP      alive

The output should be:-

Code:
Name	Seq_ID	NewID	 Scores  Fac	AddName     Details	Type	 Cond
MT1	A0QZX3	1.65	  277.4  1	Garp	    navy1	 grt     alive
IVO	A0QZX3	1.65	  244.5  1	Garp	    navy1	 grt     alive
HPO	A0QZX3	1.65	  240.5  1	Garp	    navy1	 grt     alive
GX1	LPSZ3S	96.1	  216.9  9	Ace	    Brother	 hdsm    dead
MEL	LPSS3X	4.23	  204.1  4	Luffy	    Pirate1      hr      survive
LDD     LPSS3X  4.23      100.2  4	Luffy	    Pirate1      hr      survive
PTR     GHT321  1.21.6    400    1.21	Shank       Hero1        KT      alive
KLK     PPTh74  1.28.6.1  395.6  1.28.6	Shank       warlord      PP      alive

as u can see, there are two rows that has Fac values quite similar to NewID (AddName = Shank). This is the reason why the given codes not working. Thanks
No surprise, please note 1.21.6 is not equal to 1.21.1, so one line is missing, we can convert it to numeric by adding zero.

small trick like this

Code:
$ awk 'FNR==NR{s=$3; $3=""; A[s+0]=$0;next}($3+0 in A){print $0,A[$3+0]}' OFS="\t" input2 input1

Code:
Name	Seq_ID	NewID	Scores	Fac	AddName		Details	Type	Cond
PTR     GHT321  1.21.6    400	1.21	Shank		Hero1	KT	alive
KLK     PPTh74  1.28.6.1  395.6	1.28.6	Shank		warlord	PP	alive
MT1	A0QZX3	1.65	  277.4	1	Garp		navy1	grt	alive
IVO	A0QZX3	1.65	  244.5	1	Garp		navy1	grt	alive
HPO	A0QZX3	1.65	  240.5	1	Garp		navy1	grt	alive
GX1	LPSZ3S	96.1	  216.9	9	Ace		Brother	hdsm	dead
MEL	LPSS3X	4.23	  204.1	4	Luffy		Pirate1	hr	survive
LDD     LPSS3X  4.23      100.2	4	Luffy		Pirate1	hr	survive

---------- Post updated at 01:40 AM ---------- Previous update was at 01:27 AM ----------

I can see you have modified your input just now, so previous solution should work, there is no need of adding 0 to index, will treat them as string..

Code:
$ awk 'FNR==NR{s=$3; $3=""; A[s]=$0;next}($3 in A){print $0,A[$3]}' OFS="\t" input2 input1

This User Gave Thanks to Akshay Hegde For This Post:
# 10  
Old 05-28-2014
The codes work great!. thanks. It takes me some times to check as i have thousands of data. Thanks a million for your kind help Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Bash - Find files excluding file patterns and subfolder patterns

Hello. For a given folder, I want to select any files find $PATH1 -f \( -name "*" but omit any files like pattern name ! -iname "*.jpg" ! -iname "*.xsession*" ..... \) and also omit any subfolder like pattern name -type d \( -name "/etc/gconf/gconf.*" -o -name "*cache*" -o -name "*Cache*" -o... (2 Replies)
Discussion started by: jcdole
2 Replies

2. Shell Programming and Scripting

Find all matched lines from two files

Hello, everyone I have two files like this: File 1: A B C D E FFile 2:A B 1 A C 2 A K 3 B A 4 D E 3 W X 2 A B 2I want to print all lines (file2) that the first two columns are consist of elements from file1. So, what I expected is : A B 1 A B 2 (2 Replies)
Discussion started by: nengcheng
2 Replies

3. Shell Programming and Scripting

Replacing matched patterns in multiple files with awk

Hello all, I have since given up trying to figure this out and used sed instead, but I am trying to understand awk and was wondering how someone might do this in awk. I am trying to match on the first field of a specific file with the first field on multiple files, and append the second field... (2 Replies)
Discussion started by: karlmalowned
2 Replies

4. Shell Programming and Scripting

Find matched patterns and print them with other patterns not the whole line

Hi, I am trying to extract some patterns from a line. The input file is space delimited and i could not use column to get value after "IN" or "OUT" patterns as there could be multiple white spaces before the next digits that i need to print in the output file . I need to print 3 patterns in a... (3 Replies)
Discussion started by: redse171
3 Replies

5. UNIX for Dummies Questions & Answers

How to merge two tables based on a matched column?

Hi, Please excuse me , i have searched unix forum, i am unable to find what i expect , my query is , i have 2 files of same structure and having 1 similar field/column , i need to merge 2 tables/files based on the one matched field/column (that is field 1), file 1:... (5 Replies)
Discussion started by: karthikram
5 Replies

6. Shell Programming and Scripting

Find matched patterns in multiple files

Hi, I need help to find matched patterns in 30 files residing in a folder simultaneously. All these files only contain 1 column. For example, File1 Gr_1 st-e34ss-11dd bt-wwd-fewq pt-wq02-ddpk pw-xsw17-aqpp Gr_2 srq-wy09-yyd9 sqq-fdfs-ffs9 Gr_3 etas-qqa-dfw ddw-ppls-qqw... (10 Replies)
Discussion started by: redse171
10 Replies

7. Shell Programming and Scripting

Find duplicates in column 1 and merge their lines (awk?)

Hi, I have a file (sorted by sort) with 8 tab delimited columns. The first column contains duplicated fields and I need to merge all these identical lines. My input file: comp100002 aaa bbb ccc ddd eee fff ggg comp100003 aba aba aba aba aba aba aba comp100003 fff fff fff fff fff fff fff... (5 Replies)
Discussion started by: falcox
5 Replies

8. Shell Programming and Scripting

How to group matched patterns in different files

Hi, I have a master file that i need to split into multiple files based on matched patterns. sample of my data as follows:- scaff_1 a e 123 130 c_scaff_100 scaff_1 a e 132 138 c_scaff_101 scaff_1 a e 140 150 ... (2 Replies)
Discussion started by: redse171
2 Replies

9. Shell Programming and Scripting

Merge files of differrent size with one field common in both files using awk

hi, i am facing a problem in merging two files using awk, the problem is as stated below, file1: A|B|C|D|E|F|G|H|I|1 M|N|O|P|Q|R|S|T|U|2 AA|BB|CC|DD|EE|FF|GG|HH|II|1 .... .... .... file2 : 1|Mn|op|qr (2 Replies)
Discussion started by: shashi1982
2 Replies

10. Shell Programming and Scripting

Column matched in two files

I have two files with multiple columns separated by commas. I want to search one column from the first file in the second file; If there is a matching, will append that matched row to the first file. Let me show an example... (unique values in the search column) First file aa,reg1,bb,cc,dd,ff... (6 Replies)
Discussion started by: lalelle
6 Replies
Login or Register to Ask a Question