Find matched patterns in a column of 2 files with different size and merge them


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Find matched patterns in a column of 2 files with different size and merge them
# 1  
Old 05-28-2014
Find matched patterns in a column of 2 files with different size and merge them

Hi,

i have input files like below:-

input1
Code:
Name	Seq_ID	NewID	Scores
MT1	A0QZX3	1.65	  277.4
IVO	A0QZX3	1.65	  244.5
HPO	A0QZX3	1.65	  240.5
RgP	A0Q3PP	5.32	  241.0
GX1	LPSZ3S	96.1	  216.9
MEL	LPSS3X	4.23	  204.1
LDD     LPSS3X  4.23      100.2

input2
Code:
Fac	AddName  NewID	    Details	Type	 Cond
1	Garp	  1.65      navy1	 grt     alive
1	Garp	  1.67      admiral      ggt     alive
4	Luffy	  4.23      Pirate1      hr      survive
4       Luffy     4.25      Pirate2      hhhh    survive
9	Ace	  96.1      Brother	 hdsm    dead

I want to match NewID ($3) in both files and merge them (append input1) as follows:-

Code:
Name	Seq_ID	NewID	 Scores  Fac	AddName     Details	Type	 Cond
MT1	A0QZX3	1.65	  277.4  1	Garp	    navy1	 grt     alive
IVO	A0QZX3	1.65	  244.5  1	Garp	    navy1	 grt     alive
HPO	A0QZX3	1.65	  240.5  1	Garp	    navy1	 grt     alive
GX1	LPSZ3S	96.1	  216.9  9	Ace	    Brother	 hdsm    dead
MEL	LPSS3X	4.23	  204.1  4	Luffy	    Pirate1      hr      survive
LDD     LPSS3X  4.23      100.2  4	Luffy	    Pirate1      hr      survive

I have thousands of records like this. and i tried the following codes that i found in this forum:-

Code:
awk 'NR==FNR{A[$3]=$0; next} $3=A[$3]' input1 FS='\t' OFS='\t' input2

Unfortunately, it does not give me the results that i want where it ignores the duplicate values in input2. for instance, for the above sample, the result that i got using this code is

Code:
Name	Seq_ID	NewID	 Scores  Fac	AddName     Details	Type	 Cond
HPO	A0QZX3	1.65	  240.5  1	Garp	    navy1	 grt     alive
GX1	LPSZ3S	96.1	  216.9  9	Ace	    Brother	 hdsm    dead
LDD     LPSS3X  4.23      100.2  4	Luffy	    Pirate1      hr      survive

Please help me to solve this. Thanks
# 2  
Old 05-28-2014
Read file: input2 first:
Code:
awk '
        NR == FNR {
                A[$3] = $1 OFS $2 OFS $4 OFS $5 OFS $6
                next
        }
        $3 in A {
                print $1, $2, $3, $4, A[$3]
        }
' OFS='\t' input2 input1

This User Gave Thanks to Yoda For This Post:
# 3  
Old 05-28-2014
Code:
$ awk 'FNR==NR{s=$3; $3=""; A[s]=$0;next}($3 in A){print $0,A[$3]}' OFS="\t" input2 input1

This User Gave Thanks to Akshay Hegde For This Post:
# 4  
Old 05-28-2014
Hi Yoda,

I tried your code and it works great on my sample data above. However, it did not work for my real data and i realize that it is because there are values in $1 (Fac) in file 2 that quite similar to $3 (NewID). for example:

Code:
Fac	AddName  NewID	    Details	Type	 Cond
1.21	Shank	  1.21.1      Hero1	 KT     alive
1.28	Shank	  1.28.6      warlord     PP     alive

I am trying to fix this.

---------- Post updated at 01:59 PM ---------- Previous update was at 01:57 PM ----------

Hi Akshay Hegde,

Tried your code but it did not give me anything. Thanks
# 5  
Old 05-28-2014
With sample input it worked for me..

Code:
$ cat input2
Fac	AddName  NewID	    Details	Type	 Cond
1	Garp	  1.65      navy1	 grt     alive
1	Garp	  1.67      admiral      ggt     alive
4	Luffy	  4.23      Pirate1      hr      survive
4       Luffy     4.25      Pirate2      hhhh    survive
9	Ace	  96.1      Brother	 hdsm    dead

Code:
$ cat input1
Name	Seq_ID	NewID	Scores
MT1	A0QZX3	1.65	  277.4
IVO	A0QZX3	1.65	  244.5
HPO	A0QZX3	1.65	  240.5
RgP	A0Q3PP	5.32	  241.0
GX1	LPSZ3S	96.1	  216.9
MEL	LPSS3X	4.23	  204.1
LDD     LPSS3X  4.23      100.2

Code:
$ awk 'FNR==NR{s=$3; $3=""; A[s]=$0;next}($3 in A){print $0,A[$3]}' OFS="\t" input2 input1

Name	Seq_ID	NewID	Scores	Fac	AddName		Details	Type	Cond
MT1	A0QZX3	1.65	  277.4	1	Garp		navy1	grt	alive
IVO	A0QZX3	1.65	  244.5	1	Garp		navy1	grt	alive
HPO	A0QZX3	1.65	  240.5	1	Garp		navy1	grt	alive
GX1	LPSZ3S	96.1	  216.9	9	Ace		Brother	hdsm	dead
MEL	LPSS3X	4.23	  204.1	4	Luffy		Pirate1	hr	survive
LDD     LPSS3X  4.23      100.2	4	Luffy		Pirate1	hr	survive

This User Gave Thanks to Akshay Hegde For This Post:
# 6  
Old 05-28-2014
Quote:
Originally Posted by Akshay Hegde
With sample input it worked for me..

Code:
$ cat input2
Fac	AddName  NewID	    Details	Type	 Cond
1	Garp	  1.65      navy1	 grt     alive
1	Garp	  1.67      admiral      ggt     alive
4	Luffy	  4.23      Pirate1      hr      survive
4       Luffy     4.25      Pirate2      hhhh    survive
9	Ace	  96.1      Brother	 hdsm    dead

Code:
$ cat input1
Name	Seq_ID	NewID	Scores
MT1	A0QZX3	1.65	  277.4
IVO	A0QZX3	1.65	  244.5
HPO	A0QZX3	1.65	  240.5
RgP	A0Q3PP	5.32	  241.0
GX1	LPSZ3S	96.1	  216.9
MEL	LPSS3X	4.23	  204.1
LDD     LPSS3X  4.23      100.2

Code:
$ awk 'FNR==NR{s=$3; $3=""; A[s]=$0;next}($3 in A){print $0,A[$3]}' OFS="\t" input2 input1

Name	Seq_ID	NewID	Scores	Fac	AddName		Details	Type	Cond
MT1	A0QZX3	1.65	  277.4	1	Garp		navy1	grt	alive
IVO	A0QZX3	1.65	  244.5	1	Garp		navy1	grt	alive
HPO	A0QZX3	1.65	  240.5	1	Garp		navy1	grt	alive
GX1	LPSZ3S	96.1	  216.9	9	Ace		Brother	hdsm	dead
MEL	LPSS3X	4.23	  204.1	4	Luffy		Pirate1	hr	survive
LDD     LPSS3X  4.23      100.2	4	Luffy		Pirate1	hr	survive


Sorry, it did work using sample data but not the real ones due to the issue that i mentioned above. Thanks
# 7  
Old 05-28-2014
I think its good if you can show few line of your real data... will see what went wrong...
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Bash - Find files excluding file patterns and subfolder patterns

Hello. For a given folder, I want to select any files find $PATH1 -f \( -name "*" but omit any files like pattern name ! -iname "*.jpg" ! -iname "*.xsession*" ..... \) and also omit any subfolder like pattern name -type d \( -name "/etc/gconf/gconf.*" -o -name "*cache*" -o -name "*Cache*" -o... (2 Replies)
Discussion started by: jcdole
2 Replies

2. Shell Programming and Scripting

Find all matched lines from two files

Hello, everyone I have two files like this: File 1: A B C D E FFile 2:A B 1 A C 2 A K 3 B A 4 D E 3 W X 2 A B 2I want to print all lines (file2) that the first two columns are consist of elements from file1. So, what I expected is : A B 1 A B 2 (2 Replies)
Discussion started by: nengcheng
2 Replies

3. Shell Programming and Scripting

Replacing matched patterns in multiple files with awk

Hello all, I have since given up trying to figure this out and used sed instead, but I am trying to understand awk and was wondering how someone might do this in awk. I am trying to match on the first field of a specific file with the first field on multiple files, and append the second field... (2 Replies)
Discussion started by: karlmalowned
2 Replies

4. Shell Programming and Scripting

Find matched patterns and print them with other patterns not the whole line

Hi, I am trying to extract some patterns from a line. The input file is space delimited and i could not use column to get value after "IN" or "OUT" patterns as there could be multiple white spaces before the next digits that i need to print in the output file . I need to print 3 patterns in a... (3 Replies)
Discussion started by: redse171
3 Replies

5. UNIX for Dummies Questions & Answers

How to merge two tables based on a matched column?

Hi, Please excuse me , i have searched unix forum, i am unable to find what i expect , my query is , i have 2 files of same structure and having 1 similar field/column , i need to merge 2 tables/files based on the one matched field/column (that is field 1), file 1:... (5 Replies)
Discussion started by: karthikram
5 Replies

6. Shell Programming and Scripting

Find matched patterns in multiple files

Hi, I need help to find matched patterns in 30 files residing in a folder simultaneously. All these files only contain 1 column. For example, File1 Gr_1 st-e34ss-11dd bt-wwd-fewq pt-wq02-ddpk pw-xsw17-aqpp Gr_2 srq-wy09-yyd9 sqq-fdfs-ffs9 Gr_3 etas-qqa-dfw ddw-ppls-qqw... (10 Replies)
Discussion started by: redse171
10 Replies

7. Shell Programming and Scripting

Find duplicates in column 1 and merge their lines (awk?)

Hi, I have a file (sorted by sort) with 8 tab delimited columns. The first column contains duplicated fields and I need to merge all these identical lines. My input file: comp100002 aaa bbb ccc ddd eee fff ggg comp100003 aba aba aba aba aba aba aba comp100003 fff fff fff fff fff fff fff... (5 Replies)
Discussion started by: falcox
5 Replies

8. Shell Programming and Scripting

How to group matched patterns in different files

Hi, I have a master file that i need to split into multiple files based on matched patterns. sample of my data as follows:- scaff_1 a e 123 130 c_scaff_100 scaff_1 a e 132 138 c_scaff_101 scaff_1 a e 140 150 ... (2 Replies)
Discussion started by: redse171
2 Replies

9. Shell Programming and Scripting

Merge files of differrent size with one field common in both files using awk

hi, i am facing a problem in merging two files using awk, the problem is as stated below, file1: A|B|C|D|E|F|G|H|I|1 M|N|O|P|Q|R|S|T|U|2 AA|BB|CC|DD|EE|FF|GG|HH|II|1 .... .... .... file2 : 1|Mn|op|qr (2 Replies)
Discussion started by: shashi1982
2 Replies

10. Shell Programming and Scripting

Column matched in two files

I have two files with multiple columns separated by commas. I want to search one column from the first file in the second file; If there is a matching, will append that matched row to the first file. Let me show an example... (unique values in the search column) First file aa,reg1,bb,cc,dd,ff... (6 Replies)
Discussion started by: lalelle
6 Replies
Login or Register to Ask a Question