File merging based on column patterns


 
# 1  
Old 10-27-2015

Hello,

I am in this situation:

Input: two tab-delimited files, `File1` and `File2`. The second column of `File2` (`$2`) has to be searched for the patterns listed in the first column of `File1` (`$1`).

Expected output: a tab-delimited file, `File3`. `File3` has to contain the same rows as `File2`; when a pattern matches, the corresponding values from `File1` are appended to the end of the line, tab-separated.

File1 (tab-delimited) :

Code:
 
    ABC1    1    3    4
    ABC2    4    3    3
    ABC3    3    2    3
    ABC4    3    3    3

File2 (tab-delimited):
Code:
    text1   ABC1AB   text2   text3
    text2   ABC2AB   text1   
    text3   ABC1CDE   text2
    text4   ABC5AB   text3   text4

File3 (desired output, tab-delimited):

Code:
    text1   ABC1AB   text2   text3    1    3    4
    text2   ABC2AB   text1    4    3    3
    text3   ABC1CDE   text2    1    3    4
    text4   ABC5AB   text3   text4

I know this should be quite an easy task that could be done with awk/grep, but I didn't manage to get it to work :/

Any help is welcome, thanks in advance
# 2  
Old 10-27-2015
Please post your attempt(s) so we can analyse and, if needed, improve them.
# 3  
Old 10-27-2015
Here's my (failed) attempt:
Code:
$ awk -F"\t" 'FNR==NR{a[$1]=$2"\t"$3"\t"$4} FNR!=NR{$0=$0"\t"a[$3];print}' file1 file2 > file3

# 4  
Old 10-27-2015
Well, try
Code:
awk '
FNR==NR         {X=$1
                 sub ($1 FS, "")
                 T[X]=$0
                 next
                }
                {for (t in T) if ($2 ~ t) $(NF+1)=T[t]}
1
' FS="\t" OFS="\t" file1 file2

Output:
Code:
text1   ABC1AB  text2   text3   1       3       4
text2   ABC2AB  text1           4       3       3
text3   ABC1CDE text2   1       3       4
text4   ABC5AB  text3   text4
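
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. It rests on a few assumptions that are not in the posts above: the sample files are recreated with `printf` (without the trailing tab on the second row of `File2`), the result is written to `file3`, and the regex test `$2 ~ t` is swapped for `index()`, so that a key containing regex metacharacters would still be compared literally.

```shell
# Recreate the sample input from the first post (tab-delimited).
printf 'ABC1\t1\t3\t4\nABC2\t4\t3\t3\nABC3\t3\t2\t3\nABC4\t3\t3\t3\n' > file1
printf 'text1\tABC1AB\ttext2\ttext3\ntext2\tABC2AB\ttext1\ntext3\tABC1CDE\ttext2\ntext4\tABC5AB\ttext3\ttext4\n' > file2

awk -F'\t' -v OFS='\t' '
FNR == NR {                      # first file: build the lookup table
    key = $1
    val = $2
    for (i = 3; i <= NF; i++)    # join the remaining columns with tabs
        val = val OFS $i
    T[key] = val
    next
}
{                                # second file: append the values of a matching key
    for (k in T)
        if (index($2, k)) {      # literal substring test, no regex semantics
            $0 = $0 OFS T[k]
            break                # stop at the first key that matches
        }
    print
}' file1 file2 > file3

cat file3
```

If the key must sit at the start of the column, `index($2, k) == 1` would restrict the test to a prefix match. Note that `for (k in T)` visits keys in an unspecified order, so when several keys can match the same row, which one wins is not defined.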

This User Gave Thanks to RudiC For This Post:
# 5  
Old 10-27-2015
Thank you so much, it works now. Awesome!
# 6  
Old 10-27-2015
If the pattern is always exactly 4 characters long and sits at the start of `$2`, then you can replace the loop
Code:
 {for (t in T) if ($2 ~ t) $(NF+1)=T[t]}
1

with the more efficient
Code:
 {print $0, T[substr($2,1,4)]}
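
A complete version of this lookup might look like the sketch below. The `printf` sample files and the `file3` output name are assumptions made for the demonstration, as is the `(key in T)` guard: a plain `print $0, T[...]` also writes the output separator on rows without a match, which leaves a trailing tab on the `ABC5AB` line.

```shell
# Recreate the sample input from the first post (tab-delimited).
printf 'ABC1\t1\t3\t4\nABC2\t4\t3\t3\nABC3\t3\t2\t3\nABC4\t3\t3\t3\n' > file1
printf 'text1\tABC1AB\ttext2\ttext3\ntext2\tABC2AB\ttext1\ntext3\tABC1CDE\ttext2\ntext4\tABC5AB\ttext3\ttext4\n' > file2

awk -F'\t' -v OFS='\t' '
FNR == NR {                              # first file: key -> rest of the line
    T[$1] = substr($0, length($1) + 2)   # skip the key and the tab after it
    next
}
{
    key = substr($2, 1, 4)               # assumes every key is exactly 4 characters
    # Append the stored values only when the key is known, so unmatched
    # rows are printed unchanged (no trailing tab).
    print ((key in T) ? $0 OFS T[key] : $0)
}' file1 file2 > file3

cat file3
```

This replaces one pass over all keys per row with a single array lookup, which is where the efficiency gain comes from.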


Last edited by MadeInGermany; 10-27-2015 at 05:52 PM.. Reason: added a comma
This User Gave Thanks to MadeInGermany For This Post:
 