Sponsored Content
Top Forums Shell Programming and Scripting Join lines from two files based on match Post 302844326 by pathunkathunk on Sunday 18th of August 2013 01:46:36 PM
Old 08-18-2013
Join lines from two files based on match

I have two files.
File1
Code:
>gi|11320906|gb|AF197889.1|_Buchnera_aphidicola
ATGAAATTTAAGATAAAAAATAGTATTTT
>gi|11320898|gb|AF197885.1|_Buchnera_aphidicola
ATGAAATTTAATATAAACAATAAAA
>gi|11320894|gb|AF197883.1|_Buchnera_aphidicola
ATGAAATTTAATATAAACAATAAAATTTTT

File2
Code:
AF197885	Uroleucon aeneum
AF197886	Uroleucon jaceae
AF197889	Uroleucon obscurum
AF197883	Uroleucon astronomus
AF197893	Uroleucon erigeronense

For all lines in file1, I want to match the term bracked by "gb|" and "." (i.e. AF197889 in the first line) to a line in file2. In this example of file1, all terms of interest start with "AF" but this isn't always the case.

If there's a match, I'd like to append the species name in file2, preceded by "_host_" to the matching line in file1, using underscores and no spaces. Desired output:
Code:
>gi|11320906|gb|AF197889.1|_Buchnera_aphidicola_host_Uroleucon_obscurum
ATGAAATTTAAGATAAAAAATAGTATTTT
>gi|11320898|gb|AF197885.1|_Buchnera_aphidicola_host_Uroleucon_aeneum
ATGAAATTTAATATAAACAATAAAA
>gi|11320894|gb|AF197883.1|_Buchnera_aphidicola_host_Uroleucon_astronomus
ATGAAATTTAATATAAACAATAAAATTTTT

With the meager skills I have, I could use "|" as a filed separator for file 1 and use awk to fill an array to find matches. But I'm not sure how to to append the file2 data, or how to accomplish it in one step. Can anyone help?

Last edited by Don Cragun; 08-18-2013 at 02:56 PM.. Reason: CODE tags; not QUOTE tags for input, output, and code samples.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

join based on line number when one file is missing lines

I have a file that contains 87 lines, each with a set of coordinates (x & y). This file looks like: 1 200.3 -0.3 2 201.7 -0.32 ... 87 200.2 -0.314 I have another file which contains data that was taken at certain of these 87 positions. i.e.: 37 125 42 175 86 142 where the first... (1 Reply)
Discussion started by: jackiev
1 Replies

2. Shell Programming and Scripting

join two files based on one column

Hi All, I am trying to join to files based on one common column. Cat File1 ID HID Ab_1 23 Cd 45 df 22 Vv 33 Cat File2 ID pval Ab_1 0.3 Cd 10 Vv 0.0444 (3 Replies)
Discussion started by: newpro
3 Replies

3. UNIX for Dummies Questions & Answers

sed, join lines that do not match pattern

Hello, Could someone help me with sed. I have searched for solution 5 days allready :wall:, but cant find. Unfortunately my "sed" knowledge not good enough to manage it. I have the text: 123, foo1, bar1, short text1, dat1e, stable_pattern 124, foo2, bar2, long text with few lines, date,... (4 Replies)
Discussion started by: petrasl
4 Replies

4. UNIX for Dummies Questions & Answers

join 2 lines based on 1st field

hi i have a file with the following lines 2303:13593:137135 16 abc1 26213806....... 1234:45675:123456 16 bbc1 9813806....... 2303:13593:137135 17 bna1 26566444.... 1234:45675:123456 18 nnb1 98123456....... i want to join the lines having common 1st field i,e., ... (1 Reply)
Discussion started by: anurupa777
1 Replies

5. UNIX for Dummies Questions & Answers

Join the lines until next pattern match

Hi, I have a data file where data is splitted into multiple lines. And, each valid record starts with a patten date | <?xml and ends with pattern </dmm> e.g. 20120924|<?xml record 1 line1....record 1 line1....record 1 line1.... record 1 line2....record 1 line2....record 1 line2.... record 1... (3 Replies)
Discussion started by: Dipalik
3 Replies

6. UNIX for Dummies Questions & Answers

Join 2 files based on certain column

I have file input1.txt 11103|11|OTTAWA|City|AA|CAR|0|0|1|-1|0|8526|2014-09-07 23:00:14 11103|11|OTTAWA|City|BB|TRAIN|0|0|2|-2|6|6359|2014-09-07 23:00:14 11104|11|CANADA|City|CC|CAR|0|0|2|-2|0|5947|2014-09-07 23:00:14 11104|11|CANADA|City|DD|TRAIN|0|0|2|-2|1|4523|2014-09-07 23:00:14... (5 Replies)
Discussion started by: radius
5 Replies

7. Shell Programming and Scripting

Merge lines based on match

I am trying to merge two lines to one based on some matching condition. The file is as follows: Matches filter: 'request ', timestamp, <HTTPFlow request=<GET: Matches filter: 'request ', timestamp, <HTTPFlow request=<GET: Matches filter: 'request ', timestamp, <HTTPFlow ... (8 Replies)
Discussion started by: jamie_123
8 Replies

8. Shell Programming and Scripting

awk join lines based on keyword

Hello , I will need your help once again. I have the following file: cat file02.txt PATTERN XXX.YYY.ZZZ. 500 ROW01 aaa. 300 XS 14 ROW 45 29 AS XD.FD. PATTERN 500 ZZYN002 ROW gdf gsste ALT 267 fhhfe.ddgdg. PATTERN ERE.MAY. 280 PATTERRNTH 5000 rt.rt. ROW SO a 678 PATTERN... (2 Replies)
Discussion started by: alex2005
2 Replies

9. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

Hello, I have a file with 2 columns ( tableName , ColumnName) delimited by a Pipe like below . File is sorted by ColumnName. Table1|Column1 Table2|Column1 Table5|Column1 Table3|Column2 Table2|Column2 Table4|Column3 Table2|Column3 Table2|Column4 Table5|Column4 Table2|Column5 From... (6 Replies)
Discussion started by: nv186000
6 Replies

10. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies
DIFF3(1)						      General Commands Manual							  DIFF3(1)

NAME
diff3 - 3-way differential file comparison SYNOPSIS
diff3 [ -exEX3 ] file1 file2 file3 DESCRIPTION
Diff3 compares three versions of a file, and publishes disagreeing ranges of text flagged with these codes: ==== all three files differ ====1 file1 is different ====2 file2 is different ====3 file3 is different The type of change suffered in converting a given range of a given file to some other is indicated in one of these ways: f : n1 a Text is to be appended after line number n1 in file f, where f = 1, 2, or 3. f : n1 , n2 c Text is to be changed in the range line n1 to line n2. If n1 = n2, the range may be abbreviated to n1. The original contents of the range follows immediately after a c indication. When the contents of two files are identical, the contents of the lower-numbered file is suppressed. Under the -e option, diff3 publishes a script for the editor ed that will incorporate into file1 all changes between file2 and file3, i.e. the changes that normally would be flagged ==== and ====3. Option -x (-3) produces a script to incorporate only changes flagged ==== (====3). The following command will apply the resulting script to `file1'. (cat script; echo '1,$p') | ed - file1 The -E and -X are similar to -e and -x, respectively, but treat overlapping changes (i.e., changes that would be flagged with ==== in the normal listing) differently. The overlapping lines from both files will be inserted by the edit script, bracketed by "<<<<<<" and ">>>>>>" lines. For example, suppose lines 7-8 are changed in both file1 and file2. Applying the edit script generated by the command "diff3 -E file1 file2 file3" to file1 results in the file: lines 1-6 of file1 <<<<<<< file1 lines 7-8 of file1 ======= lines 7-8 of file3 >>>>>>> file3 rest of file1 The -E option is used by RCS merge(1) to insure that overlapping changes in the merged files are preserved and brought to someone's atten- tion. FILES
/tmp/d3????? /usr/libexec/diff3 SEE ALSO
diff(1) BUGS
Text lines that consist of a single `.' will defeat -e. 7th Edition October 21, 1996 DIFF3(1)
All times are GMT -4. The time now is 02:04 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy