Print sequences from file2 based on match to, AND in same order as, file1


 
# 1  
Old 03-09-2014

I have a list of IDs in file1 and a list of sequences in file2. I can print sequences from file2, but I'm asking for help in printing the sequences in the same order as the IDs appear in file1.

file1:
Code:
EN_comp12952_c0_seq3:367-1668
ES_comp17168_c1_seq6:1-864
EN_comp13395_c3_seq14:231-1088
ES_comp17836_c2_seq2:2-862

file2:
Code:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>EN_comp13226_c0_seq8:928-1788
MIRTAYDEVDKKEEVEKINLDQLSQGDIINLLKNFRDLNTDEQD
>EN_comp12741_c2_seq4:2-406
KHQIKQLTVQLPKEGQPDSGLTKDYTSSPLHRFKKPGSKNYQNIYPPSST
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp14617_c0_seq1:111-608
MSCRYVPEANMTACGTDYSTLAWHSRSYVLVYAMFAYYLPLLVIIYAYYFIV
>ES_comp17031_c0_seq3:3-1238
QLLAGVVKRSLVNATMFSIRNIEKLMQLAPKFIPTSSMLNSSTTSIPVSTPI
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT

Desired output (same order as file1):
Code:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY

The code I am currently using gives me the right sequences, but not in the right order.
Code:
awk 'NR==FNR{a[$0]=1;next} {n=0;for(i in a){if($0~i){n=1}}} n {print;getline;print}' file1 file2

Current (wrong order) output:
Code:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT

Thanks for any pointers!
# 2  
Old 03-10-2014
Hello,

The following may help.

Code:
$ awk 'NR==FNR{a[">"$0]; next} ($1 in a) {print; getline; print}' file1 file2


Output will be as follows.

Code:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT

Thanks,
R. Singh
# 3  
Old 03-10-2014
Thank you for taking the time on this. What I actually need is the following output, with the sequences in the same order as the IDs in file1 (sorry, posting the undesired output as well may have been confusing):
Code:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY

# 4  
Old 03-10-2014
Code:
awk 'NR==FNR{getline B; A[substr($0,2)]=B; next} $0 in A{ $0=">"$0"\n"A[$0] } 1' file2 file1
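
For anyone reading along, the idea behind the one-liner above can be unpacked as follows: read file2 first and store each sequence under its ID, then read file1 and print the stored records in the order the IDs arrive. This is only a sketch of that idea, not the poster's exact code. The short IDs and sequences (`id1`, `AAAA`, ...) and the variable names `seq`, `id`, and `result` are made up for illustration, and unlike the one-liner it silently skips IDs that have no sequence instead of printing them bare.

```shell
#!/bin/sh
# Hypothetical sample data, shortened so the sketch runs on its own.
tmp=$(mktemp -d)
printf '%s\n' 'id2' 'id1' > "$tmp/file1"
printf '%s\n' '>id1' 'AAAA' '>id3' 'CCCC' '>id2' 'GGGG' > "$tmp/file2"

result=$(awk '
    # First file on the command line (file2): remember each sequence by ID.
    NR == FNR {
        if (/^>/) {
            id = substr($0, 2)    # header line: strip the leading ">"
        } else {
            seq[id] = $0          # sequence line: store under the last ID seen
        }
        next
    }
    # Second file (file1): lines arrive in the wanted order, so print
    # each ID that has a stored sequence as it is read.
    $0 in seq { print ">" $0; print seq[$0] }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The key point is the argument order: file2 goes first so that file1 drives the output order.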

# 5  
Old 03-10-2014
Code:
awk 'NR==FNR{if(/^>/){key=$0} else {a[key]=$0};next}
(">"$0) in a { print ">"$0; print a[">"$0] }' file2 file1

# 6  
Old 03-10-2014
Code:
awk 'NR==FNR{B=$0;getline;A[B]=$0;next} {D=">"$0; print D;print A[D]}' file2 file1

Output:
Code:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
