Is it possible to rename fasta headers based on their position specified in another file?


 
# 1  
Old 11-13-2019

I have 5 sequences in a FASTA file named gene1.fasta, as follows:
Code:
gene1.fasta
>1256
ATGTAGC
>GEP
TAGAG
>GTY578
ATGCATA
>67_iga
ATGCTGA
>90_ld
ATGCTG

I need to rename the headers in gene1.fasta based on the sequence positions specified in list.txt, as follows:
Code:
list.txt
position1=org5
position2=amylase
position3=org8
position4=lipase
position5=org_1

The expected output is:
Code:
>org5
ATGTAGC
>amylase
TAGAG
>org8
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG

Thanks in advance.

Last edited by dineshkumarsrk; 11-13-2019 at 03:07 AM..
# 2  
Old 11-13-2019
Any attempts, ideas, or thoughts from your side? Have you tried applying what you have learned here?
# 3  
Old 11-13-2019
Dear RudiC,
The script below replaces strings in gene1.fasta with the replacements specified in list.txt.
Code:
awk 'FNR==NR{REP[$1]=$2; next} {for (r in REP) gsub(r, REP[r])}1' FS="=" list.txt gene1.fasta

However, it is not based on position. It is purely based on matching strings between the two files. My problem is different: I tried to work it out with a counter like this
Code:
++i

but the keys in my list.txt share no strings with the headers in gene1.fasta, so I cannot rename them sequentially. That is why I am asking for your help.

Last edited by dineshkumarsrk; 11-13-2019 at 04:12 AM..
# 4  
Old 11-13-2019
For the easy case where the replacements in list.txt are listed in increasing order of position, one per line, try
Code:
awk 'FNR==NR {REP[NR] = $2; next} /^>/ {$0 = ">" REP[++CNT]}1' FS="=" list.txt gene1.fasta
>org5
ATGTAGC
>amylase
TAGAG
>org8
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG
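
For readers who find the one-liner dense, the same logic can be spread over several lines. This is only a sketch: the scratch directory and the names `name`, `count`, and `renamed` are chosen for illustration and are not from the post above. It recreates the sample files from post #1 and stores the n-th name from list.txt under its line number, then puts it on the n-th header.

```shell
#!/bin/sh
# Sketch only: recreate the thread's sample files in a scratch directory.
workdir=$(mktemp -d)

printf '%s\n' '>1256' 'ATGTAGC' '>GEP' 'TAGAG' '>GTY578' 'ATGCATA' \
    '>67_iga' 'ATGCTGA' '>90_ld' 'ATGCTG' > "$workdir/gene1.fasta"
printf '%s\n' 'position1=org5' 'position2=amylase' 'position3=org8' \
    'position4=lipase' 'position5=org_1' > "$workdir/list.txt"

renamed=$(awk -F= '
    # First file (list.txt): FNR equals NR only while reading it.
    # Keep the text after "=" under the current line number.
    FNR == NR { name[NR] = $2; next }

    # Second file (gene1.fasta): the n-th header gets the n-th name.
    /^>/ { $0 = ">" name[++count] }

    # Print every line, changed or not.
    { print }
' "$workdir/list.txt" "$workdir/gene1.fasta")

printf '%s\n' "$renamed"
```

Note that this variant relies only on line order in list.txt; the text before the `=` is ignored.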

EDIT: in case they are not in order, or some positions are missing (in this example, position3 does not exist in list.txt), try



Code:
awk 'FNR==NR {REP[$1] = $2; next} /^>/ && (TMP = "position" ++CNT) in REP {$0 = ">" REP[TMP]}1' FS="=" list.txt gene1.fasta
>org5
ATGTAGC
>amylase
TAGAG
>GTY578
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG
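
The keyed lookup above also copes with a list.txt whose lines are shuffled, since each header is matched by its `positionN` key instead of by line order. The following is a hedged sketch of that case: the shuffled list, the scratch directory, and the names `name`, `key`, `count`, and `renamed` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: list.txt is shuffled here and has no entry for position3.
workdir=$(mktemp -d)

printf '%s\n' '>1256' 'ATGTAGC' '>GEP' 'TAGAG' '>GTY578' 'ATGCATA' \
    '>67_iga' 'ATGCTGA' '>90_ld' 'ATGCTG' > "$workdir/gene1.fasta"
printf '%s\n' 'position4=lipase' 'position1=org5' 'position5=org_1' \
    'position2=amylase' > "$workdir/list.txt"

renamed=$(awk -F= '
    # list.txt: index the new names by their key, e.g. "position4".
    FNR == NR { name[$1] = $2; next }

    # gene1.fasta: build the key for the n-th header and rename it
    # only if list.txt supplied a name for that position.
    /^>/ {
        key = "position" (++count)
        if (key in name) $0 = ">" name[key]
    }

    { print }
' "$workdir/list.txt" "$workdir/gene1.fasta")

printf '%s\n' "$renamed"
```

Headers without a matching key, such as the third one here, pass through unchanged.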


Last edited by RudiC; 11-13-2019 at 05:10 AM..
# 5  
Old 11-13-2019
@RudiC,
Both serve my purpose perfectly.
# 6  
Old 11-13-2019
Also try:
Code:
awk -F= '/^>/{if((getline<f)>0) $0=">" $2}1' f=list.txt gene1.fasta

or without the check:
Code:
awk -F= '/^>/{getline<f; $0=">" $2}1' f=list.txt gene1.fasta
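
The difference between the two variants only shows when list.txt has fewer lines than gene1.fasta has headers. `getline < file` returns 1 when it reads a line, 0 at end of file, and -1 on error, and it leaves `$0` untouched when nothing is read. The sketch below assumes a deliberately short list.txt with two entries; the scratch directory and the variable names `checked` and `unchecked` are made up for the illustration.

```shell
#!/bin/sh
# Sketch only: three headers in the FASTA file, but only two names in list.txt.
workdir=$(mktemp -d)

printf '%s\n' '>1256' 'ATGTAGC' '>GEP' 'TAGAG' '>GTY578' 'ATGCATA' \
    > "$workdir/gene1.fasta"
printf '%s\n' 'position1=org5' 'position2=amylase' > "$workdir/list.txt"

# With the check: a header is rewritten only if a line was actually read.
checked=$(awk -F= '
    /^>/ { if ((getline < f) > 0) $0 = ">" $2 }
    { print }
' f="$workdir/list.txt" "$workdir/gene1.fasta")

# Without the check: once list.txt is exhausted, $0 is still the old
# header, which has no "=" in it, so $2 is empty and the header
# collapses to a bare ">".
unchecked=$(awk -F= '
    /^>/ { getline < f; $0 = ">" $2 }
    { print }
' f="$workdir/list.txt" "$workdir/gene1.fasta")

printf 'checked:\n%s\n\nunchecked:\n%s\n' "$checked" "$unchecked"
```

With the check, the third header stays `>GTY578`; without it, the third header becomes `>`. When the two files are known to line up one to one, both variants give the same result.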


Last edited by Scrutinizer; 11-14-2019 at 05:31 PM..