Append file name to fasta file headers in Linux


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Append file name to fasta file headers in Linux
# 8  
Old 01-30-2014
Is it also possible to modify this code, for example using a for-loop, to perform this action on all .fasta files in a directory (outputing the modified files to a new directory or similarly)?
# 9  
Old 01-30-2014
Code:
for file in /source_dir/*.fasta
do
        fname="${file##*/}"
        awk '/>/{sub(">","&"FILENAME"_");sub(/\.fasta/,x)}1' "$file" > /dest_dir/$fname"
done

# 10  
Old 01-30-2014
Code:
for file in /source_dir/*.fasta
do
        fname="${file##*/}"
        awk '/>/{sub(">","&"FILENAME"_");sub(/\.fasta/,x)}1' "$file" > /dest_dir/$fname"
done

This User Gave Thanks to Yoda For This Post:
# 11  
Old 01-30-2014
OR this

Code:
$ awk '/>/{sub(/\..*/,x,FILENAME);$0=FILENAME"_"$2}1' FS=">" sample_1.fasta 

sample_1_node_1
atgatgatgat
sample_1_node_3
gtcgtcgtcgtcgtc
sample_1_node_12
gtagtagta

 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Is it possible to rename fasta headers based on its position specified in another file?

I have 5 sequences in a fasta file namely gene1.fasta as follows, gene1.fasta >1256 ATGTAGC >GEP TAGAG >GTY578 ATGCATA >67_iga ATGCTGA >90_ld ATGCTG I need to rename the gene1.fasta file based on the sequence position specified in list.txt as follows, list.txt position1=org5... (5 Replies)
Discussion started by: dineshkumarsrk
5 Replies

2. UNIX for Beginners Questions & Answers

How to append two fasta files?

I have two fasta files as shown below, File:1 >Contig_1:90600-91187 AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC >Contig_98:35323-35886 GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG >Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
11 Replies

3. UNIX for Dummies Questions & Answers

Selectively extracting entries from FASTA file

I would like to extract all entries containing the following patterns: ccccta & ccccccccc from the following infile: >P39PT-1224_Freq_900 cccctacgacggcattggtaatggctcccgcaagccatctctcttcagccaagg >P39PT-784_Freq_2 cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc >P39PT-678_Freq_5... (4 Replies)
Discussion started by: Xterra
4 Replies

4. UNIX for Dummies Questions & Answers

Round up -FASTA file

I have the following script: awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }' and the following file: >P39PT-1224 Freq 900 cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg >P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
2 Replies

5. UNIX for Advanced & Expert Users

Cannot find logical file format for BSD file headers.

Hi. Unix rookie here. Been looking for a few days for reference documents on how BSD UNIX lays the logical file format onto a disk. Goal is to view/edit with hex editor for data repair. Lots of docs are available for how to use Unix commands (like xxd), but I want to learn the map of how Unix... (4 Replies)
Discussion started by: Chris_top_he_r
4 Replies

6. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

I have two files. File1 is shown below. >153L:B|PDBID|CHAIN|SEQUENCE RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM DIGTTHDDYANDVVARAQYYKQHGY >16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans
7 Replies

7. Shell Programming and Scripting

Extract sequence from fasta file

Hi, I want to match the sequence id (sub-string of line starting with '>' and extract the information upto next '>' line ). Please help . input > fefrwefrwef X900 AGAGGGAATTGG AGGGGCCTGGAG GGTTCTCTTC > fefrwefrwef X932 AGAGGGAATTGG AGGAGGTGGAG GGTTCTCTTC > fefrwefrwef X937... (2 Replies)
Discussion started by: ritakadm
2 Replies

8. UNIX for Dummies Questions & Answers

How to change sequence name in along fasta file?

Hi I have an alignment file (.fasta) with ~80 sequences. They look like this- >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
2 Replies

9. Shell Programming and Scripting

parse fasta file to tabular file

Hello, A bioperl problem I thought could be done with awk: convert the fasta format (Note: the length of each row is not the same for each entry as they were combined from different files!) to tabular format. input.fasta: >YAL069W-1.334 Putative promoter sequence... (6 Replies)
Discussion started by: yifangt
6 Replies

10. Shell Programming and Scripting

Merging of files with different headers to make combined headers file

Hi , I have a typical situation. I have 4 files and with different headers (number of headers is varible ). I need to make such a merged file which will have headers combined from all files (comman coluns should appear once only). For example - File 1 H1|H2|H3|H4 11|12|13|14 21|22|23|23... (1 Reply)
Discussion started by: marut_ashu
1 Replies
Login or Register to Ask a Question