Sponsored Content
Top Forums Shell Programming and Scripting parse fasta file to tabular file Post 302584712 by yifangt on Saturday 24th of December 2011 06:22:17 PM
Old 12-24-2011
Thanks Kato!
Your second version is much better. Is it possible to remove the tabs within the sequence fields? i.e. merge the sequence to a single field instead of being separated with the tab. gsub the first "\n" with "\t", but gsub the second "\n" and after with nothing. One step from what I want.
Merry Christmas!!!

Last edited by yifangt; 12-24-2011 at 07:39 PM.. Reason: improve the algorithm
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How to change sequence name in along fasta file?

Hi I have an alignment file (.fasta) with ~80 sequences. They look like this- >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
2 Replies

2. UNIX for Dummies Questions & Answers

Change sequence names in fasta file

I have fasta files with multiple sequences in each. I need to change the sequence name headers from: >accD:_59176-60699 ATGGAAAAGTGGAGGATTTATTCGTTTCAGAAGGAGTTCGAACGCA >atpA_(reverse_strand):_showing_revcomp_of_10525-12048 ATGGTAACCATTCAAGCCGACGAAATTAGTAATCTTATCCGGGAAC... (2 Replies)
Discussion started by: tyrianthinae
2 Replies

3. Shell Programming and Scripting

Extract sequence from fasta file

Hi, I want to match the sequence id (sub-string of line starting with '>' and extract the information upto next '>' line ). Please help . input > fefrwefrwef X900 AGAGGGAATTGG AGGGGCCTGGAG GGTTCTCTTC > fefrwefrwef X932 AGAGGGAATTGG AGGAGGTGGAG GGTTCTCTTC > fefrwefrwef X937... (2 Replies)
Discussion started by: ritakadm
2 Replies

4. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

I have two files. File1 is shown below. >153L:B|PDBID|CHAIN|SEQUENCE RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM DIGTTHDDYANDVVARAQYYKQHGY >16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans
7 Replies

5. UNIX for Dummies Questions & Answers

Append file name to fasta file headers in Linux

How do we append the file name to fasta file headers in multiple fasta-files in Linux? (10 Replies)
Discussion started by: Mauve
10 Replies

6. Shell Programming and Scripting

Convert text file to HTML tabular format.

Please provide script/commands to convert text file to HTML tabular format. No need of styles and colours, just output and a heading in table is required. Output file will be send via email and will be seen from outlook. (script required without using awk). output file content: (sar... (7 Replies)
Discussion started by: Veera_V
7 Replies

7. UNIX for Dummies Questions & Answers

Select distinct sequences from fasta file and list

Hi How can I extract sequences from a fasta file with respect a certain criteria? The beginning of my file (containing in total more than 1000 sequences) looks like this: >H8V34IS02I59VP SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA... (6 Replies)
Discussion started by: Marion MPI
6 Replies

8. UNIX for Dummies Questions & Answers

Round up -FASTA file

I have the following script: awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }' and the following file: >P39PT-1224 Freq 900 cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg >P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
2 Replies

9. UNIX for Dummies Questions & Answers

Selectively extracting entries from FASTA file

I would like to extract all entries containing the following patterns: ccccta & ccccccccc from the following infile: >P39PT-1224_Freq_900 cccctacgacggcattggtaatggctcccgcaagccatctctcttcagccaagg >P39PT-784_Freq_2 cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc >P39PT-678_Freq_5... (4 Replies)
Discussion started by: Xterra
4 Replies

10. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can i get only unique sequences from the file. For example my_file.fasta >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2... (3 Replies)
Discussion started by: Ibk
3 Replies
tartdates(5)						      LinuxTaRT Special Dates						      tartdates(5)

NAME
tartdates - Special date configuration for LinuxTaRT DESCRIPTION
LinuxTaRT uses a configuration file called specialdates to enable the user to display a specific tagline on certain special dates. The name and location of this file is specified in the LinuxTaRT configuration file ~/.tartrc, but is usually: ~/.tartdates PARAMETERS
Each line of the special date file consists of a day of the year expressed as a two digit month, a forward-slash (/), a two digit date, a colon (:) and a line of text. mm/dd:tagline text Any line that does not conform to this specification will be treated as a comment. Comments may also be specified by prefixing text with a # symbol. Using this method, you can place a comment behind a date line. EXAMPLES
01/01:Happy New Year!! Will use "Happy New Year!!" as the tagline when the date is January 1st. 12/25:Merry Christmas to all! Displays "Merry Christmas to all!" as your tagline on Christmas day. FILES
~/.tartdates SEE ALSO
tart(1) tartrc(5) tart-custom(5) Mark Veinot 1.0.0 tartdates(5)
All times are GMT -4. The time now is 12:46 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy