Command Line Perl for parsing fasta file


 
# 1  
06-16-2015

I would like to take a FASTA file formatted like this:
Code:
>0001
agttcgaggtcagaatt
>0002
agttcgag
>0003
ggtaacctga

and use command-line Perl to move every record whose sequence is longer than 8 characters into a new file. The result would be:
Code:
>0001
agttcgaggtcagaatt
>0003
ggtaacctga

Code:
perl -lne 'if (/^>/) { print }' "${sample}.fasta"

This only finds the header lines, and I am stuck there. How can I achieve this?

Last edited by Scrutinizer; 06-16-2015 at 03:57 PM.. Reason: additional code tags
# 2  
06-16-2015
Does it have to be perl?
Code:
awk 'length($2)>8{print RS $0}' RS=\> ORS= "${sample}.fasta"

This User Gave Thanks to Scrutinizer For This Post:
# 3  
06-16-2015
Code:
perl -ne 'length > 9 && $last && print "$last$_"; $last = $_' "${sample}.fasta" > result.fasta

Explanation:

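length > 9: true only when the line, including its trailing newline, is longer than 9 characters, i.e. the sequence has more than 8 bases. (A header line longer than 9 characters would also pass this test, so this relies on short headers like the ones above.)
&& $last && print "$last$_";: ...and, if a previous line has been read, print that previous line (the header, e.g. >0001\n) followed by the current line.
$last = $_: remember the current line for the next iteration.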

Last edited by Aia; 06-16-2015 at 04:59 PM.. Reason: adding a bit of explanation
This User Gave Thanks to Aia For This Post:


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Help with reformat single-line multi-fasta into multi-line multi-fasta

Input File: >Seq1 ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD >Seq2 SDASDAQEQWEQeqAdfaasd >Seq3 ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG ...... Desired Output File >Seq1 ASDADAFASF ASFADGSDGF SDFSDFSDFS DFSDFSDFSD FSDFSDFSDF SD >Seq2 (4 Replies)
Discussion started by: patrick87

2. Shell Programming and Scripting

Perl command line option '-n','-p' and multiple files: can it know a file name of a printed line?

I am looking for help in processing of those options: '-n' or '-p' I understand what they do and how to use them. But, I would like to use them with more than one file (and without any shell-loop; loading the 'perl' once.) I did try it and -n works on 2 files. Question is: - is it possible to... (6 Replies)
Discussion started by: alex_5161

3. Shell Programming and Scripting

Modifying file from command line using Perl

Hi all, I am having a slight issue updating a file using perl from the command line I need some help with. The item is: DATA_FILE_TYPE=FULL When I run the below command /usr/bin/perl -p -i -e "s/DATA_FILE_TYPE=/DATA_FILE_TYPE=APPEND/g" processfile.cfg It looks to be... (2 Replies)
Discussion started by: kstevens67

4. Shell Programming and Scripting

Parsing and masking regions from a single fasta file with subsequence

HI, I have a Complete genome fasta file and I have list of sub sequence regions in the format as : 4353..5633 6795..9354 1034..14456 I want a script which can mask these region in a single complete genome fasta file with the alphabet N kindly help (2 Replies)
Discussion started by: margarita

5. Shell Programming and Scripting

Can't Output Piped Perl In-line command to a File

Hello, I'm pretty stumped, and I don't know why I am not able to redirect the output to the 'graphme' file with the command below in Fedora 18. tcpdump -l -n -t "tcp == 18" | perl -ane '($s,$j)=split(/,/,$F,2); print "$s\n";' > graphme In case you're wondering, I was following the example... (2 Replies)
Discussion started by: ConcealedKnight

6. UNIX for Dummies Questions & Answers

Find & Replace command - Fasta file

Hi all ! I have a fasta file that looks like that: >Sequence1 RTYIPLCASQHKLCPITFLAVK (it's just an example, obviously in reality I have several pairs of lines like that) Using UNIX command(s), would it be possible to replace all the characters except the "C" of the second line only by... (7 Replies)
Discussion started by: Cevin21

7. Shell Programming and Scripting

Parsing a fasta sequence with start and end coordinates

Hi. I have separate chromosome sequences and I want to parse some regions of a chromosome based on start and end sites. How can I achieve this? For example, Chr 1 is in the following format. I need regions from 2 - 10 should give me AATTCCAAA and in a similar way 15- 25 should give... (8 Replies)
Discussion started by: empyrean

8. Shell Programming and Scripting

Perl question, parsing line by line

Hello, I'm very new to Perl, and as a project to develop my Perl skills I'm trying to parse poker hands. I've tried many methods, however I can't get the last step. $yourfile= 'FILENAME';#poker hands to parse open (FILE, "$yourfile") or die $!; @lines = <FILE>; for (@lines) ... (1 Reply)
Discussion started by: Ek0

9. Shell Programming and Scripting

parsing command line switches in Perl

Hi, My perl script takes a few switches which I'm parsing with the Getopt::Long module. My script looks something like: myscript.pl --file="foo" --or --file="bar" The --file switch takes 2 arguments foo and bar. The 2 values of file are separated by --or switch. I want to ensure that... (1 Reply)
Discussion started by: obelix

10. UNIX for Dummies Questions & Answers

command line argument parsing

how to parse the command line argument to look for '@' sign and the following with '.'. In my shell script one of the argument passed is email address. I want to parse this email address to look for correct format. rmjoe123@hotmail.com has '@' sign and followed by a '.' to be more... (1 Reply)
Discussion started by: rmjoe