How to count the length of fasta sequences? Post: 303033657

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract length wise sequences from fastq file

I have a fastq file from small RNA sequencing with sequence lengths between 15 and 30. I want to keep only the sequences with lengths between 21 and 25 and write them to another fastq file. How can I do that?

2. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I have a file of DNA sequences in FASTA format which looks like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with thousands of such sequences in a single file. I want to replace the accession ID "admin_1_45", and similarly in the following sequences, to...

3. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

4. Shell Programming and Scripting

Count and search by sequence in multiple fasta file

Hello, I have 10 fasta files with sequenced reads, with read sizes from 15 to 35. I have combined the reads, collapsed them into unique reads, and filtered for unique reads 18 to 26 bp long. Now I want to count each unique read's appearances in all the fasta files and make a table...

5. Shell Programming and Scripting

Shorten header of protein sequences in fasta file

I have a fasta file as follows >sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3 MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM KGVTSTRVYERA >sp|L18484|AP2A2_RAT AP-2...

6. UNIX for Dummies Questions & Answers

Select distinct sequences from fasta file and list

Hi, how can I extract sequences from a fasta file according to certain criteria? The beginning of my file (containing more than 1000 sequences in total) looks like this: >H8V34IS02I59VP SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA...

7. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can I get only the unique sequences from the file? For example my_file.fasta >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2...

8. Shell Programming and Scripting

Outputting sequences based on length with sed

I have this file: >ID1 AA >ID2 TTTTTT >ID-3 AAAAAAAAA >ID4 TTTTTTGGAGATCAGTAGCAGATGACAG-GGGGG-TGCACCCC And I am trying to use this script to output sequences longer than 15 characters: sed -r '/^>/N;{/^.{,15}$/d}' The desired output would be this: >ID4...

9. Shell Programming and Scripting

Shorten header of protein sequences in fasta file to only organism name

I have a fasta file as follows >sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3 MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT...

10. UNIX for Beginners Questions & Answers

How to add specific bases at the beginning and ending of all the fasta sequences?

Hi, I have to add 7 bases of specific nucleotide at the beginning and ending of all the fasta sequences of a file. For example, I have a multi fasta file namely test.fasta as given below test.fasta >TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1...

LEARN ABOUT CENTOS

Algorithm::DiffOld(3)					User Contributed Perl Documentation				     Algorithm::DiffOld(3)

NAME

       Algorithm::DiffOld - Compute `intelligent' differences between two files / lists but use the old (<=0.59) interface.

NOTE

       This has been provided as part of the Algorithm::Diff package by Ned Konz.  This particular module is ONLY for people who HAVE to have the
       old interface, which uses a comparison function rather than a key generating function.
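
       The difference between the two interfaces can be sketched in plain Perl. With a key-generating function, items that should
       compare as equal can be found through a hash lookup on their keys. The helper matches_by_key below is hypothetical and only
       illustrates the idea; it is not how Algorithm::Diff is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not module code):
# index @$seq2 by key, then find the matches for each item of @$seq1 with a
# hash lookup instead of a pairwise comparison.
sub matches_by_key {
    my ($seq1, $seq2, $keygen) = @_;
    my %positions;
    for my $j (0 .. $#$seq2) {
        push @{ $positions{ $keygen->($seq2->[$j]) } }, $j;
    }
    # One array reference of matching positions per item of @$seq1.
    return map { $positions{ $keygen->($_) } || [] } @$seq1;
}

my @seq1 = qw(Alpha beta Gamma);
my @seq2 = qw(GAMMA alpha ALPHA);

# Case-insensitive matching expressed as a key: lowercase every item.
my @hits = matches_by_key(\@seq1, \@seq2, sub { lc $_[0] });

for my $i (0 .. $#seq1) {
    printf "%s matches positions [%s]\n", $seq1[$i], join(',', @{ $hits[$i] });
}
# Alpha matches positions [1,2]
# beta matches positions []
# Gamma matches positions [0]
```

       The same relation written for the old interface would be a comparison function such as sub { lc $_[0] eq lc $_[1] }, which
       can only be evaluated on one pair of items at a time.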

       Because each of the lines in one array has to be compared with each of the lines in the other array, this does M*N comparisons. This can
       be very slow. I clocked it at taking 18 times as long as the stock version of Algorithm::Diff for a 4000-line file. It will get worse
       quadratically as array sizes increase.
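
       As a rough illustration of that M*N cost, the following sketch counts how often a comparison function is called when every
       item of one array is compared with every item of the other. The helper count_pairwise_comparisons is hypothetical and models
       only the cost described above, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::DiffOld: calls the comparison
# function once for every pair of items and reports how many calls were made.
sub count_pairwise_comparisons {
    my ($seq1, $seq2, $cmp) = @_;
    my $calls = 0;
    for my $x (@$seq1) {
        for my $y (@$seq2) {
            $cmp->($x, $y);
            $calls++;
        }
    }
    return $calls;
}

my @seq1 = ('x') x 40;    # M = 40 items
my @seq2 = ('x') x 50;    # N = 50 items
my $calls = count_pairwise_comparisons(\@seq1, \@seq2, sub { $_[0] eq $_[1] });
print "$calls comparisons\n";    # prints "2000 comparisons"
```

       Under the same model, two 4000-line files would need 4000 * 4000 = 16,000,000 calls to the comparison function.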

SYNOPSIS

	 use Algorithm::DiffOld qw(diff LCS traverse_sequences);

	 @lcs	 = LCS( \@seq1, \@seq2, $comparison_function );

	 $lcsref = LCS( \@seq1, \@seq2, $comparison_function );

	 @diffs = diff( \@seq1, \@seq2, $comparison_function );

	 traverse_sequences( \@seq1, \@seq2,
			    { MATCH => $callback,
			      DISCARD_A => $callback,
			      DISCARD_B => $callback,
			    },
			    $comparison_function );

COMPARISON FUNCTIONS

       Each of the main routines should be passed a comparison function. If you aren't passing one in, use Algorithm::Diff instead.

       These functions should return a true value when two items should compare as equal.

       For instance,

	 @lcs	 = LCS( \@seq1, \@seq2, sub { my ($a, $b) = @_; $a eq $b } );

       but if that is all you're doing with your comparison function, just use Algorithm::Diff and let it do this (this is its default).

       Or:

	 sub someFunkyComparisonFunction
	 {
	       my ($a, $b) = @_;
	       $a =~ m{$b};
	 }

	 @diffs = diff( \@lines, \@patterns, \&someFunkyComparisonFunction );

       which would allow you to diff an array @lines which consists of text lines with an array @patterns which consists of regular expressions.

       This is actually the reason I wrote this version -- there is no way to do this with a key generation function as in the stock
       Algorithm::Diff.
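
       To make the lines-versus-patterns case concrete without requiring the module, here is a minimal sketch of a longest common
       subsequence driven by a comparison function. The routine lcs_with_comparator is a hypothetical stand-in written as a plain
       dynamic-programming LCS; it is not the algorithm Algorithm::DiffOld uses, and the sample data is invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an old-style interface: equality is decided
# solely by calling $cmp->($x, $y) on one item from each sequence.
sub lcs_with_comparator {
    my ($seq1, $seq2, $cmp) = @_;
    my ($m, $n) = (scalar @$seq1, scalar @$seq2);

    # $len[$i][$j] = LCS length of the suffixes starting at $i and $j.
    my @len = map { [ (0) x ($n + 1) ] } 0 .. $m;
    for my $i (reverse 0 .. $m - 1) {
        for my $j (reverse 0 .. $n - 1) {
            if ($cmp->($seq1->[$i], $seq2->[$j])) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }

    # Walk the table to recover the matched items of the first sequence.
    my ($i, $j, @matched) = (0, 0);
    while ($i < $m && $j < $n) {
        if ($cmp->($seq1->[$i], $seq2->[$j])) {
            push @matched, $seq1->[$i];
            $i++;
            $j++;
        }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) { $i++ }
        else                                         { $j++ }
    }
    return @matched;
}

my @lines    = ('GET /index.html', 'POST /login', 'GET /logo.png', 'DELETE /user/7');
my @patterns = ('^GET', '^POST', '^DELETE');

# A line "equals" a pattern when the pattern matches it.
my @matched = lcs_with_comparator(\@lines, \@patterns,
                                  sub { my ($line, $pat) = @_; $line =~ m{$pat} });
print "$_\n" for @matched;
# GET /index.html
# POST /login
# DELETE /user/7
```

       No single key can be computed for a text line that would equal the key of every pattern matching it, which is why this
       relation needs a comparison function.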

perl v5.16.3							    2006-07-31						     Algorithm::DiffOld(3)