How to count the length of fasta sequences? Post: 303033673
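
One common way to approach the question in the title is an awk pass over the file. The following is a minimal sketch, not the thread's accepted answer: it assumes every record starts with a `>` header line, that sequence lines may wrap over several lines, and that the function name `fasta_lengths` and the sample records are made up for illustration.

```shell
# Print "header<TAB>length" for every record in a FASTA stream.
# Reads the files given as arguments, or stdin when none are given.
fasta_lengths() {
    awk '
        /^>/ {                          # a header line starts a new record
            if (seen) print name "\t" len
            name = substr($0, 2)        # drop the leading ">"
            len = 0
            seen = 1
            next
        }
        {                               # sequence line: strip blanks and CRs
            gsub(/[ \t\r]/, "")
            len += length($0)
        }
        END { if (seen) print name "\t" len }
    ' "$@"
}

# Hypothetical sample input, fed on stdin.
fasta_lengths <<'EOF'
>seq1
ACGTACGT
ACGT
>seq2
TTTT
EOF
```

The whole header line (minus `>`) is used as the record name; cut it at the first space inside awk if only the ID is wanted.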

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract length wise sequences from fastq file

I have a fastq file from small RNA sequencing with sequence lengths between 15 and 30. I want to keep the sequences of length 21-25 and write them to another fastq file. How can I do that?

2. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I have a file of DNA sequences in fasta format which looks like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with many thousands of such sequences in a single file. I want to replace the accession ID "admin_1_45", and similarly those of the following sequences, to...

3. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

4. Shell Programming and Scripting

Count and search by sequence in multiple fasta file

Hello, I have 10 fasta files with sequenced read information, with read sizes from 15 to 35. I have combined the reads, collapsed them into unique reads, and filtered for unique reads 18-26 bp long. Now I want to count how often each unique read appears in all the fasta files and make a table...

5. Shell Programming and Scripting

Shorten header of protein sequences in fasta file

I have a fasta file as follows >sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3 MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM KGVTSTRVYERA >sp|L18484|AP2A2_RAT AP-2...

6. UNIX for Dummies Questions & Answers

Select distinct sequences from fasta file and list

Hi, how can I extract sequences from a fasta file according to a certain criterion? The beginning of my file (containing in total more than 1000 sequences) looks like this: >H8V34IS02I59VP SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA...

7. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can I get only the unique sequences from the file? For example my_file.fasta >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2...

8. Shell Programming and Scripting

Outputting sequences based on length with sed

I have this file: >ID1 AA >ID2 TTTTTT >ID-3 AAAAAAAAA >ID4 TTTTTTGGAGATCAGTAGCAGATGACAG-GGGGG-TGCACCCC And I am trying to use this script to output sequences longer than 15 characters: sed -r '/^>/N;{/^.{,15}$/d}' The desired output would be this: >ID4...

9. Shell Programming and Scripting

Shorten header of protein sequences in fasta file to only organism name

I have a fasta file as follows >sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3 MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT...

10. UNIX for Beginners Questions & Answers

How to add specific bases at the beginning and ending of all the fasta sequences?

Hi, I have to add 7 bases of specific nucleotide at the beginning and ending of all the fasta sequences of a file. For example, I have a multi fasta file namely test.fasta as given below test.fasta >TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1...

kalign

KALIGN(1)							Kalign User Manual							 KALIGN(1)

NAME

       kalign - performs multiple alignment of biological sequences.

SYNOPSIS

       kalign [infile.fasta] [outfile.fasta] [Options]

       kalign [-i infile.fasta] [-o outfile.fasta] [Options]

       kalign [< infile.fasta] [> outfile.fasta] [Options]
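
The three synopsis forms can be sketched as follows. This is only an illustration: the file names are placeholders taken from the synopsis, and the guard assumes kalign may not be installed or the input may not exist, in which case nothing is run.

```shell
input=infile.fasta      # placeholder input path from the synopsis
output=outfile.fasta    # placeholder output path from the synopsis

if command -v kalign >/dev/null 2>&1 && [ -r "$input" ]; then
    kalign "$input" "$output"            # positional form
    kalign -i "$input" -o "$output"      # option form
    kalign < "$input" > "$output"        # stdin/stdout form
else
    echo "kalign or $input not available; nothing was run"
fi
```

Each call writes the same output file, so in practice only one of the three forms is needed.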

DESCRIPTION

       Kalign is a command line tool to perform multiple alignment of biological sequences. It employs the Muth-Manber string-matching algorithm
       to improve both the accuracy and speed of the alignment. It uses a global, progressive alignment approach, enriched by employing an
       approximate string-matching algorithm to calculate sequence distances and by incorporating local matches into the otherwise global
       alignment.

OPTIONS

       -s -gpo -gapopen -gap_open x
	   Gap open penalty.

       -e -gpe -gap_ext -gapextension x
	   Gap extension penalty.

       -t -tgpe -terminal_gap_extension_penalty x
	   Terminal gap penalties.

       -m -bonus -matrix_bonus x
	   A constant added to the substitution matrix.

       -c -sort <input, tree, gaps.>
	   The order in which the sequences appear in the output alignment.

       -g -feature
	   Selects feature mode and specifies which features are to be used: e.g. all, maxplp, STRUCT, PFAM-A...

       -same_feature_score
	   Score for aligning same features.

       -diff_feature_score
	   Penalty for aligning different features.

       -d -distance <wu, pair>
	   Distance method.

       -b -tree -guide-tree <nj, upgma>
	   Guide tree method.

       -z -zcutoff
	   Parameter used in the wu-manber based distance calculation.

       -i -in -input
	   Name of the input file.

       -o -out -output
	   Name of the output file.

       -a -gap_inc
	   Increases gap penalties depending on the number of existing gaps.

       -f -format <fasta, msf, aln, clu, macsim>
	   The output format.

       -q -quiet
	   Print nothing to STDERR. Read nothing from STDIN.
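
Several of the options above can be combined in one call. The sketch below uses only option values that appear in the lists above (msf, tree, wu, nj); the file names are hypothetical, and it assumes the installed kalign matches this manual page (2.04), since other releases may accept different options.

```shell
input=infile.fasta      # hypothetical input path
output=outfile.msf      # hypothetical output path

# MSF output, sequences ordered by guide tree, wu distances, NJ tree, quiet.
set -- -i "$input" -o "$output" -f msf -c tree -d wu -b nj -q

if command -v kalign >/dev/null 2>&1 && [ -r "$input" ]; then
    kalign "$@"
else
    # Show the command that would run when kalign or the input is missing.
    echo "would run: kalign $*"
fi
```

Numeric options such as -gpo or -gpe are left out here because the manual page gives no example values for them.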

REFERENCES

       o   Timo Lassmann and Erik L.L. Sonnhammer (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics
	   6:298

       o   Timo Lassmann, Oliver Frings and Erik L. L. Sonnhammer (2009) Kalign2: high-performance multiple alignment of protein and nucleotide
	   sequences allowing external features. Nucleic Acids Research 37(3):858-865.

AUTHORS

       Timo Lassmann <timolassmann@gmail.com>
	   Upstream author of Kalign.

       Charles Plessy <plessy@debian.org>
	   Wrote the manpage.

COPYRIGHT

       Copyright (C) 2004, 2005, 2006, 2007, 2008 Timo Lassmann

       Kalign is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the
       Free Software Foundation.

       This manual page was written by Charles Plessy <plessy@debian.org> for the Debian(TM) system (but may be used by others). Permission is
       granted to copy, distribute and/or modify this document under the same terms as kalign itself.

       On Debian systems, the complete text of the GNU General Public License version 2 can be found in /usr/share/common-licenses/GPL-2.

kalign 2.04							 February 25, 2009							 KALIGN(1)