I have a FASTQ file from small RNA sequencing with read lengths between 15 and 30. I want to keep only the reads of length 21-25 and write them to another FASTQ file. How can I do that? (4 Replies)
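Below is a minimal sketch of the length filter. It assumes the usual four-line FASTQ layout (header, sequence, "+", quality) with no wrapped lines. The function name `filter_fastq_by_length` is hypothetical. The function buffers each record and prints it only when the sequence length falls inside the requested inclusive range.

```shell
# Hypothetical helper: filter_fastq_by_length MIN MAX < in.fastq > out.fastq
# Keeps FASTQ records whose sequence length is between MIN and MAX (inclusive).
# Assumes the standard 4-line record layout (no wrapped sequence/quality lines).
filter_fastq_by_length() {
    awk -v min="$1" -v max="$2" '
        BEGIN { min += 0; max += 0 }          # force numeric comparison
        NR % 4 == 1 { header = $0 }           # @read_id line
        NR % 4 == 2 { seq = $0 }              # sequence line
        NR % 4 == 3 { plus = $0 }             # "+" separator line
        NR % 4 == 0 {                         # quality line completes the record
            len = length(seq)
            if (len >= min && len <= max)
                print header "\n" seq "\n" plus "\n" $0
        }
    '
}
```

For example: `filter_fastq_by_length 21 25 < reads.fastq > reads_21-25.fastq` (file names are illustrative). The sketch selects lines by `NR % 4` rather than by matching `^@`. This avoids misreading quality lines that happen to begin with "@".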
Hi,
I have a file of DNA sequences in FASTA format that looks like this:
>admin_1_45
atatagcaga
>admin_1_46
atatagcagaatatatat
There are many thousands of such sequences in a single file. I want to change the accession ID "admin_1_45", and likewise those in the following sequences, to... (5 Replies)
I have two files. File1 is shown below.
>153L:B|PDBID|CHAIN|SEQUENCE
RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL
KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM
DIGTTHDDYANDVVARAQYYKQHGY
>16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Hello,
I have 10 FASTA files of sequenced reads, with read sizes from 15 to 35. I combined the reads, collapsed them into unique reads, and kept only the unique reads that are 18-26 bp long. Now I want to count how often each unique read appears in all of the FASTA files and make a table... (5 Replies)
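Here is one way to build such a table, sketched with awk. It assumes every read sits on a single line after its ">" header. `count_reads_per_file` is a hypothetical name, and the fixed 18-26 nt window mirrors the filter described in the question. Rows appear in order of first appearance; columns follow the order of the files on the command line.

```shell
# Hypothetical helper: count_reads_per_file FILE...
# Prints a tab-separated table: one row per distinct read (18-26 nt),
# one column per input FASTA file, cells = occurrences of that read in the file.
# Assumes unwrapped sequences (one sequence line per ">" header).
count_reads_per_file() {
    awk -v min=18 -v max=26 '
        FNR == 1 { nf++; fname[nf] = FILENAME }     # remember file order
        /^>/ { next }                                # skip header lines
        {
            len = length($0)
            if (len < min + 0 || len > max + 0) next
            if (!($0 in seen)) { seen[$0] = 1; order[++nseq] = $0 }
            count[$0, nf]++
        }
        END {
            printf "sequence"
            for (f = 1; f <= nf; f++) printf "\t%s", fname[f]
            printf "\n"
            for (i = 1; i <= nseq; i++) {
                s = order[i]
                printf "%s", s
                for (f = 1; f <= nf; f++) printf "\t%d", count[s, f]   # missing -> 0
                printf "\n"
            }
        }
    ' "$@"
}
```

Usage might look like `count_reads_per_file sample*.fa > read_counts.tsv`. An empty input file gets no column, because its first line, and therefore the `FNR == 1` rule, never fires.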
I have a FASTA file as follows:
>sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3
MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN
TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM
KGVTSTRVYERA
>sp|L18484|AP2A2_RAT AP-2... (3 Replies)
Hi,
How can I extract sequences from a FASTA file according to certain criteria? The beginning of my file (which contains more than 1000 sequences in total) looks like this:
>H8V34IS02I59VP
SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG
YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA... (6 Replies)
Hi,
I have a FASTA file with multiple sequences. How can I get only the unique sequences from the file?
For example:
my_file.fasta
>seq1
TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC
>seq2... (3 Replies)
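Below is a minimal awk sketch that keeps the first record for each distinct sequence. `unique_fasta` is a hypothetical name. Wrapped sequence lines are joined before comparison, so two records count as duplicates only if their complete sequences are identical (the comparison is case-sensitive).

```shell
# Hypothetical helper: unique_fasta < in.fasta > out.fasta
# Keeps the first record for each distinct sequence and drops later duplicates.
# Multi-line sequences are joined, so output sequences are single-line.
unique_fasta() {
    awk '
        function flush() {
            if (header != "" && !(seq in seen)) {
                seen[seq] = 1
                print header
                print seq
            }
        }
        /^>/ { flush(); header = $0; seq = ""; next }
        { seq = seq $0 }                      # join wrapped sequence lines
        END { flush() }
    '
}
```

For example: `unique_fasta < my_file.fasta > my_file.unique.fasta`. A later header whose sequence was already seen is dropped along with that sequence.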
I have this file:
>ID1
AA
>ID2
TTTTTT
>ID-3
AAAAAAAAA
>ID4
TTTTTTGGAGATCAGTAGCAGATGACAG-GGGGG-TGCACCCC
And I am trying to use this script to output only the sequences longer than 15 characters:
sed -r '/^>/N;{/^.{,15}$/d}'
The desired output would be this:
>ID4... (8 Replies)
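The sed attempt has two problems. After `N`, the pattern space holds the header, a newline and the sequence, and GNU sed's `.` also matches that newline. As a result, `^.{,15}$` measures header and sequence together rather than the sequence alone, and it only works when each record has a single sequence line. A hedged awk alternative (hypothetical helper `fasta_longer_than`) joins wrapped lines and tests only the sequence length:

```shell
# Hypothetical helper: fasta_longer_than N < in.fasta
# Prints only records whose sequence (header excluded) has more than N characters.
# Wrapped sequence lines are joined first; gap characters such as "-" are
# counted like any other character.
fasta_longer_than() {
    awk -v n="$1" '
        function emit() {
            if (header != "" && length(seq) > n + 0) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    '
}
```

With the file from the post, `fasta_longer_than 15 < file.fasta` keeps only the `>ID4` record.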
I have a FASTA file as follows:
>sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3
MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM
ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC
NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT... (3 Replies)
Hi,
I have to add 7 bases of a specific nucleotide at the beginning and end of every sequence in a FASTA file. For example, I have a multi-FASTA file named test.fasta, shown below:
test.fasta
>TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1... (1 Reply)
Discussion started by: dineshkumarsrk
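Below is a minimal sketch, assuming the same fixed padding string goes on both ends of every sequence. `add_flanks` is a hypothetical helper. The pad is passed as an argument because the post does not say which nucleotide is meant. Sequences are joined onto one line first so the padding lands at the true start and end of each sequence.

```shell
# Hypothetical helper: add_flanks PAD < in.fasta > out.fasta
# Prepends and appends the string PAD (e.g. seven copies of the chosen
# nucleotide) to every sequence. Wrapped sequence lines are joined first.
add_flanks() {
    awk -v pad="$1" '
        function emit() {
            if (header != "") {
                print header
                print pad seq pad
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    '
}
```

For seven adenines, for instance: `add_flanks AAAAAAA < test.fasta > test_flanked.fasta`.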
kalign
KALIGN(1)    Kalign User Manual    KALIGN(1)
NAME
kalign - performs multiple alignment of biological sequences.
SYNOPSIS
kalign [infile.fasta] [outfile.fasta] [Options]
kalign [-i infile.fasta] [-o outfile.fasta] [Options]
kalign [< infile.fasta] [> outfile.fasta] [Options]
DESCRIPTION
Kalign is a command-line tool for performing multiple alignment of biological sequences. It employs the Muth-Manber string-matching algorithm
to improve both the accuracy and the speed of the alignment. It uses a global, progressive alignment approach, enriched by employing an
approximate string-matching algorithm to calculate sequence distances and by incorporating local matches into the otherwise global
alignment.
OPTIONS
-s -gpo -gapopen -gap_open x
Gap open penalty.
-e -gpe -gap_ext -gapextension x
Gap extension penalty.
-t -tgpe -terminal_gap_extension_penalty x
Terminal gap penalties.
-m -bonus -matrix_bonus x
A constant added to the substitution matrix.
-c -sort <input, tree, gaps>
The order in which the sequences appear in the output alignment.
-g -feature
Selects feature mode and specifies which features are to be used, e.g. all, maxplp, STRUCT, PFAM-A, etc.
-same_feature_score
Score for aligning same features.
-diff_feature_score
Penalty for aligning different features.
-d -distance <wu, pair>
Distance method.
-b -tree -guide-tree <nj, upgma>
Guide tree method.
-z -zcutoff
Parameter used in the Wu-Manber-based distance calculation.
-i -in -input
Name of the input file.
-o -out -output
Name of the output file.
-a -gap_inc
Increases gap penalties depending on the number of existing gaps.
-f -format <fasta, msf, aln, clu, macsim>
The output format.
-q -quiet
Print nothing to STDERR. Read nothing from STDIN.
REFERENCES
o Timo Lassmann and Erik L. L. Sonnhammer (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6:298.
o Timo Lassmann, Oliver Frings and Erik L. L. Sonnhammer (2009) Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Research 37(3):858-865.
AUTHORS
Timo Lassmann <timolassmann@gmail.com>
Upstream author of Kalign.
Charles Plessy <plessy@debian.org>
Wrote the manpage.
COPYRIGHT
Copyright (C) 2004, 2005, 2006, 2007, 2008 Timo Lassmann
Kalign is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the
Free Software Foundation.
This manual page was written by Charles Plessy <plessy@debian.org> for the Debian(TM) system (but may be used by others). Permission is
granted to copy, distribute and/or modify this document under the same terms as kalign itself.
On Debian systems, the complete text of the GNU General Public License version 2 can be found in /usr/share/common-licenses/GPL-2.
kalign 2.04 February 25, 2009 KALIGN(1)