I can calculate the lengths of all the sequences in a fasta file with the following command,
But I need to calculate the lengths of only the particular fasta sequences listed in another txt file. The results need to be printed to a csv file.
Please help me do this.
Thanks in advance.
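One possible approach, as a minimal Python sketch rather than a reply from the thread: read the FASTA once, record each record's length, then keep only the IDs listed in the text file. It assumes the ID list has one ID per line and that an ID is the first whitespace-delimited token of the header line. The function names and the command-line layout are hypothetical.

```python
import csv
import sys


def fasta_lengths(lines):
    """Return {record_id: total_sequence_length} for FASTA given as an iterable of lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequences may be wrapped over several lines, so accumulate.
            lengths[current] += len(line)
    return lengths


def selected_lengths(fasta_lines, wanted_ids):
    """Return (id, length) pairs for the wanted IDs, in the order they were listed."""
    lengths = fasta_lengths(fasta_lines)
    return [(seq_id, lengths[seq_id]) for seq_id in wanted_ids if seq_id in lengths]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python fasta_lengths.py seqs.fasta ids.txt out.csv
    fasta_path, ids_path, csv_path = sys.argv[1:4]
    with open(ids_path) as fh:
        wanted = [line.strip().lstrip(">") for line in fh if line.strip()]
    with open(fasta_path) as fh:
        rows = selected_lengths(fh, wanted)
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "length"])
        writer.writerows(rows)
```

IDs that appear in the list but not in the FASTA file are silently skipped here; reporting them instead would be an easy change.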
I have a fastq file from small RNA sequencing with sequence lengths between 15 and 30. I want to keep only the sequences with lengths between 21 and 25 and write them to another fastq file. How can I do that? (4 Replies)
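A length filter like this can be sketched in a few lines of Python. This is only an illustration: it assumes plain four-line FASTQ records (no wrapped sequence lines) and treats the 21 to 25 range as inclusive. The function name is hypothetical.

```python
def filter_fastq_by_length(lines, min_len=21, max_len=25):
    """Yield 4-line FASTQ records (as lists) whose sequence length is in [min_len, max_len]."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # record[1] is the sequence line of the current record.
            if min_len <= len(record[1]) <= max_len:
                yield record
            record = []
```

To write the result, iterate over an open input file and print each kept record with `"\n".join(record)` to the output file.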
Hi,
I have a file of DNA sequences in fasta format which looks like this:
>admin_1_45
atatagcaga
>admin_1_46
atatagcagaatatatat
with many thousands of such sequences in a single file. I want to replace the accession ID "admin_1_45", and similarly those in the following sequences, to... (5 Replies)
I have two files. File1 is shown below.
>153L:B|PDBID|CHAIN|SEQUENCE
RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL
KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM
DIGTTHDDYANDVVARAQYYKQHGY
>16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Hello,
I have 10 fasta files of sequenced reads, with read sizes from 15 to 35. I have combined the reads, collapsed them into unique reads, and filtered for unique reads 18 to 26 bp long. Now I want to count the appearances of each unique read in all the fasta files and make a table... (5 Replies)
I have a fasta file as follows
>sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3
MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN
TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM
KGVTSTRVYERA
>sp|L18484|AP2A2_RAT AP-2... (3 Replies)
Hi
How can I extract sequences from a fasta file according to certain criteria? The beginning of my file (containing more than 1000 sequences in total) looks like this:
>H8V34IS02I59VP
SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG
YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA... (6 Replies)
Hi,
I have a fasta file with multiple sequences. How can I get only the unique sequences from the file?
For example
my_file.fasta
>seq1
TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC
>seq2... (3 Replies)
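Since the example above is cut off, the exact output wanted is not known. As a hedged sketch, one reading of "unique" is to keep the first record for each distinct sequence, comparing sequences case-insensitively. Both function names below are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_records(lines):
    """Yield the first record seen for each distinct sequence (assumption: case-insensitive)."""
    seen = set()
    for header, seq in read_fasta(lines):
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq
```

Each kept pair can be written back out as `>header` followed by the sequence.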
I have this file:
>ID1
AA
>ID2
TTTTTT
>ID-3
AAAAAAAAA
>ID4
TTTTTTGGAGATCAGTAGCAGATGACAG-GGGGG-TGCACCCC
And I am trying to use this script to output sequences longer than 15 characters:
sed -r '/^>/N;{/^.{,15}$/d}'
The desired output would be this:
>ID4... (8 Replies)
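For comparison with the sed attempt, here is a minimal Python sketch of the same filter. It assumes that every character on a sequence line counts toward the length (including the `-` characters in the ID4 record) and that "longer than 15" means strictly more than 15. The function name is hypothetical.

```python
def records_longer_than(lines, min_len=15):
    """Return (header, sequence) pairs whose sequence has more than min_len characters."""
    kept = []
    header, chunks = None, []

    def flush():
        # Keep the record collected so far if its joined sequence is long enough.
        if header is not None:
            seq = "".join(chunks)
            if len(seq) > min_len:
                kept.append((header, seq))

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    flush()
    return kept
```

With the four records shown above, only the ID4 record passes the filter.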
I have a fasta file as follows
>sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3
MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM
ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC
NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT... (3 Replies)
Hi,
I have to add 7 bases of a specific nucleotide at the beginning and end of every fasta sequence in a file. For example, I have a multi-fasta file named test.fasta, as given below
test.fasta
>TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1... (1 Reply)
Discussion started by: dineshkumarsrk
LEARN ABOUT DEBIAN
bp_flanks
BP_FLANKS(1p) User Contributed Perl Documentation BP_FLANKS(1p)
NAME
flanks - finding flanking sequences for a variant in a sequence position
SYNOPSIS
flanks --position POS [-p POS ...] [--flanklen INT]
accession | filename
DESCRIPTION
This script allows you to extract a subsequence around a region of interest from an existing sequence. The output is a fasta formatted
sequence entry where the header line contains additional information about the location.
OPTIONS
The script takes one unnamed argument which can be either a file name in the local file system or a nucleotide sequence accession number.
-p, --position
Position uses simple nucleotide sequence feature table
notation to define the region of interest, typically a
SNP or microsatellite repeat around which the flanks are
defined.
There can be more than one position option or you can
give a comma separated list to one position option.
The format of a position is:
[id:] int | range | in-between [-]
The optional id is the name you want to call the new
sequence. If it is not given, a running number is joined to the
entry name with an underscore.
The position is either a point (e.g. 234), a range (e.g.
250..300) or insertion point between nucleotides
(e.g. 234^235)
If the position is not completely within the source
sequence the output sequence will be truncated and it
will print a warning.
The optional hyphen [-] at the end of the position
indicates that you want the retrieved sequence to be
on the opposite strand.
-f, --flanklen
Defaults to 100. This is the length of the nucleotide
sequence retrieved on both sides of the given position.
If the source file does not contain
OUTPUT FORMAT
The output is a fasta formatted entry where the description line contains tag=value pairs for information about where in the original
sequence the subsequence was taken.
The ID of the fasta entry is the name given at the command line joined by a hyphen to the filename or accession of the source sequence. If no
id is given a series of consecutive integers is used.
The tag=value pairs are:
oripos=int
position in the source file
strand=1|-1
strand of the sequence compared to the source sequence
allelepos=int
position of the region of interest in the current entry. The tag is the same as used by dbSNP@NCBI
The sequence highlights the allele variant position by showing it in upper case and the rest of the sequence in lower case characters.
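The flank extraction and case highlighting described above can be illustrated with a short Python sketch. This is not the BioPerl implementation: it handles only a single point position, assumes 1-based coordinates, and the function name is hypothetical. The returned allele position is counted within the extracted subsequence, so it shrinks when the left flank is truncated at the start of the source sequence.

```python
def extract_flanks(sequence, position, flanklen=100):
    """Return (subsequence, allelepos) around a 1-based point position.

    The flanks are lower case and the base at the position is upper case.
    Flanks that would run past either end of the sequence are truncated.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    idx = position - 1  # convert to a 0-based index
    left = sequence[max(0, idx - flanklen):idx].lower()
    allele = sequence[idx].upper()
    right = sequence[idx + 1:idx + 1 + flanklen].lower()
    # allelepos is 1-based within the returned subsequence.
    return left + allele + right, len(left) + 1
```

For a position of 100 with the default flank length, only 99 bases are available on the left, so the allele lands at position 100 of the output.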
EXAMPLE
% flanks ~/seq/ar.embl
>1_/HOME/HEIKKI/SEQ/AR.EMBL oripos=100 strand=1 allelepos=100
taataactcagttcttatttgcacctacttcagtggacactgaatttggaaggtggagga
ttttgtttttttcttttaagatctgggcatcttttgaatCtacccttcaagtattaagag
acagactgtgagcctagcagggcagatcttgtccaccgtgtgtcttcttctgcacgagac
tttgaggctgtcagagcgct
TODO
The input files are assumed to be in EMBL format and the sequences are retrieved only from the EMBL database. Make this more generic and use
the registry.
FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the
Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the
web:
https://redmine.open-bio.org/projects/bioperl/
AUTHOR - Heikki Lehvaslaiho
Email: <heikki-at-bioperl-dot-org>
perl v5.14.2 2012-03-02 BP_FLANKS(1p)