06-17-2010
pseudocoder
Is there a way to do the same with bash? It would be easier for me to understand.
I was also wondering if there is any way to calculate the frequency of each sequence. In other words, let's assume that after 'trimming' the sequences several of them are identical; would it be possible to determine the frequency of each one and include it as part of the ID line? Something like this:
Quote:
> Seq A Freq 50
AGAGATAGATAGAGCTGAT
> Seq B Freq 25
AGAGATAGATAGAGCTGAT
> Seq C Freq 25
AGAGATAGATAGAGCTGAT
Thanks
Last edited by Xterra; 06-17-2010 at 05:00 AM.
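For what it's worth, here is a rough sketch of how the counting could be done from bash with awk and sort. It is only an illustration under some assumptions that are not stated in the post: every record is a header line starting with ">" followed by exactly one sequence line (no wrapped sequences), "Freq" means the raw number of identical copies rather than a percentage, and the original IDs can be replaced by new ones. The function name `fasta_freq` and the `SeqN` labels are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: collapse identical sequences in a FASTA file and put the count
# of each one into its ID line.
# Assumptions: one sequence line per record; Freq is a raw count;
# fasta_freq and the SeqN labels are hypothetical names.

fasta_freq() {
    awk '
        /^>/ { next }              # skip the original header lines
        NF   { count[$0]++ }       # tally every non-empty sequence line
        END  {
            for (seq in count)     # one "count<TAB>sequence" line per unique sequence
                print count[seq] "\t" seq
        }
    ' "$1" |
    sort -k1,1nr -k2,2 |           # most frequent first, ties ordered by sequence
    awk -F "\t" '{ printf ">Seq%d Freq %d\n%s\n", NR, $1, $2 }'
}
```

It could be called as `fasta_freq trimmed.fasta > collapsed.fasta` (both file names are placeholders). If a percentage is wanted instead of a count, the first awk would also need to keep a running total and divide by it in the END block.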