Full Discussion: grep FASTA files
UNIX for Dummies Questions & Answers: grep FASTA files, post 302430266 by Xterra on Thursday, 17 June 2010, 03:50:17 AM
pseudocoder

Is there a way to do the same with bash? It would be easier for me to understand.
I was also wondering whether there is a way to calculate the frequency of each sequence. In other words, assuming that after 'trimming' several sequences are identical, would it be possible to determine how often each one occurs and include that as part of the ID line? Something like this:

Quote:
> Seq A Freq 50
AGAGATAGATAGAGCTGAT
> Seq B Freq 25
AGAGATAGATAGAGCTGAT
> Seq C Freq 25
AGAGATAGATAGAGCTGAT


Thanks

Last edited by Xterra; 06-17-2010 at 05:00 AM.
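A minimal pure-bash sketch of the frequency idea, for illustration only. The function name fasta_freq is hypothetical, and the sketch assumes that each record has exactly one sequence line (as after trimming) and that bash 4 or newer is available for associative arrays. Identical sequences are collapsed, the first header seen for each sequence is kept, and the number of occurrences is appended as "Freq N":

```shell
#!/usr/bin/env bash
# Sketch: collapse identical sequences and append "Freq N" to the kept header.
# Assumptions: one sequence line per record (e.g. after trimming), bash >= 4
# for associative arrays; the first header seen for a sequence is the one kept.
fasta_freq() {
    declare -A count first_id          # sequence -> count, sequence -> first header
    local -a order=()                  # unique sequences in first-seen order
    local line id="" seq
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue     # skip blank lines (empty keys are invalid)
        if [[ $line == '>'* ]]; then
            id=${line:1}               # header text without the leading '>'
            continue
        fi
        if [[ -z ${count[$line]+set} ]]; then
            order+=("$line")           # first time this sequence is seen
            first_id[$line]=$id
        fi
        count[$line]=$(( ${count[$line]:-0} + 1 ))
    done < "$1"
    for seq in "${order[@]}"; do
        printf '>%s Freq %d\n%s\n' "${first_id[$seq]}" "${count[$seq]}" "$seq"
    done
}
```

Run it as fasta_freq trimmed.fasta > collapsed.fasta. The sample above reads as if the Freq values might be percentages (50 + 25 + 25 = 100); if that is the intent, divide each count by the total number of sequences and multiply by 100. On large files an awk version of the same logic will be considerably faster than a bash read loop.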
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

fasta format?

Hi, I'm in need of creating a file in the fasta format: >1A6A.A HVIIQAEFYLNPDQSGEFMFDFDGDEIFHVDMAKKETVWRLEEFGRFASFEAQGALANIAVDKANLEIMTKRSNYTPITN VPPEVTVLTNSPVELREPNVLICFIDKFTPPVVNVTWLRNGKPVTTGVSETVFLPREDHLFRKFHYLPFLPSTEDVYDCR VEHWGLDEPLLKHWEF >1A6A.B ... (5 Replies)
Discussion started by: lost
5 Replies

2. Shell Programming and Scripting

grep for certain files using a file as input to grep and then move

Hi All, I need to grep a few files that have words like the ones below in their file names. I want to put these words in a file, grep for the files whose names contain them, and move those files to a new directory. Full file name -C20091210.1000-20091210.1100_SMGBSC3:1000... (2 Replies)
Discussion started by: anita07
2 Replies

3. Shell Programming and Scripting

Changing from FASTA to PHYLIP format

I really need some help with this task. I have a bunch of FASTA files with hundreds of DNA sequences that look like this: >SeqID1 AACCATGACAGAGGAGATGTGAACAGATAGAGGGATGACAGATGACAGATAGACCCAGAC TGACAGGTTCAAAGGCTGCAGTGCAGTGACGTGACGATTT >Sequence 22... (13 Replies)
Discussion started by: Xterra
13 Replies

4. UNIX for Dummies Questions & Answers

renaming (renumbering) fasta files

I have a fasta file that looks like this: >Noname ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG >Noname ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA ... I want to... (2 Replies)
Discussion started by: Oyster
2 Replies

5. UNIX for Dummies Questions & Answers

Breaking a fasta formatted file into multiple files containing each gene separately

Hey, I've been trying to break a massive FASTA-formatted file into files containing each gene separately. Could anyone help me? I've tried the following code, but I've received errors every time: for i in *.rtf.out do awk '/^>/{f=++d".fasta"} {print > $i.out}' $i done (1 Reply)
Discussion started by: Ann Mc Cartney
1 Reply

6. UNIX for Dummies Questions & Answers

How to change sequence names in a long fasta file?

Hi I have an alignment file (.fasta) with ~80 sequences. They look like this- >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
2 Replies

7. UNIX for Dummies Questions & Answers

Fasta header modification

Hi, I need some help with modifying fasta headers. I have a fasta file with thousands of contigs and I need to modify their headers with the information obtained from a second file. File 1 contains the fasta sequences: >contig0001 length=11115 numreads=10777 agatgtagatctct... (6 Replies)
Discussion started by: Lokaps
6 Replies

8. UNIX for Dummies Questions & Answers

Round up -FASTA file

I have the following script: awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }' and the following file: >P39PT-1224 Freq 900 cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg >P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
2 Replies

9. Shell Programming and Scripting

Help with reformat single-line multi-fasta into multi-line multi-fasta

Input File: >Seq1 ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD >Seq2 SDASDAQEQWEQeqAdfaasd >Seq3 ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG ...... Desired Output File >Seq1 ASDADAFASF ASFADGSDGF SDFSDFSDFS DFSDFSDFSD FSDFSDFSDF SD >Seq2 (4 Replies)
Discussion started by: patrick87
4 Replies

10. UNIX for Beginners Questions & Answers

How to append two fasta files?

I have two fasta files as shown below, File:1 >Contig_1:90600-91187 AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC >Contig_98:35323-35886 GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG >Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
11 Replies
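For the question about breaking a FASTA file into one file per gene (item 5), here is a sketch of the usual awk approach, wrapped in a hypothetical split_fasta function. In the loop quoted there, $i inside the single-quoted awk program is awk's field reference, not the shell loop variable, so the output file name has to be built from an awk variable instead. The sketch assumes every record starts with a '>' header; any lines before the first header are dropped.

```shell
#!/usr/bin/env bash
# Sketch: write each record of a multi-FASTA file to PREFIX1.fasta, PREFIX2.fasta, ...
# Assumptions: records start with '>' headers; the default prefix "gene" is arbitrary.
split_fasta() {
    local input=$1 prefix=${2:-gene}
    awk -v prefix="$prefix" '
        /^>/ {
            if (out) close(out)            # close the previous file to avoid running out of handles
            out = prefix (++n) ".fasta"    # start a new output file at every header
        }
        out { print > out }                # header and sequence lines go to the current file
    ' "$input"
}
```

A loop over several inputs could then look like for i in *.rtf.out; do split_fasta "$i" "${i%.rtf.out}_"; done, which gives each input its own prefix so files from different inputs do not overwrite each other.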
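For the question about reformatting single-line multi-FASTA into multi-line multi-FASTA (item 9), a sketch using the standard fold utility. The function name wrap_fasta and its default width of 60 are assumptions; the desired output shown in that thread corresponds to a width of 10.

```shell
#!/usr/bin/env bash
# Sketch: re-wrap single-line FASTA sequences to a fixed width.
# Header lines ('>' lines) pass through unchanged; sequence lines go through fold.
wrap_fasta() {
    local width=${2:-60} line      # default width of 60 is an assumption
    while IFS= read -r line || [[ -n $line ]]; do
        case $line in
            '>'*) printf '%s\n' "$line" ;;
            *)    printf '%s\n' "$line" | fold -w "$width" ;;
        esac
    done < "$1"
}
```

For example, wrap_fasta input.fasta 10 > wrapped.fasta. Calling fold once per sequence line is simple but slow on very large files, where a single awk pass would be a better choice.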
ASN2FSA(1)						     NCBI Tools User's Manual							ASN2FSA(1)

NAME
       asn2fsa - convert biological sequence data from ASN.1 to FASTA

SYNOPSIS
       asn2fsa [-] [-A acc] [-D] [-E] [-H] [-L filename] [-T] [-a type] [-b] [-c] [-d path] [-e N] [-f path] [-g] [-h filename] [-i filename] [-k] [-l] [-m] [-o filename] [-p path] [-q filename] [-r] [-s] [-u] [-v filename] [-x str] [-z]

DESCRIPTION
       asn2fsa converts biological sequence data from ASN.1 to FASTA.

OPTIONS
       A summary of options is included below.

       -              Print usage message
       -A acc         Accession to fetch
       -D             Use Dash for Gap
       -E             Extended Seq-ids
       -H             HTML spans
       -L filename    Log file
       -T             Use Threads
       -a type        Input ASN.1 type:
                        a  Automatic (default)
                        z  Any
                        e  Seq-entry
                        b  Bioseq
                        s  Bioseq-set
                        m  Seq-submit
                        t  batch processing (suitable for official releases; autodetects specific type)
       -b             Bioseq-set is Binary
       -c             Bioseq-set is Compressed
       -d path        Path to ReadDB Database
       -e N           Line length (70 by default; may range from 10 to 120)
       -f path        Path to indexed FASTA data
       -g             Expand delta gaps into Ns
       -h filename    Far component cache output file name
       -i filename    Single input file (standard input by default)
       -k             Local fetching
       -l             Lock components in advance
       -m             Master style for near segmented sequences
       -o filename    Nucleotide Output file name
       -p path        Path to ASN.1 Files
       -q filename    Quality score output file name
       -r             Remote fetching from NCBI
       -s             Far genomic contig for quality scores
       -u             Recurse
       -v filename    Protein output file name
       -x str         File selection substring (.ent by default) [String]
       -z             Print quality score gap as -1

AUTHOR
       The National Center for Biotechnology Information.

SEE ALSO
       asn2all(1), asn2asn(1), asn2ff(1), asn2gb(1), asn2xml(1), asndhuff(1).

NCBI                              2011-09-02                              ASN2FSA(1)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.