Full Discussion: grep FASTA files
Post 302430266 by Xterra on Thursday 17th of June 2010 03:50:17 AM (UNIX for Dummies Questions & Answers)
pseudocoder

Is there a way to do the same thing with bash? That would be easier for me to understand.
I was also wondering whether there is any way to calculate the frequency of each sequence. In other words, assume that after 'trimming' the sequences, several of them are identical: would it be possible to count how often each one occurs and include that count in the ID line? Something like this:

Quote:
> Seq A Freq 50
AGAGATAGATAGAGCTGAT
> Seq B Freq 25
AGAGATAGATAGAGCTGAT
> Seq C Freq 25
AGAGATAGATAGAGCTGAT


Thanks

Last edited by Xterra; 06-17-2010 at 05:00 AM.
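One possible bash-only approach to the frequency question is sketched here. It assumes bash 4 or newer (for associative arrays) and that the trimming has already been done. Identical sequences are collapsed into a single record whose header becomes ">Seq N Freq COUNT", with N numbering the unique sequences in order of first appearance. The function names fasta_freq and tally_seq and that header layout are illustrative choices, not something from the thread, and Freq here is a raw count rather than a percentage.

```bash
#!/usr/bin/env bash
# Sketch: collapse identical sequences in a FASTA stream (stdin) and append a
# count to each header, e.g. ">Seq 1 Freq 2". Requires bash >= 4.
# The function names and the ">Seq N Freq COUNT" layout are hypothetical.

# Record one finished sequence in the tally. It updates fasta_freq's local
# variables freq and order through bash's dynamic scoping.
tally_seq() {
    [[ -n $1 ]] || return 0                      # skip empty records
    [[ -n ${freq[$1]+set} ]] || order+=("$1")    # remember first appearance
    freq[$1]=$(( ${freq[$1]:-0} + 1 ))
}

fasta_freq() {
    local line seq="" s i=0
    local -A freq=()
    local -a order=()

    # "|| [[ -n $line ]]" keeps a last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                       # tolerate CRLF line endings
        if [[ $line == '>'* ]]; then
            tally_seq "$seq"                     # a header closes the previous record
            seq=""
        else
            seq+=$line                           # join multi-line sequences
        fi
    done
    tally_seq "$seq"                             # the last record has no header after it

    for s in "${order[@]}"; do
        i=$(( i + 1 ))
        printf '>Seq %d Freq %d\n%s\n' "$i" "${freq[$s]}" "$s"
    done
}
```

Usage: fasta_freq < trimmed.fasta > collapsed.fasta (both file names are placeholders). A bash read loop is slow on large files; the same counting logic written in awk will be considerably faster if that becomes a problem.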
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

fasta format?

Hi, I'm in need of creating a file in the fasta format: >1A6A.A HVIIQAEFYLNPDQSGEFMFDFDGDEIFHVDMAKKETVWRLEEFGRFASFEAQGALANIAVDKANLEIMTKRSNYTPITN VPPEVTVLTNSPVELREPNVLICFIDKFTPPVVNVTWLRNGKPVTTGVSETVFLPREDHLFRKFHYLPFLPSTEDVYDCR VEHWGLDEPLLKHWEF >1A6A.B ... (5 Replies)
Discussion started by: lost

2. Shell Programming and Scripting

grep for certain files using a file as input to grep and then move

Hi all, I need to grep a few files which have words like the ones below in the file name. I want to put these words in a file, grep for the files whose names contain them, and move those files to a new directory. Full file name: -C20091210.1000-20091210.1100_SMGBSC3:1000... (2 Replies)
Discussion started by: anita07
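A bash sketch for the thread above, assuming the words are stored one per line in a patterns file and are matched as fixed strings against the names (not the contents) of the files in the current directory. move_matching and its two arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: move every regular file in the current directory whose name
# contains any fixed string listed (one per line) in a patterns file.
# move_matching is a hypothetical helper name.
move_matching() {
    local pattern_file=$1 dest=$2 f
    mkdir -p -- "$dest"
    for f in *; do
        [[ -f $f ]] || continue                  # skip directories, including $dest
        # -q: no output, -F: fixed strings, -f: read the patterns from a file.
        # Note: a blank line in the patterns file would match every name.
        if printf '%s\n' "$f" | grep -qF -f "$pattern_file"; then
            mv -- "$f" "$dest/"
        fi
    done
}
```

Usage: move_matching /path/to/names.txt newdir. Keep the patterns file outside the directory being scanned, otherwise it may be moved too if its own name happens to match.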

3. Shell Programming and Scripting

Changing from FASTA to PHYLIP format

I really need some help with this task. I have a bunch of FASTA files with hundreds of DNA sequences that look like this: >SeqID1 AACCATGACAGAGGAGATGTGAACAGATAGAGGGATGACAGATGACAGATAGACCCAGAC TGACAGGTTCAAAGGCTGCAGTGCAGTGACGTGACGATTT >Sequence 22... (13 Replies)
Discussion started by: Xterra
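A sketch of the FASTA-to-PHYLIP conversion, assuming the sequences are already aligned (all the same length) and that relaxed sequential PHYLIP is acceptable: a first line with the number of sequences and the alignment length, then one line per sequence with the name, a space, and the sequence. fasta_to_phylip is a hypothetical name. Spaces inside names, as in ">Sequence 22", are replaced with underscores because PHYLIP uses whitespace to separate the name from the data.

```bash
#!/usr/bin/env bash
# Sketch: convert an aligned FASTA file (or stdin) to relaxed sequential PHYLIP.
# Assumes every sequence has the same length; exits with status 1 otherwise.
# fasta_to_phylip is a hypothetical helper name.
fasta_to_phylip() {
    awk '
        /^>/ {
            name = substr($0, 2)
            gsub(/^[ \t]+|[ \t\r]+$/, "", name)   # trim surrounding whitespace
            gsub(/[ \t]+/, "_", name)             # no spaces inside PHYLIP names
            n++; names[n] = name; seqs[n] = ""
            next
        }
        { gsub(/[ \t\r]/, ""); seqs[n] = seqs[n] $0 }   # join wrapped sequence lines
        END {
            if (n == 0) exit 1
            len = length(seqs[1])
            for (i = 2; i <= n; i++)
                if (length(seqs[i]) != len) {
                    printf "sequence %s has length %d, expected %d\n", names[i], length(seqs[i]), len > "/dev/stderr"
                    exit 1
                }
            print n, len
            for (i = 1; i <= n; i++) print names[i], seqs[i]
        }
    ' "$@"
}
```

Strict PHYLIP instead expects each name padded or truncated to exactly 10 characters; using printf "%-10.10s%s\n", names[i], seqs[i] in place of the final print would produce that, at the risk of truncated names colliding.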

4. UNIX for Dummies Questions & Answers

renaming (renumbering) fasta files

I have a fasta file that looks like this: >Noname ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG >Noname ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA ... I want to... (2 Replies)
Discussion started by: Oyster
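The goal in this thread is cut off, but going by its title, a sketch that renumbers the headers might look like this. It replaces every header line (such as ">Noname") with a prefix plus a running number and passes the sequence lines through unchanged; renumber_fasta and the default prefix "seq" are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: renumber FASTA headers as ">PREFIX1", ">PREFIX2", ...
# Sequence lines are printed unchanged. Reads the given files, or stdin.
# renumber_fasta and the default prefix "seq" are hypothetical.
renumber_fasta() {
    local prefix=${1:-seq}
    (( $# )) && shift                            # remaining arguments are input files
    awk -v p="$prefix" '
        /^>/ { n++; print ">" p n; next }        # replace the whole header line
        { print }
    ' "$@"
}
```

Usage: renumber_fasta seq input.fasta > renumbered.fasta.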

5. UNIX for Dummies Questions & Answers

Breaking a fasta formatted file into multiple files containing each gene separately

Hey, I've been trying to break a massive fasta formatted file into files containing each gene separately. Could anyone help me? I've tried to use the following code but I've received errors every time: for i in *.rtf.out do awk '/^>/{f=++d".fasta"} {print > $i.out}' $i done (1 Reply)
Discussion started by: Ann Mc Cartney
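The loop quoted above cannot work as written: inside the single-quoted awk program, $i is awk's field reference rather than the shell loop variable, the file name stored in f is never used, and (at least as it appears here) the loop lacks the ; or newline before do and done. A corrected sketch, assuming one output file per record named after the input file plus a running number (a hypothetical naming scheme); split_fasta is also a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: split each FASTA file given as an argument into one file per record.
# Output names such as "genes.rtf.out.1.fasta" are a hypothetical choice.
# The input name is passed to awk with -v instead of writing $i inside the
# single-quoted program, where awk would read $i as a field reference.
split_fasta() {
    local f
    for f in "$@"; do
        awk -v base="$f" '
            /^>/ { if (out) close(out); out = base "." (++n) ".fasta" }
            out  { print > out }     # lines before the first header are skipped
        ' "$f"
    done
}
```

Usage: split_fasta *.rtf.out. close() is called before each new output file is opened so that awk does not run out of file descriptors on a file with thousands of genes.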

6. UNIX for Dummies Questions & Answers

How to change sequence names in a long fasta file?

Hi I have an alignment file (.fasta) with ~80 sequences. They look like this- >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika

7. UNIX for Dummies Questions & Answers

Fasta header modification

Hi, I need some help with modifying fasta headers. I have a fasta file with thousands of contigs and I need to modify their headers with the information obtained from a second file. File 1 contains the fasta sequences: >contig0001 length=11115 numreads=10777 agatgtagatctct... (6 Replies)
Discussion started by: Lokaps
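The second file is not shown in the preview, so this sketch assumes a hypothetical tab-separated table with the contig ID in the first column and the complete new header text in the second. It uses the common awk two-file idiom, in which FNR == NR holds only while the first file is being read; relabel_fasta is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: rewrite FASTA headers using a lookup table.
# Assumed table format (hypothetical): "contig_id<TAB>new header text".
# Headers whose first word is not in the table are left unchanged.
# Caveat: if the table file is empty, FNR == NR stays true for the FASTA too.
relabel_fasta() {
    local table=$1 fasta=$2
    awk -F'\t' '
        FNR == NR { label[$1] = $2; next }       # first file: load the table
        /^>/ {
            id = substr($0, 2)
            sub(/[ \t].*/, "", id)               # keep the first word after ">"
            if (id in label) { print ">" label[id]; next }
        }
        { print }
    ' "$table" "$fasta"
}
```

Usage: relabel_fasta table.tsv contigs.fasta > relabelled.fasta (file names are placeholders).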

8. UNIX for Dummies Questions & Answers

Round up -FASTA file

I have the following script: awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }' and the following file: >P39PT-1224 Freq 900 cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg >P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
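The script quoted above only produces output when the same file is named twice on the command line (the first pass sums column 3, the second prints percentages), and it also rewrites the sequence lines, which have no third field. A sketch that restricts both passes to header lines, assuming headers of the form ">ID Freq COUNT", and rounds half up to a whole percentage; freq_to_percent is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: replace each "Freq COUNT" in the headers with a rounded percentage.
# The file is read twice: FNR == NR is true only during the first pass.
# Only header lines (starting with ">") are summed or changed; sequence
# lines are printed unchanged. Assumes headers look like ">ID Freq COUNT".
freq_to_percent() {
    local fasta=$1
    awk '
        FNR == NR { if (/^>/) total += $3; next }   # pass 1: sum the counts
        /^>/ && total > 0 {
            $3 = int(100 * $3 / total + 0.5)        # pass 2: round half up
        }
        { print }
    ' "$fasta" "$fasta"
}
```

If "round up" is meant literally (always upward), the rounding line can be replaced with v = 100 * $3 / total; $3 = (v == int(v)) ? v : int(v) + 1. Because the file is read twice, this needs a real file rather than a pipe.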

9. Shell Programming and Scripting

Help with reformatting single-line multi-fasta into multi-line multi-fasta

Input File: >Seq1 ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD >Seq2 SDASDAQEQWEQeqAdfaasd >Seq3 ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG ...... Desired Output File >Seq1 ASDADAFASF ASFADGSDGF SDFSDFSDFS DFSDFSDFSD FSDFSDFSDF SD >Seq2 (4 Replies)
Discussion started by: patrick87
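A sketch for the line-wrapping thread, assuming the desired output is each sequence broken into lines of 10 characters (as the preview suggests) with the headers left unchanged. wrap_fasta is a hypothetical name; unlike piping the whole file through fold -w 10, it never wraps a header line.

```bash
#!/usr/bin/env bash
# Sketch: wrap every sequence line to a fixed width (10 by default, as in the
# thread's desired output) while leaving ">" header lines untouched.
# wrap_fasta is a hypothetical helper name. Reads the given files, or stdin.
wrap_fasta() {
    local width=${1:-10}
    (( $# )) && shift                            # remaining arguments are input files
    awk -v w="$width" '
        /^>/ { print; next }
        {
            for (i = 1; i <= length($0); i += w)
                print substr($0, i, w)           # emit one chunk of w characters
        }
    ' "$@"
}
```

Usage: wrap_fasta 10 single_line.fasta > multi_line.fasta.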

10. UNIX for Beginners Questions & Answers

How to append two fasta files?

I have two fasta files as shown below, File:1 >Contig_1:90600-91187 AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC >Contig_98:35323-35886 GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG >Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
FASTX_CLIPPER(1)                 User Commands                 FASTX_CLIPPER(1)

NAME
fastx_clipper - FASTA/Q Clipper

DESCRIPTION
usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]

Part of FASTX Toolkit 0.0.13.2 by A. Gordon (gordon@cshl.edu)

[-h]         = This helpful help screen.
[-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
[-l N]       = discard sequences shorter than N nucleotides. default is 5.
[-d N]       = Keep the adapter and N bases after it.
               (using '-d 0' is the same as not using '-d' at all. which is the default).
[-c]         = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
[-C]         = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
[-k]         = Report Adapter-Only sequences.
[-n]         = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
[-v]         = Verbose - report number of sequences.
               If [-o] is specified, report will be printed to STDOUT.
               If [-o] is not specified (and output goes to STDOUT), report will be printed to STDERR.
[-z]         = Compress output with GZIP.
[-D]         = DEBUG output.
[-M N]       = require minimum adapter alignment length of N.
               If less than N nucleotides aligned with the adapter - don't clip it.
[-i INFILE]  = FASTA/Q input file. default is STDIN.
[-o OUTFILE] = FASTA/Q output file. default is STDOUT.

SEE ALSO
The quality of this automatically generated manpage might be insufficient. It is suggested to visit http://hannonlab.cshl.edu/fastx_toolkit/commandline.html to get a better layout as well as an overview about connected FASTX tools.

fastx_clipper 0.0.13.2                May 2012                FASTX_CLIPPER(1)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.