01-28-2009
I've written it in GNU sed; maybe you have it on your box or can install it. Maybe someone else knows what has to be tweaked so it runs with other versions of sed, sorry.
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
I would like to extract the sequences longer than 10 bases but shorter than 18, along with their identifiers, from a FASTA file that looks like this:
> Seq I
ACGACTAGACGATAGACGATAGA
> Seq 2
ACGATGACGTAGCAGT
> Seq 3
ACGATACGAT
I know I can extract the IDs alone with the following code
grep... (3 Replies)
Discussion started by: Xterra
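A minimal sketch for this question, assuming POSIX sh and awk (`filter_by_length` is a hypothetical helper name, not something from the thread). Both bounds are treated as exclusive, as worded above, and wrapped sequence lines are joined before measuring, so a multi-line record is printed with its sequence on a single line.

```shell
#!/bin/sh
# filter_by_length MIN MAX [FILE...]   (hypothetical helper name)
# Prints each FASTA record whose sequence length L satisfies MIN < L < MAX.
# Reads standard input when no FILE is given.
filter_by_length() {
    min=$1 max=$2
    shift 2
    awk -v min="$min" -v max="$max" '
        # Print the buffered record (header + joined sequence) if in range.
        function flush() {
            if (hdr != "" && length(seq) > min && length(seq) < max)
                print hdr "\n" seq
        }
        /^>/ { flush(); hdr = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                             # join sequence lines
        END { flush() }                              # last record
    ' "$@"
}
```

With the sample above, `filter_by_length 10 18 seqs.fasta` (hypothetical file name) keeps only `> Seq 2` (16 bases); `Seq 3` has exactly 10 bases and is therefore excluded.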
2. Shell Programming and Scripting
I really need some help with this task. I have a bunch of FASTA files with hundreds of DNA sequences that look like this:
>SeqID1
AACCATGACAGAGGAGATGTGAACAGATAGAGGGATGACAGATGACAGATAGACCCAGAC
TGACAGGTTCAAAGGCTGCAGTGCAGTGACGTGACGATTT
>Sequence 22... (13 Replies)
Discussion started by: Xterra
3. UNIX for Dummies Questions & Answers
I have a fasta file that looks like this:
>Noname
ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT
CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG
>Noname
ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG
ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA
...
I want to... (2 Replies)
Discussion started by: Oyster
4. UNIX for Dummies Questions & Answers
Hi
I have an alignment file (.fasta) with ~80 sequences. They look like this:
>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0
GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT
TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
5. Shell Programming and Scripting
Hi,
I want to match the sequence ID (a substring of the line starting with '>') and extract the information up to the next '>' line. Please help.
input
> fefrwefrwef X900
AGAGGGAATTGG
AGGGGCCTGGAG
GGTTCTCTTC
> fefrwefrwef X932
AGAGGGAATTGG
AGGAGGTGGAG
GGTTCTCTTC
> fefrwefrwef X937... (2 Replies)
Discussion started by: ritakadm
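One way to do this, sketched with awk under the assumption that the ID is a whitespace-separated word of the header (like `X900` above); `extract_record` is a hypothetical name. Each header line switches printing on or off, and every following line inherits that state until the next `>`.

```shell
#!/bin/sh
# extract_record ID [FILE...]   (hypothetical helper name)
# Prints every FASTA record whose header contains ID as a whole word,
# i.e. the header line plus all lines up to the next line starting with ">".
extract_record() {
    id=$1
    shift
    awk -v id="$id" '
        /^>/ {
            keep = 0
            # Split the header (without ">") into words and compare each one.
            n = split(substr($0, 2), w)
            for (i = 1; i <= n; i++)
                if (w[i] == id) { keep = 1; break }
        }
        keep    # print the line while inside a matching record
    ' "$@"
}
```

`extract_record X932 input.fasta` prints the `X932` header and its three sequence lines. Comparing whole words rather than substrings keeps an ID such as `X93` from also selecting `X932`.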
6. UNIX for Dummies Questions & Answers
Hi,
I need some help with modifying fasta headers.
I have a fasta file with thousands of contigs and I need to modify their headers with the information obtained from a second file.
File 1 contains the fasta sequences:
>contig0001 length=11115 numreads=10777
agatgtagatctct... (6 Replies)
Discussion started by: Lokaps
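The second file's layout isn't shown, so this sketch assumes (hypothetically) a tab-separated file with the contig ID in column 1 and the text to add in column 2, and it appends that text to the matching header. It relies on the usual awk two-file idiom: while `FNR == NR`, awk is still reading the first file. `rename_headers` is a hypothetical name.

```shell
#!/bin/sh
# rename_headers MAPFILE FASTA   (hypothetical helper name)
# ASSUMED MAPFILE layout: <contig ID> TAB <text to append>, e.g.
#   contig0001<TAB>gene=abcA
# MAPFILE must not be empty: with an empty first file FNR == NR stays
# true for FASTA as well, and nothing would be printed.
rename_headers() {
    awk -F'\t' '
        FNR == NR { info[$1] = $2; next }      # pass over MAPFILE
        /^>/ {
            # Contig ID = first word of the header without ">".
            split(substr($0, 2), w, " ")
            if (w[1] in info) $0 = $0 " " info[w[1]]
        }
        { print }
    ' "$1" "$2"
}
```

Headers without an entry in the mapping file, and all sequence lines, pass through unchanged.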
7. UNIX for Dummies Questions & Answers
I have the following script:
awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }'
and the following file:
>P39PT-1224 Freq 900
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
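The script above reads its input twice: the `FNR==NR` block sums column 3 on the first pass and the second block prints percentages on the second pass, so it only works when the file name is given twice (e.g. `awk '...' file file`). As written it also rewrites the sequence lines, where `$3` is empty. A sketch that limits both passes to header lines, assuming the `>ID Freq N` layout shown (`freq_to_percent` is a hypothetical name):

```shell
#!/bin/sh
# freq_to_percent FILE   (hypothetical helper name)
# Headers look like ">ID Freq N". Pass 1 sums N over all headers; pass 2
# prints each header with N replaced by 100*N/total and copies sequence
# lines unchanged. FILE is therefore given to awk twice.
freq_to_percent() {
    awk '
        FNR == NR { if (/^>/) s += $3; next }              # pass 1: total
        /^>/ && s > 0 { print $1, $2, 100 * $3 / s; next } # pass 2: header
        { print }                                          # sequence lines
    ' "$1" "$1"
}
```

Headers keep their `Freq` label but the count becomes its share of the total; for example, counts of 900 and 100 become 90 and 10.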
8. Shell Programming and Scripting
Input File:
>Seq1
ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD
>Seq2
SDASDAQEQWEQeqAdfaasd
>Seq3
ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG
......
Desired Output File
>Seq1
ASDADAFASF
ASFADGSDGF
SDFSDFSDFS
DFSDFSDFSD
FSDFSDFSDF
SD
>Seq2 (4 Replies)
Discussion started by: patrick87
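A small awk sketch for re-wrapping, assuming sequences should be cut into fixed-width lines (10 in the question) with headers left untouched; `wrap_fasta` is a hypothetical name. The lines of each record are joined first, so input that is already wrapped at a different width is handled too.

```shell
#!/bin/sh
# wrap_fasta WIDTH [FILE...]   (hypothetical helper name)
# Re-wraps each FASTA sequence to lines of at most WIDTH characters;
# header lines are copied unchanged.
wrap_fasta() {
    width=$1
    shift
    [ "$width" -gt 0 ] || return 1   # a zero width would loop forever
    awk -v w="$width" '
        # Print the buffered sequence in chunks of w characters.
        function flush(   i) {
            for (i = 1; i <= length(seq); i += w)
                print substr(seq, i, w)
            seq = ""
        }
        /^>/ { flush(); print; next }
        { seq = seq $0 }            # join lines of the current record
        END { flush() }
    ' "$@"
}
```

`wrap_fasta 10 input.fasta` should produce the 10-column layout shown in the desired output above.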
9. UNIX for Beginners Questions & Answers
I can calculate the length of every sequence in a FASTA file with the following command:
awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta
But I need to calculate the length of particular FASTA sequences listed in another text file. The results have to be... (14 Replies)
Discussion started by: dineshkumarsrk
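The command above keeps one running total per record. Restricting it to a list of wanted IDs only needs the two-file idiom, sketched here under the assumptions that the list has one ID per line and that an ID is the first word of the header without `>`. `seq_lengths` and the tab-separated `ID length` output are hypothetical choices, since the required output format is cut off above.

```shell
#!/bin/sh
# seq_lengths IDFILE FASTA   (hypothetical helper name)
# ASSUMED: IDFILE holds one ID per line; an ID is the first word of a
# FASTA header without ">". Prints "ID<TAB>length" for each listed
# sequence, in the order the sequences appear in FASTA.
seq_lengths() {
    awk '
        # Report the record just finished if its ID was requested.
        function flush() { if (id != "" && (id in want)) print id "\t" len }
        FNR == NR { want[$1] = 1; next }             # pass over IDFILE
        /^>/ { flush(); id = substr($1, 2); len = 0; next }
        { len += length($0) }                        # same sum as above
        END { flush() }
    ' "$1" "$2"
}
```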
10. UNIX for Beginners Questions & Answers
I have two FASTA files, shown below:
File:1
>Contig_1:90600-91187
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC
>Contig_98:35323-35886
GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG
>Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
LEARN ABOUT DEBIAN
reprof
REPROF(1) User Commands REPROF(1)
NAME
reprof - predict protein secondary structure and solvent accessibility
SYNOPSIS
reprof -i [query.blastPsiMat] [OPTIONS]
reprof -i [query.fasta] [OPTIONS]
reprof -i [query.blastPsiMat|query.fasta] --mutations [mutations.txt] [OPTIONS]
DESCRIPTION
Predict protein secondary structure and solvent accessibility.
Output Format
The output format is self-explanatory, i.e. the columns of the output are described in the output file itself.
OPTIONS
-i, --input=FILE
Input BLAST PSSM matrix file (from Blast -Q option) or input (single) FASTA file.
-o, --out=FILE
Either an output file or a directory. If not provided or a directory, the suffix of the input filename (i.e. .fasta or .blastPsiMat) is replaced to create an output filename.
--mutations=[all|FILE]
Either the keyword "all" to predict all possible mutations, or a file containing one mutation per line, such as "C12M" for C mutated to M at position 12:
C30Y
R31W
G48D
This mutation code is also appended to the output filename using "_". An additional file ending in "_ORI" contains the prediction made without evolutionary information, even if a BLAST PSSM matrix was provided.
--modeldir=DIR
Directory where the model and feature files are stored. Default: /usr/share/reprof.
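The `-i` option takes a (single) FASTA file. If that means one sequence per file, a multi-record file has to be split first. The sketch below assumes exactly that; it uses only the `-i` and `-o` options documented above, passing a directory to `-o` so reprof derives each output name itself. `reprof_each` and the part-file naming scheme are hypothetical.

```shell
#!/bin/sh
# reprof_each MULTI_FASTA OUTDIR   (hypothetical helper name)
# ASSUMPTION: reprof -i expects one sequence per FASTA file, so the input
# is split into one part file per record, named after the first header
# word (characters outside [A-Za-z0-9._-] become "_"; IDs assumed unique).
reprof_each() {
    src=$1 dest=$2
    mkdir -p "$dest" || return 1
    parts=$(mktemp -d) || return 1
    awk -v dir="$parts" '
        /^>/ {
            if (f != "") close(f)          # finish the previous part file
            name = substr($1, 2)
            gsub(/[^A-Za-z0-9._-]/, "_", name)
            f = dir "/" name ".fasta"
        }
        f != "" { print > f }              # header and sequence lines
    ' "$src"
    for p in "$parts"/*.fasta; do
        [ -e "$p" ] || continue            # no records: glob did not match
        # Only the documented -i and -o options; with a directory as -o,
        # reprof derives the output file name from the input name.
        reprof -i "$p" -o "$dest" || echo "reprof failed on $p" >&2
    done
    rm -r "$parts"
}
```

Usage would be along the lines of `reprof_each proteins.fasta /tmp/reprof_out` (hypothetical paths); the temporary part files are removed once every record has been processed.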
AUTHOR
Peter Hoenigschmid hoenigschmid@rostlab.org, Burkhard Rost
EXAMPLES
Prediction from BLAST PSSM matrix for best results:
reprof -i /usr/share/doc/reprof/examples/example.Q -o /tmp/example.Q.reprof
Prediction from FASTA file:
reprof -i /usr/share/doc/reprof/examples/example.fasta -o /tmp/example.fasta.reprof
Prediction from BLAST PSSM matrix file using the mutation mode:
reprof -i /usr/share/doc/reprof/examples/example.Q -o /tmp/mutations_example.Q.reprof --mutations /usr/share/doc/reprof/examples/mutations.txt
# Result files for the above call are going to be:
# /tmp/mutations_example.Q.{reprof,reprof_F172P,reprof_M1Q,reprof_N34Y,reprof_ORI} - see --mutations for a description of the extensions.
COPYRIGHT
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
BUGS
https://rostlab.org/bugzilla3/enter_bug.cgi?product=reprof
SEE ALSO
blast2(1)
http://rostlab.org/
1.0.1 2012-01-13 REPROF(1)