01-28-2009
I wrote it for GNU sed; maybe you already have it on your box or can install it. Maybe someone else knows what has to be tweaked so it runs with other versions of sed, sorry.
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
I would like to extract the sequences larger than 10 bases but shorter than 18 along with the identifier from a FASTA file that looks like this:
> Seq I
ACGACTAGACGATAGACGATAGA
> Seq 2
ACGATGACGTAGCAGT
> Seq 3
ACGATACGAT
I know I can extract the IDs alone with the following code
grep... (3 Replies)
Discussion started by: Xterra
3 Replies
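For the length filter described in this thread, a minimal awk sketch might look like the following. The function name filter_fasta_by_length and its argument order are made up for illustration, and it assumes plain FASTA input with no trailing whitespace or carriage returns on the sequence lines.

```shell
# filter_fasta_by_length MIN MAX [FILE...]   (hypothetical helper name)
# Prints each record whose sequence is strictly longer than MIN and
# strictly shorter than MAX. Sequences split over several lines are
# joined before they are measured.
filter_fasta_by_length() {
    min=$1
    max=$2
    shift 2
    awk -v min="$min" -v max="$max" '
        # Print the buffered record if its length is inside the window.
        function emit() {
            if (hdr != "" && length(seq) > min && length(seq) < max)
                print hdr "\n" seq
        }
        /^>/ { emit(); hdr = $0; seq = ""; next }   # new header: flush previous record
             { seq = seq $0 }                        # sequence line: append to buffer
        END  { emit() }                              # flush the last record
    ' "$@"
}
```

Called as filter_fasta_by_length 10 18 input.fasta (input.fasta being a placeholder name), it should keep only the 16-base record from the sample above.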
2. Shell Programming and Scripting
I really need some help with this task. I have a bunch of FASTA files with hundreds of DNA sequences that look like this:
>SeqID1
AACCATGACAGAGGAGATGTGAACAGATAGAGGGATGACAGATGACAGATAGACCCAGAC
TGACAGGTTCAAAGGCTGCAGTGCAGTGACGTGACGATTT
>Sequence 22... (13 Replies)
Discussion started by: Xterra
13 Replies
3. UNIX for Dummies Questions & Answers
I have a fasta file that looks like this:
>Noname
ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT
CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG
>Noname
ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG
ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA
...
I want to... (2 Replies)
Discussion started by: Oyster
2 Replies
4. UNIX for Dummies Questions & Answers
Hi
I have an alignment file (.fasta) with ~80 sequences. They look like this:
>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0
GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT
TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
2 Replies
5. Shell Programming and Scripting
Hi,
I want to match the sequence ID (a substring of the line starting with '>') and extract everything up to the next '>' line. Please help.
input
> fefrwefrwef X900
AGAGGGAATTGG
AGGGGCCTGGAG
GGTTCTCTTC
> fefrwefrwef X932
AGAGGGAATTGG
AGGAGGTGGAG
GGTTCTCTTC
> fefrwefrwef X937... (2 Replies)
Discussion started by: ritakadm
2 Replies
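One possible way to pull out a single record by its ID is sketched below. The helper name extract_record is an assumption, and the ID is matched as a literal substring of the header line, so a short ID such as X93 would match several of the headers shown above.

```shell
# extract_record ID [FILE...]   (hypothetical helper name)
# Prints the header line containing ID and every line after it,
# up to (not including) the next header line.
extract_record() {
    id=$1
    shift
    awk -v id="$id" '
        /^>/ { keep = (index($0, id) > 0) }   # decide once per header line
        keep                                   # print lines of a kept record
    ' "$@"
}
```

For example, extract_record X932 input.fasta (file name is a placeholder) would print the X932 header and its sequence lines only.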
6. UNIX for Dummies Questions & Answers
Hi,
I need some help with modifying fasta headers.
I have a fasta file with thousands of contigs and I need to modify their headers with the information obtained from a second file.
File 1 contains the fasta sequences:
>contig0001 length=11115 numreads=10777
agatgtagatctct... (6 Replies)
Discussion started by: Lokaps
6 Replies
7. UNIX for Dummies Questions & Answers
I have the following script:
awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }'
and the following file:
>P39PT-1224 Freq 900
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
2 Replies
8. Shell Programming and Scripting
Input File:
>Seq1
ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD
>Seq2
SDASDAQEQWEQeqAdfaasd
>Seq3
ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG
......
Desired Output File
>Seq1
ASDADAFASF
ASFADGSDGF
SDFSDFSDFS
DFSDFSDFSD
FSDFSDFSDF
SD
>Seq2 (4 Replies)
Discussion started by: patrick87
4 Replies
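The desired output in this thread breaks each sequence into lines of 10 characters. A rough awk sketch of that is shown here; the name wrap_fasta is invented for the example, and it assumes each sequence sits on a single input line, as in the sample (lines are wrapped one by one, not rejoined first).

```shell
# wrap_fasta WIDTH [FILE...]   (hypothetical helper name)
# Header lines pass through unchanged; every other line is cut into
# chunks of at most WIDTH characters, one chunk per output line.
wrap_fasta() {
    width=$1
    shift
    awk -v w="$width" '
        /^>/ { print; next }
        {
            for (i = 1; i <= length($0); i += w)
                print substr($0, i, w)
        }
    ' "$@"
}
```

Usage would be something like wrap_fasta 10 input.fasta > wrapped.fasta, with both file names being placeholders.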
9. UNIX for Beginners Questions & Answers
I can calculate the length of every sequence in a FASTA file with the following command:
awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta
But I need to calculate the length of a particular FASTA sequence specified/listed in another txt file. The results to be... (14 Replies)
Discussion started by: dineshkumarsrk
14 Replies
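Since the question is cut off, the following is only a guess at what was wanted: report lengths for just the IDs named in a second file. The sketch assumes the list holds one ID per line, that an ID is the header text between '>' and the first whitespace, and that the list file is not empty. The helper name fasta_lengths_for and the tab-separated output are assumptions as well.

```shell
# fasta_lengths_for IDS_FILE FASTA_FILE   (hypothetical helper name)
# Prints "ID<TAB>length" for each FASTA record whose ID is in IDS_FILE.
fasta_lengths_for() {
    awk '
        FNR == NR { if (NF) want[$1]; next }   # first file: remember wanted IDs
        /^>/ {
            if (keep) print id "\t" len        # report the previous wanted record
            id = substr($1, 2)                 # header word without the ">"
            len = 0
            keep = (id in want)
            next
        }
        keep { len += length($0) }             # add up sequence lines
        END  { if (keep) print id "\t" len }   # report the last record
    ' "$1" "$2"
}
```

It would be run as fasta_lengths_for ids.txt unique.fasta, where ids.txt is a placeholder for the list file.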
10. UNIX for Beginners Questions & Answers
I have two fasta files as shown below,
File:1
>Contig_1:90600-91187
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC
>Contig_98:35323-35886
GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG
>Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
11 Replies
LEARN ABOUT DEBIAN
pynast
VERSION:(1) User Commands VERSION:(1)
NAME
PyNAST - alignment of short DNA sequences
SYNOPSIS
pynast [options] {-i input_fp -t template_fp}
DESCRIPTION
[] indicates optional input (order unimportant)
{} indicates required input (order unimportant)
Example usage:
pynast -i my_input.fasta -t my_template.fasta
OPTIONS
--version
show program's version number and exit
-h, --help
show this help message and exit
-t TEMPLATE_FP, --template_fp=TEMPLATE_FP
path to template alignment file [REQUIRED]
-i INPUT_FP, --input_fp=INPUT_FP
path to input fasta file [REQUIRED]
-v, --verbose
Print status and other information during execution [default: False]
-p MIN_PCT_ID, --min_pct_id=MIN_PCT_ID
minimum percent sequence identity to consider a sequence a match [default: 75.0]
-l MIN_LEN, --min_len=MIN_LEN
minimum sequence length to include in NAST alignment [default: 1000]
-m PAIRWISE_ALIGNMENT_METHOD, --pairwise_alignment_method=PAIRWISE_ALIGNMENT_METHOD
method for performing pairwise alignment [default: uclust]
-a FASTA_OUT_FP, --fasta_out_fp=FASTA_OUT_FP
path to store resulting alignment file [default: derived from input filepath]
-g LOG_FP, --log_fp=LOG_FP
path to store log file [default: derived from input filepath]
-f FAILURE_FP, --failure_fp=FAILURE_FP
path to store file of seqs which fail to align [default: derived from input filepath]
-e MAX_E_VALUE, --max_e_value=MAX_E_VALUE
Deprecated. Will be removed in PyNAST 1.2
-d BLAST_DB, --blast_db=BLAST_DB
Deprecated. Will be removed in PyNAST 1.2
SEE ALSO
http://pynast.sourceforge.net
Version: pynast 1.1 August 2011 VERSION:(1)