I'm not sure how you identify your header. Anyway, to break that line down into 60 characters per line, assume that a line with more than 20 characters must be a data line and that a line with 1-20 characters is a header that should get a > in front of it.
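The code from the original reply is not preserved here, so what follows is only a minimal Python sketch of that rule, not the poster's solution. The function name reformat_fasta and its parameters are made up for illustration, and two details are assumptions: blank lines are dropped, and a short line that already starts with > is left unchanged.

```python
def reformat_fasta(lines, width=60, header_max=20):
    """Yield reformatted lines: short lines become headers, long lines are wrapped.

    Assumptions (not from the original post): blank lines are skipped, and a
    short line that already begins with '>' is not given a second '>'.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # assumption: drop empty lines
        if len(line) <= header_max:
            # 1-20 characters: treat as a header line
            yield line if line.startswith(">") else ">" + line
        else:
            # more than 20 characters: treat as sequence data, wrap at `width`
            for start in range(0, len(line), width):
                yield line[start:start + width]
```

To run it over a file, something like `print("\n".join(reformat_fasta(open("in.fasta"))))` would do, where in.fasta is a placeholder file name. Note that the 20-character threshold misclassifies any sequence of 20 bases or fewer as a header, so it only suits input where every data line is long.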
ptx(1) General Commands Manual ptx(1)
Name
ptx - create permuted index
Syntax
ptx [option...] [input [output]]
Description
The ptx command generates a permuted index to file input on file output (standard input and output default). It has three phases: the first
does the permutation, generating one line for each keyword in an input line, with the keyword rotated to the front. The permuted file is
then sorted. Finally, the sorted lines are rotated so that the keyword comes at the middle of the page. The ptx command produces output in the
form:
.xx "tail" "before keyword" "keyword and after" "head"
where .xx may be an nroff or troff macro for user-defined formatting. The before keyword and keyword and after fields incorporate as much of the line
as fits around the keyword when it is printed at the middle of the page. The tail and head fields, at least one of which is an empty string "",
are wrapped-around pieces small enough to fit in the unused space at the opposite end of the line. When original text must be discarded,
`/' marks the spot.
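The three phases can be illustrated with a small Python sketch. This is not how ptx itself is implemented: the function names, the tuple layout, and the plain-text output are assumptions made for illustration, and the sketch truncates text that does not fit instead of producing the wrapped-around tail and head fields or the .xx macro lines. The ignore, fold, gap, and width parameters loosely mirror the -i, -f, -g, and -w options.

```python
def permuted_index(lines, ignore=frozenset(), fold=False):
    """Phases 1 and 2: rotate each keyword to the front, then sort.

    Returns a list of (before_keyword, keyword_and_after) pairs.
    `ignore` is expected to hold lowercase words (an assumption of this sketch).
    """
    rotations = []
    for line in lines:
        words = line.split()  # breaks on tab, newline, and space
        for i, word in enumerate(words):
            if word.lower() in ignore:
                continue  # not used as a keyword
            rotations.append((" ".join(words[i:]), " ".join(words[:i])))
    # Sort by the rotated text; fold case only if requested
    rotations.sort(key=(lambda r: r[0].lower()) if fold else (lambda r: r[0]))
    return [(before, kw_after) for kw_after, before in rotations]


def format_index(entries, width=72, gap=3):
    """Phase 3: place each keyword at the middle of the output line."""
    half = (width - gap) // 2
    out = []
    for before, kw_after in entries:
        left = before[-half:].rjust(half) if before else " " * half
        right = kw_after[:width - half - gap]  # truncate, no wrap-around
        out.append((left + " " * gap + right).rstrip())
    return out
```

For the single input line "the quick fox" with "the" ignored, the sorted rotations are "fox" and "quick fox", each printed with its preceding words to the left of the gap.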
Options
The following options can be applied:
-b break   Uses the characters in the break file as separators. Tab, newline, and space characters are always used
           as break characters.
-f         Folds uppercase and lowercase letters for sorting.
-g n       Uses n as the interfield gap. The default gap is 3 characters.
-i ignore  Does not use as keywords any words given in the ignore file. If the -i and -o options are missing, uses /usr/lib/eign as
           the ignore file.
-o only    Uses only the words listed in the only file as keywords.
-r         Uses leading nonblanks as reference identifiers. Attaches that identifier as a fifth field on each output line.
-t         Prepares the output for the phototypesetter. The default line length is 100 characters.
-w n       Uses n as the width of the output line. The default line length is 72 characters.
Restrictions
Line length counts do not account for overstriking or proportional spacing.
Files
/usr/bin/sort
/usr/lib/eign