Full Discussion: fasta format?
Forum: UNIX for Dummies Questions & Answers. Post 302281093 by zaxxon on Wednesday 28th of January 2009 07:27:31 AM
I'm not sure how you identify your header. Anyway, the following breaks each data line into chunks of 60 characters per line. It assumes that a line with more than 20 characters is a data line, and that a line with 1 to 20 characters is a header that should get a > in front of it. Note that \n in the replacement text needs GNU sed:

Code:
sed -e '/^.\{21,\}$/ { s/.\{60\}/&\n/g}; /^.\{1,20\}$/ { s/^.*$/>&/}' infile
>DRB1_010101
LKLPGGSCMTALTVTLMVLSSPLALAGDTRPRFLWQLKFECHFFNGTERVRLLERCIYNQ
EESVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRR
VEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWT
FQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSKMLSGVGGFVLGLLFLG
AGLFIYFRNQKGHSGLQPTGFLS
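
If GNU sed is not available, the same idea can be sketched in awk. This is only a rough equivalent of the sed one-liner above, not a tested replacement for it: the function name wrap_fasta and the variables width and maxhdr are made up for this example, and it keeps the same guess that a line of 1 to 20 characters is a header. One small difference: when a data line is an exact multiple of 60 characters long, the sed version leaves an empty line behind it, while this one does not.

```shell
# Hypothetical helper (name and variables are assumptions, not from the thread).
# Reads the named files, or stdin when no file is given.
wrap_fasta() {
    awk -v width=60 -v maxhdr=20 '
        # Empty lines are passed through untouched, as the sed command does.
        length($0) == 0 { print; next }

        # Assumption: a line of 1..maxhdr characters is a header, so prefix ">".
        length($0) <= maxhdr { print ">" $0; next }

        # Anything longer is treated as sequence data and printed
        # in chunks of at most `width` characters.
        {
            for (i = 1; i <= length($0); i += width)
                print substr($0, i, width)
        }
    ' "$@"
}
```

Use it like the sed command, for example wrap_fasta infile > outfile. If your headers can be longer than 20 characters, the length test will misclassify them, so a real marker for header lines would be safer than the length guess.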

 
