editing headers


 
# 1  
Old 07-21-2011
editing headers

Hi,
I have a folder that contains many files:

1.fasta
2.fasta
3.fasta
4.fasta
5.fasta
.
.
(hundreds of files)

Each file has data in the following format,
for example:
vi 1.fasta

Code:
  58    390
A
GTATACATTATTGATGAAGTCCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATCATTTCGCGTTGCCAACGTTTCGAATTTCGAAAAATATCAGTAAAT
GATATTGTTGAGAGATTGTCGACGGTTGTGACTAATGAAGGTACGCAAGTAGAAGATGAG
GCGTTACAAATTGTTGCGCGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCGATATCTTATAGTGATGAGAGGGTTACGACAGAAGATGTATTAGCTGTAACG
GGTCGTGATATGTTCCGTATGTTAAGTGAA
B
GTATACATTATTGATGAAGTCCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATCATTTCGCGTTGCCAACGTTTCGAATTTCGAAAAATATCAGTAAAT
GATATTGTTGAGAGATTGTCCACGGTTGTGACTAATGAAGGTACGCAAGTAGAAGATGAG
GCTTTACAAATTGTTGCGCGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAAGCGATATCTTATAGTGATGAGAGGGTTACGACAGAAGATGTATTAGCTGTAACG
GGTCGTGATATGTTCCGTATGTTAAGTGAA
C
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCATTCAATGCGCTTTTAAAA
ACCTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCTCATAAG
ATCCCACCTACAATCATTTCACGTTGTCAGCGCTTTGAATTCCGAAAAATATCAGTGAAT
GATATTGTTGAGAGATTATCAACGGTCGTGACAAATGAAGGTACGCAAGTGGAAGGTGAA
GCATTACAAATTGTTGCGCGTGCTGCCGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCTATATCTTATAGTGATGAGATTGTTACGACAGAAGATGTATTGGCCGTAACA
GGACGTGATATGTTCCGTAAGTTGAGTGAA
D
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAACCGCCAGGACATGTCATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATTATTTCGCGTTGCCAACGTTTCGAATTTCGAAAGATATCAGTAAAT
GATATTGTTGAGAGATTATCGACAGTTGTAAACAATGAAGGTACGCAAGTAGAAGATGAA
GCGTTACAAATCGTTGCACGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCAATATCTTATAGTGATGAGACTGTTACGACAGAAGATGTATTAGCTGTAACA
GGGCGTGATATGTTCCGAATGTTAAGTGAA

I need to edit these files by adding two things to each file:
1. Append 1 (the number one) to the first line of each file (58 390 1).
2. Add > at the beginning of each header (>A, >B, >C, >D).
Basically, edit the file above to get the following output:

Code:
  58    390 1
>A
GTATACATTATTGATGAAGTCCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATCATTTCGCGTTGCCAACGTTTCGAATTTCGAAAAATATCAGTAAAT
GATATTGTTGAGAGATTGTCGACGGTTGTGACTAATGAAGGTACGCAAGTAGAAGATGAG
GCGTTACAAATTGTTGCGCGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCGATATCTTATAGTGATGAGAGGGTTACGACAGAAGATGTATTAGCTGTAACG
GGTCGTGATATGTTCCGTATGTTAAGTGAA
>B
GTATACATTATTGATGAAGTCCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATCATTTCGCGTTGCCAACGTTTCGAATTTCGAAAAATATCAGTAAAT
GATATTGTTGAGAGATTGTCCACGGTTGTGACTAATGAAGGTACGCAAGTAGAAGATGAG
GCTTTACAAATTGTTGCGCGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAAGCGATATCTTATAGTGATGAGAGGGTTACGACAGAAGATGTATTAGCTGTAACG
GGTCGTGATATGTTCCGTATGTTAAGTGAA
>C
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCATTCAATGCGCTTTTAAAA
ACCTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCTCATAAG
ATCCCACCTACAATCATTTCACGTTGTCAGCGCTTTGAATTCCGAAAAATATCAGTGAAT
GATATTGTTGAGAGATTATCAACGGTCGTGACAAATGAAGGTACGCAAGTGGAAGGTGAA
GCATTACAAATTGTTGCGCGTGCTGCCGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCTATATCTTATAGTGATGAGATTGTTACGACAGAAGATGTATTGGCCGTAACA
GGACGTGATATGTTCCGTAAGTTGAGTGAA
>D
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAACCGCCAGGACATGTCATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATTATTTCGCGTTGCCAACGTTTCGAATTTCGAAAGATATCAGTAAAT
GATATTGTTGAGAGATTATCGACAGTTGTAAACAATGAAGGTACGCAAGTAGAAGATGAA
GCGTTACAAATCGTTGCACGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCAATATCTTATAGTGATGAGACTGTTACGACAGAAGATGTATTAGCTGTAACA
GGGCGTGATATGTTCCGAATGTTAAGTGAA

Please let me know the best way to do this, using either awk or sed.
LA
# 2  
Old 07-21-2011
Try:
Code:
awk 'NR==1{$0=$0" 1"}length==1{$0=">"$0}1' 1.fasta

# 3  
Old 07-21-2011
The first part worked (appending 1 to the first line), but the second part didn't (adding > before each header).

---------- Post updated at 04:53 PM ---------- Previous update was at 04:51 PM ----------

Also, the headers are not always single letters. Some headers have two or more characters.
# 4  
Old 07-21-2011
Post examples of those headers.
# 5  
Old 07-21-2011
ADC8_
BC
ACR4
TYUI
# 6  
Old 07-21-2011
So they are always fewer than 6 characters? If so, then this might work:
Code:
awk 'NR==1{$0=$0" 1"}length<6&&NR>1{$0=">"$0}1' 1.fasta
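Since the original question is about hundreds of files in one folder, here is a minimal sketch of how the one-liner above could be applied to every .fasta file. The function name add_markers and the temporary-file naming are assumptions made for illustration, and the sketch relies on the same assumption as the awk command: headers are shorter than 6 characters and sequence lines are longer.

```shell
#!/bin/sh
# Sketch: apply the awk edit from this thread to every .fasta file in a folder.
# Assumption: headers are 1-5 characters long; sequence lines are longer.

add_markers() {
    # $1 = directory holding the .fasta files (hypothetical helper name)
    for f in "$1"/*.fasta; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        tmp="$f.tmp.$$"
        # Same logic as the one-liner, but empty lines are left alone.
        # Output goes to a temp file first so a failure cannot truncate $f.
        if awk 'NR==1 {$0 = $0 " 1"}
                NR>1 && length($0)>0 && length($0)<6 {$0 = ">" $0}
                1' "$f" > "$tmp"; then
            mv "$tmp" "$f"
        else
            rm -f "$tmp"
        fi
    done
}

# Small demonstration on throwaway data
demo_dir=$(mktemp -d)
printf '  58    390\nA\nGTATACATTATT\nADC8_\nGGTCGTGATATG\n' > "$demo_dir/1.fasta"
add_markers "$demo_dir"
cat "$demo_dir/1.fasta"
```

Note that the edit is not idempotent: running it a second time on the same folder would append another 1 and another >, so it is worth keeping a backup copy of the folder first.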

# 7  
Old 07-22-2011
With the help of sed:

$ sed -e '1s/$/ 1/' -e '2,$s/^.\{1,5\}$/>&/' filename

  58    390 1
>A
GTATACATTATTGATGAAGTCCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATCATTTCGCGTTGCCAACGTTTCGAATTTCGAAAAATATCAGTAAAT
GATATTGTTGAGAGATTGTCGACGGTTGTGACTAATGAAGGTACGCAAGTAGAAGATGAG
GCGTTACAAATTGTTGCGCGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCGATATCTTATAGTGATGAGAGGGTTACGACAGAAGATGTATTAGCTGTAACG
GGTCGTGATATGTTCCGTATGTTAAGTGAA
>B
GTATACATTATTGATGAAGTCCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATCATTTCGCGTTGCCAACGTTTCGAATTTCGAAAAATATCAGTAAAT
GATATTGTTGAGAGATTGTCCACGGTTGTGACTAATGAAGGTACGCAAGTAGAAGATGAG
GCTTTACAAATTGTTGCGCGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAAGCGATATCTTATAGTGATGAGAGGGTTACGACAGAAGATGTATTAGCTGTAACG
GGTCGTGATATGTTCCGTATGTTAAGTGAA
>C
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCATTCAATGCGCTTTTAAAA
ACCTTAGAAGAGCCGCCAGGACATGTTATCTTTATTTTGGCGACAACAGAACCTCATAAG
ATCCCACCTACAATCATTTCACGTTGTCAGCGCTTTGAATTCCGAAAAATATCAGTGAAT
GATATTGTTGAGAGATTATCAACGGTCGTGACAAATGAAGGTACGCAAGTGGAAGGTGAA
GCATTACAAATTGTTGCGCGTGCTGCCGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCTATATCTTATAGTGATGAGATTGTTACGACAGAAGATGTATTGGCCGTAACA
GGACGTGATATGTTCCGTAAGTTGAGTGAA
>D
GTATACATTATTGATGAAGTTCACATGCTTTCTATGGGTGCCTTCAATGCGCTTTTAAAA
ACGTTAGAAGAACCGCCAGGACATGTCATCTTTATTTTGGCGACAACAGAACCGCATAAG
ATACCGCCTACAATTATTTCGCGTTGCCAACGTTTCGAATTTCGAAAGATATCAGTAAAT
GATATTGTTGAGAGATTATCGACAGTTGTAAACAATGAAGGTACGCAAGTAGAAGATGAA
GCGTTACAAATCGTTGCACGTGCCGCTGAAGGTGGTATGCGTGATGCGCTTAGTCTTATT
GATCAGGCAATATCTTATAGTGATGAGACTGTTACGACAGAAGATGTATTAGCTGTAACA
GGGCGTGATATGTTCCGAATGTTAAGTGAA
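
Both answers above recognise a header by its line length, which would break if a header were ever as long as a sequence line. A possible alternative is to count sequence characters instead. This sketch assumes, without confirmation from the thread, that the second number on the first line (390) is the number of sequence characters per record; the sample is consistent with that, since each record has six 60-character lines plus one 30-character line. The function name mark_by_length is hypothetical.

```shell
#!/bin/sh
# Sketch: locate headers by counting sequence characters, not by line length.
# Assumption (unconfirmed): field 2 of line 1 is the sequence length per record.

mark_by_length() {
    # $1 = input file; the edited text is written to standard output
    awk '
        NR == 1          { seqlen = $2 + 0; need = 0; print $0 " 1"; next }
        /^[[:space:]]*$/ { print; next }                      # keep blank lines as they are
        need <= 0        { print ">" $0; need = seqlen; next } # previous record complete: header
                         { need -= length($0); print }        # sequence line
    ' "$1"
}

# Small demonstration: two records of 12 characters each, one with a long header
demo_file=$(mktemp)
printf '  2    12\nA\nGTATAC\nATTATT\nLONGHEADER\nGGTCGT\nGATATG\n' > "$demo_file"
mark_by_length "$demo_file"
```

The trade-off is that this version depends on the count in the first line being accurate and on the lines having no trailing whitespace or carriage returns, either of which would throw the character count off.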