counting characters


 
# 1  
Old 07-02-2010

Hi All,
I need help counting the number of letters in each section of a large file.

This is the file I have:

Code:
>AB_1
MLKKPIIIGVTGGSGGGKTSVSRAILDSFPNARIAMIQHDSYYKDQSHMSFEERVKTNYDHPLAFDTDFM
IQQLKELLAGRPVDIPIYDYKKHTRSNTTFRQDPQDVIIVEGILVLEDERLRDLMDIKLFVDTDDDIRII
RRIKRDMMERGRSLESIIDQYTSVVKPMYHQFIEPSKRYADIVIPEGVSNVVAIDVINSKIASILGEV
>AB_2
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE
>AB_3
MTGYDDFNYALSALKLGADDYLLKPFSKADVEDMLGKLRKKLELSKKTETIQELVEQPQKEVSAIAMAIH
ERLADSDLTLKSLAQQLGFSPNYLSVLIKKELGMPFQDYLVQERLKKAKLFLLTSNLKIYEIAEQVGFED
MNYFSQRFKQLVGVTPSQYKKGGQA

The file continues in the same way.

I would like to count the number of letters in each subsection (subsections are separated by header lines starting with >).

It would be great if I could get a tab-delimited output file in the following format:
AB_1 number of letters
AB_2 number of letters
AB_3 number of letters

I don't want to count the letters in the header.

It would also be great if the ">" could be omitted from the output file.

Please let me know the best way to do this using awk or sed.

LA
# 2  
Old 07-02-2010
Code:
awk '/^>/{sub(">","");p=$0;next}{a[p]}+=length($0)}END{for (i in a) printf "%s\t%s\n",i,a[i]}' file

# 3  
Old 07-02-2010
Code:
nawk '/^>/ {if(n) print n,l;n=substr($0,2);next} {l+=length}END{print n,l}' myFile

# 4  
Old 07-02-2010
Thanks.

When I ran it, I got the following error message:

Code:
awk: syntax error at source line 1
 context is
     >>> /^>/{sub(">","");p=$0;next}{a[p]}+= <<< 
    extra }
awk: bailing out at source line 1

Please let me know what might have gone wrong.

LA
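
The message points at the stray `}` right after `a[p]`: it closes the action block before `+=`, so awk reports an extra brace. A minimal corrected sketch follows, run against a small stand-in file (`sample_fix.fa` and its contents are made up for illustration). The `order` array is an addition, not part of the original command, because `for (i in a)` returns keys in an unspecified order. It assumes each header appears only once.

```shell
# Small stand-in for the real input (hypothetical file name and data).
cat > sample_fix.fa <<'EOF'
>AB_1
MLKK
PII
>AB_2
MRARL
EOF

# Corrected command: the whole statement a[p] += length($0) sits inside one pair of braces.
# order[] (an addition) remembers the headers in input order.
awk '/^>/ { p = substr($0, 2); order[++n] = p; next }
     { a[p] += length($0) }
     END { for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], a[order[i]] }' sample_fix.fa > counts_fix.tsv
cat counts_fix.tsv
```

For this stand-in input the output holds AB_1 with 7 and AB_2 with 5, separated by a tab.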

---------- Post updated at 11:12 AM ---------- Previous update was at 11:01 AM ----------

Thanks vgersh99,

It didn't give any error message, but I don't think it's giving me what I need.

Code:
awk '/^>/ {if(n) print n,l;n=substr($0,2);next} {l+=length}END{print n,l}' sample1.txt 
AB_1 208
AB_2 544
AB_3 709

Obviously AB_3 has fewer letters than AB_2, so the counts seem to be adding up from one section to the next.
# 5  
Old 07-02-2010
Sorry, the counter needs to be reset at each header:
Code:
nawk '/^>/ {if(n) print n,l;n=substr($0,2);l=0;next} {l+=length}END{print n,l}'
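
Spread over several lines, the same logic can be sketched as below. This is only an illustration: it uses `awk` instead of `nawk` (an assumption about what is installed), tests `n != ""` rather than `n`, and runs on a made-up file, `sample_reset.fa`.

```shell
# Hypothetical stand-in input.
cat > sample_reset.fa <<'EOF'
>AB_1
MLKK
PII
>AB_2
MRARL
>AB_3
MT
GY
EOF

awk '
/^>/ {                        # header line
    if (n != "") print n, l   # emit the previous record, if there was one
    n = substr($0, 2)         # drop the leading ">"
    l = 0                     # restart the count; without this the totals accumulate
    next
}
{ l += length($0) }           # sequence line: add its length
END { if (n != "") print n, l }   # emit the last record
' sample_reset.fa > counts_reset.txt
cat counts_reset.txt
```

With `l = 0` in place, each header gets its own total (7, 5 and 4 for this stand-in file) instead of a running sum.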

# 6  
Old 07-02-2010
Thanks, this works.

It would be nice if I could get the output in a tab-delimited format.

LA
# 7  
Old 07-02-2010
Code:
nawk '/^>/ {if(n) print n,l;n=substr($0,2);l=0;next} {l+=length}END{print n,l}' OFS='\t' myFile
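
Note that `length` counts every character on a line, including trailing spaces or a carriage return from Windows line endings. If only letters should count (an assumption about the data, not something stated in the thread), a possible variant uses the return value of `gsub`, which is the number of matches it replaced. The file `sample_letters.fa` below is made up for illustration, and `-v OFS='\t'` is used in place of the trailing `OFS='\t'` assignment.

```shell
# Hypothetical sample with a trailing space, a "*", a blank line and a carriage return.
printf '>AB_1\nMLKK \nPII*\n\n>AB_2\nMRARL\r\n' > sample_letters.fa

awk -v OFS='\t' '
/^>/ { if (n != "") print n, l; n = substr($0, 2); l = 0; next }
{ l += gsub(/[A-Za-z]/, "&") }    # gsub returns how many letters it matched
END { if (n != "") print n, l }
' sample_letters.fa > counts_letters.tsv
cat counts_letters.tsv
```

Replacing each letter with `&` (the matched text itself) leaves the line unchanged, so the call serves purely as a counter.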
