counting characters


Shell Programming and Scripting: Post 302434395 by Lucky Ali on Friday 2nd of July 2010 10:45:00 AM


Hi All,
I need help counting the letters in each section of a large file that is divided into sections by header lines.

This is the file I have:

Code:

>AB_1
MLKKPIIIGVTGGSGGGKTSVSRAILDSFPNARIAMIQHDSYYKDQSHMSFEERVKTNYDHPLAFDTDFM
IQQLKELLAGRPVDIPIYDYKKHTRSNTTFRQDPQDVIIVEGILVLEDERLRDLMDIKLFVDTDDDIRII
RRIKRDMMERGRSLESIIDQYTSVVKPMYHQFIEPSKRYADIVIPEGVSNVVAIDVINSKIASILGEV
>AB_2
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE
>AB_3
MTGYDDFNYALSALKLGADDYLLKPFSKADVEDMLGKLRKKLELSKKTETIQELVEQPQKEVSAIAMAIH
ERLADSDLTLKSLAQQLGFSPNYLSVLIKKELGMPFQDYLVQERLKKAKLFLLTSNLKIYEIAEQVGFED
MNYFSQRFKQLVGVTPSQYKKGGQA

The file continues in the same pattern.

I would like to count the number of letters in each subsection (subsections are separated by header lines starting with >).

It would be great if I could get a tab-delimited output file in the following format:
AB_1	number of letters
AB_2	number of letters
AB_3	number of letters

I don't want to count the letters in the header lines.

It would also be great if the ">" could be omitted from the output file.

Please let me know the best way to do this using awk or sed.
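
One possible approach, as a minimal sketch rather than a definitive answer: let awk remember the current header, add up the letters on every following line, and print the pair when the next header (or the end of the file) is reached. The function name `count_letters` is hypothetical and not part of the original post. The sketch assumes every section starts with a line beginning with ">" and that only A-Z and a-z count as letters.

```shell
#!/bin/sh
# count_letters: hypothetical helper name, introduced for illustration only.
# Prints "<header without '>'><TAB><letter count>" for each section
# of the file(s) given as arguments (or of standard input).
count_letters() {
    awk '
        # Header line: emit the previous section, then start a new one.
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, total
            name = substr($0, 2)   # drop the leading ">"
            total = 0
            next
        }
        # Sequence line: count alphabetic characters only.
        # gsub returns the number of matches; it works on a copy of the line.
        {
            line = $0
            total += gsub(/[A-Za-z]/, "", line)
        }
        # The last section has no following header, so emit it here.
        END {
            if (name != "") printf "%s\t%d\n", name, total
        }
    ' "$@"
}
```

It could be called as `count_letters input.txt > counts.tsv`, where both file names are placeholders. Because only letters are matched, stray spaces or carriage returns at line ends are not counted.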


