Counting characters at each position


 
# 1  
Old 02-08-2013

Hi all, here's a question from a newbie.

I have data like this: a set of short DNA sequences, one per line.

HTML Code:
GAATCCGGAAACAGCAACTTCAAANCA
GTNATTCGGGCCAAACTGTCGAA
TTNGGCAACTGTTAGAGCTCATGCGACA
CCTGCTAAACGAGTTCGAGTTGAANGA
TTNCGGAAGTGGTCGCTGGCACGG
ACNTGCATGTACGGAGTGACGAAACC
I usually have to count the frequency of each character across the whole data set, which I do with:
Code:
awk -F "" '{ for ( i=1; i<=NF; i++) freq[$i]++} END {for (a in freq) print a, freq[a]}'
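Editor's note: splitting every character into its own field with an empty field separator works in gawk and mawk, but as far as I know POSIX does not guarantee it. Below is a minimal sketch of the same whole-file count that walks each line with substr() instead. The function name char_freq is hypothetical, and the two-line input is made up for illustration.

```shell
# char_freq is a hypothetical helper name; it reads stdin or the files given as arguments.
char_freq() {
    awk '{
        # Walk the line one character at a time instead of relying on an empty FS.
        for (i = 1; i <= length($0); i++) freq[substr($0, i, 1)]++
    }
    END { for (c in freq) print c, freq[c] }' "$@" | sort
}

# Made-up two-line sample, not the data from the post.
printf 'GAAT\nGTNA\n' | char_freq
```

The trailing sort is only there because awk does not promise any particular order for "for (c in freq)".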

Now I am almost clueless when it comes to counting the frequency of characters at each position. Here is an example of what I want, using a subset of the data:
HTML Code:
GAATCCGGAAACAGCAACTTCAAANCA
GTNATTCGGGCCAAACTGTCGAA
TTNGGCAACTGTTAGAGCTCATGCGACA
CCTGCTAAACGAGTTCGAGTTGAANGA
TTNCGGAAGTGGTCGCTGGCACGG
         
1st position:
G = 2
T = 2
C = 1
A = 1
2nd position:
T = 3
C = 2
and so on.
Any ideas or help would be most appreciated. Please tell me if I have not stated the problem clearly.

Thank you,

Amit
# 2  
Old 02-08-2013
Try this as a starting point:
Code:
$ awk -F "" '     {for ( i=1; i<=NF;  i++) {freq[$i,i]++; Base[$i]} if (NF > max) max = NF}
             END  {for ( i=1; i<=max; i++)
                    {for (a in Base) print "Pos: ", i, ", Base: ", a, ", Freq: ", freq[a,i]}}
            ' file
Pos:  1 , Base:  A , Freq:  1
Pos:  1 , Base:  C , Freq:  1
Pos:  1 , Base:  G , Freq:  2
Pos:  1 , Base:  N , Freq:  
Pos:  1 , Base:  T , Freq:  2
Pos:  2 , Base:  A , Freq:  1
Pos:  2 , Base:  C , Freq:  2
Pos:  2 , Base:  G , Freq:  
Pos:  2 , Base:  N , Freq:  
Pos:  2 , Base:  T , Freq:  3
.
.
.
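Editor's note: in the output above, a base that never occurs at a position prints an empty Freq. The sketch below is one possible variation on the same two-array idea, not RudiC's exact code: it adds 0 to each count so that missing combinations print as 0, and it uses substr() rather than an empty field separator. The name pos_freq and the three-line input are assumptions made for illustration.

```shell
# pos_freq is a hypothetical helper name; it reads stdin or the files given as arguments.
pos_freq() {
    awk '{
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            freq[c, i]++      # count of character c at position i
            base[c]           # merely referencing the element records that c was seen
        }
        if (n > max) max = n  # remember the longest line
    }
    END {
        for (i = 1; i <= max; i++)
            for (c in base)
                print i, c, freq[c, i] + 0   # "+ 0" prints 0 where no count exists
    }' "$@" | sort -k1,1n -k2,2
}

# Made-up sample with lines of different lengths.
printf 'GA\nGT\nT\n' | pos_freq
```

Each output line is "position character count", sorted by position and then by character.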

# 3  
Old 02-08-2013
Thank you so much, RudiC. I didn't know about this trick:
Code:
{freq[$i,i]++; Base[$i]}

I understand this takes up your time, but could you explain the part above a bit?

Best,

Amit
# 4  
Old 02-08-2013
Actually, it's not a trick but more of a detour born out of sheer despair. While awk (at least the one I use, mawk) does accept if ( (i,j) in freq ), it does not allow for ( (i,j) in freq ). That's why I introduced the second array, just to keep track of the base characters.
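Editor's note: an index written as freq[a,i] is stored by awk as one string, with the two parts joined by the built-in SUBSEP character. So a possible alternative to the second array, sketched here under that assumption, is to loop over the combined keys and split them apart again. The name pos_freq_subsep is hypothetical, and this version lists only the combinations that actually occur.

```shell
# pos_freq_subsep is a hypothetical helper name; it reads stdin or the files given as arguments.
pos_freq_subsep() {
    awk '{
        for (i = 1; i <= length($0); i++) freq[substr($0, i, 1), i]++
    }
    END {
        for (k in freq) {
            split(k, part, SUBSEP)   # part[1] = character, part[2] = position
            print part[2], part[1], freq[k]
        }
    }' "$@" | sort -k1,1n -k2,2
}

# Made-up two-line sample.
printf 'GA\nGT\n' | pos_freq_subsep
```

Unlike the two-array version, this cannot report a zero for a base that is absent at a given position, because no key exists for it.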
# 5  
Old 02-08-2013
Alternatively you could try:
Code:
awk '{for(i=1; i<=NF; i++) A[i OFS $i]++} END{for(i in A) print i, A[i]}' FS= file | sort -n

# 6  
Old 02-08-2013
Quote:
Originally Posted by Scrutinizer
Alternatively you could try:
Code:
awk '{for(i=1; i<=NF; i++) A[i OFS $i]++} END{for(i in A) print i, A[i]}' FS= file | sort -n

Thank you, that is very clever and concise. Sorry, I could not work out whether this
HTML Code:
A[i OFS $i]++
creates another array.
# 7  
Old 02-08-2013
You're welcome. There is only a single array. The statement adds 1 to the array element whose index is one string made up of the position number and the character at that position, separated by OFS (the output field separator), which defaults to a single space. So, for example, A["3 N"]++ and A["28 A"]++.
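Editor's note: to make the single-index idea concrete, here is a small sketch that builds the same kind of "position character" key by string concatenation and then prints each key with its count. It swaps the empty field separator for substr(), so it is a variation on the one-liner rather than a copy of it. The name pos_freq_concat and the sample input are assumptions.

```shell
# pos_freq_concat is a hypothetical helper name; it reads stdin or the files given as arguments.
pos_freq_concat() {
    awk '{
        for (i = 1; i <= length($0); i++)
            A[i OFS substr($0, i, 1)]++   # key is one string such as "3 N"
    }
    END { for (k in A) print k, A[k] }' "$@" | sort -n
}

# Made-up two-line sample; both lines have N at position 3, so the key "3 N" is counted twice.
printf 'GTN\nTTN\n' | pos_freq_concat
```

Because the position comes first in the key, sort -n is enough to put the output in position order.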

Last edited by Scrutinizer; 02-08-2013 at 01:29 PM..