Alphabet counting

# 1  
01-26-2012

I have a text file in the following format:
Code:
CCCCCGCCCCCCCCCCcCCCCCCCCCCCCCCC
AAAATAAAAAAAAAAAaAAAAAAAAAAAAAAA
TGTTTTTTTTTTTTGGtTTTTTTTTTTTTTTT
TTTT-TTTTTTTTTCTtTTTTTTTTTTTTTTT

Each line has 32 characters and contains only two distinct letters from ATGC, each of which may occur several times (lowercase atgc can also appear). Some lines may also contain '-'. I would like to count the occurrences of each letter in a line and print the position(s) of the least frequent letter, like this:

Code:
CCCCCGCCCCCCCCCCcCCCCCCCCCCCCCCC G 6
AAAATAAAAAAAAAAAaAAAAAAAAAAAAAAA T 5
TGTTTTTTTTTTTTGGtTTTTTTTTTTTTTTT G 2 15 16
TTTT-TTTTTTTTTCTtTTTTTTTTTTTTTTT C 15

What is the best way to do this with awk?
Thanks
# 2  
01-26-2012
Have a go with this:

Code:
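awk '
    {
        # Split the line into single characters (an empty separator
        # works in gawk and mawk, but POSIX does not specify it)
        n = split( $0, a, "" );
        for( i = 1; i <= n; i++ )
        {
            count[a[i]]++;                                  # occurrences of each character
            pos[a[i]] = sprintf( "%s%d ", pos[a[i]], i );   # its 1-based positions
        }

        # Keep the least frequent uppercase A, C, G or T;
        # lowercase letters and "-" are never selected
        min = "";
        for( x in count )
        {
            if( match( x, "[ACGT]" ) && (min == "" || count[x] < count[min] ) )
                min = x;
        }

        print $0, min, pos[min];

        # Clear the tallies before the next line
        delete count;
        delete pos;
    }
' input-file >output-file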

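The script above relies on split() with an empty separator, which common awks such as gawk and mawk accept but POSIX does not specify, and it picks the minimum inside for (x in count), whose visiting order is unspecified. If both letters on a line occur equally often, the letter reported can therefore differ between awk implementations. Below is a minimal sketch of a variant that avoids both, assuming a POSIX sh and awk. The function name least_frequent_base and the FOLD variable are hypothetical, sending ties to the earliest of A, C, G, T is an arbitrary choice, and folding lowercase into uppercase is only one reading of the "(also small atgc)" remark in the question.

```shell
#!/bin/sh
# least_frequent_base is a hypothetical helper name, not from the thread.
# It reads lines from the named files (or stdin) and appends the least
# frequent of A, C, G, T plus its 1-based positions.
# Set FOLD=1 (also a hypothetical name) to count a/c/g/t as A/C/G/T;
# otherwise lowercase letters and "-" are ignored, as in the script above.
least_frequent_base() {
    awk -v fold="${FOLD:-0}" '
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)        # one character, without split(s, a, "")
            if (fold == 1)
                c = toupper(c)          # count lowercase with uppercase
            count[c]++
            # first position starts the list, later ones are appended
            pos[c] = (count[c] == 1) ? i : (pos[c] " " i)
        }

        # Scan the bases in a fixed order so a tie always goes to the
        # earliest of A, C, G, T instead of depending on for-in order
        min = ""
        for (j = 1; j <= 4; j++) {
            b = substr("ACGT", j, 1)
            if ((b in count) && (min == "" || count[b] < count[min]))
                min = b
        }

        if (min == "")
            print $0                    # no A, C, G or T on this line
        else
            print $0, min, pos[min]

        # Portable way to empty both arrays before the next line
        split("", count)
        split("", pos)
    }' "$@"
}
```

For example, least_frequent_base input-file >output-file reports the same letters and positions as the original script for the sample lines in the question (without the trailing space), and FOLD=1 least_frequent_base input-file gives the same result on that sample too, since the lowercase letters there never change which base is rarest. The two modes differ on a line such as GgggCC: the default reports G 1, while FOLD=1 reports C 5 6.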

Last edited by agama; 01-26-2012 at 11:28 PM. Reason: better form