Help with counting string elements


 
# 1 (04-07-2011)

Hi All,
I have several files, each with hundreds of lines, for example:
Code:
>XYZ.abc01
NNNTCGGTNNNNNCCACACACMYACACACCCACACCCACSCARCAC

I'd like to exclude the first line (the one beginning with ">") and then, for the remaining lines, get a count of each character. For the example above, I would like the following output:
Code:
A=11
C=19
G=2
M=1
N=8
R=1
S=1
T=2
Y=1
Length of XYZ.abc01=46

Can anyone enlighten me on this? If there is a Perl way, I'd appreciate it, as I'm learning Perl.
Cheers, and have a nice day!
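As a quick check of the expected output above, here is a minimal Ruby sketch (Ruby because it already appears later in this thread; format_record is a hypothetical helper name, not taken from any post). It counts each character of one sequence in a hash that defaults to 0, sorts the letters, and appends the length line. The counts add up to the length: 11+19+2+1+8+1+1+2+1 = 46.

```ruby
# Sketch: count each character of one sequence and return the lines in the
# requested format ("LETTER=count", sorted by letter, then a length line).
# format_record is a hypothetical helper name.
def format_record(name, seq)
  counts = Hash.new(0)                     # unseen letters start at 0
  seq.each_char { |ch| counts[ch] += 1 }   # one increment per character
  lines = counts.keys.sort.map { |ch| "#{ch}=#{counts[ch]}" }
  lines << "Length of #{name}=#{seq.length}"
  lines
end

# Usage with the example record from the question
puts format_record("XYZ.abc01",
                   "NNNTCGGTNNNNNCCACACACMYACACACCCACACCCACSCARCAC")
```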
# 2 (ctsgnb, 04-07-2011)
I haven't tested it (it may need some fixes), but something like this might work:
Code:
#!/bin/ksh
{ read a
while read a
do
    [[ -z $a ]] && continue
    b=${a#?}
    c=${a%$b}
    
    if [[ "$c" == '>' ]]
    then
        h=$b 
        continue
    else
        fold -w 1 <(echo $a) | sort | uniq -c | awk '{print $2"="$1}'
        echo "Length of $h=${#a}"
    fi
done } <infile

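It seems to work. Note that the initial "read a" skips the first line of the file, hence the dummy first line in the test file below; if your file begins directly with a ">" header line, drop the leading "{ read a" and the matching "}" so that header is not lost.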

Code:
# cat tst
dummy first line
>XYZ.abc01
NNNTCGGTNNNNNCCACACACMYACACACCCACACCCACSCARCAC
>XYYUIZ.abc01
NNNTCGGTCACACMYACACACCCACCACACMYACACACCCACNNNNNCCACACACMYACACACCCACACCCACSCARCAC
>XYYU.abc03
NNNTCGGTCACACMYACACACCCACCACACMYACACACCCACNCACCCACACCCACSCARCAC
>XYYUIZ.abc04
NNNTCGGTCACACMYACACACCCACCACACMYAACACACCCACACCCACSCARCAC
>XYYUIZ.abc05
NNNTCGGTCACACMYACACANNNCCACACACMYACACACCCACACCCACSCARCAC
>XYYUIZ.abc06
NNNTCGGTCACACMYACACAACACMYACACACCCACNNNNNCCACACACMYACACACCCACACCCACSCARCAC
>XYYUIZ.abc07
NNNTCGGTCACACMYACACACCACACACMYACACACCCACACCCACSCARCAC

Code:
# cat sc
#!/bin/ksh
{ read a
while read a
do
    [[ -z $a ]] && continue
    b=${a#?}
    c=${a%$b}

    if [[ "$c" == '>' ]]
    then
        h=$b
        continue
    else
        fold -w 1 <(echo $a) | sort | uniq -c | awk '{print $2"="$1}'
        echo "Length of $h=${#a}"
    fi
done } <tst

Code:
# ksh sc
A=11
C=19
G=2
M=1
N=8
R=1
S=1
T=2
Y=1
Length of XYZ.abc01=46
A=23
C=37
G=2
M=3
N=8
R=1
S=1
T=2
Y=3
Length of XYYUIZ.abc01=80
A=18
C=31
G=2
M=2
N=4
R=1
S=1
T=2
Y=2
Length of XYYU.abc03=63
A=17
C=26
G=2
M=2
N=3
R=1
S=1
T=2
Y=2
Length of XYYUIZ.abc04=56
A=16
C=24
G=2
M=2
N=6
R=1
S=1
T=2
Y=2
Length of XYYUIZ.abc05=56
A=22
C=32
G=2
M=3
N=8
R=1
S=1
T=2
Y=3
Length of XYYUIZ.abc06=74
A=16
C=24
G=2
M=2
N=3
R=1
S=1
T=2
Y=2
Length of XYYUIZ.abc07=53
#


Last edited by ctsgnb; 04-07-2011 at 07:57 AM..
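The core of the ksh script is the pipeline fold -w 1 | sort | uniq -c | awk: fold puts one character per line, sort makes identical characters adjacent (uniq -c only counts adjacent duplicates, so the sort is required), and awk turns uniq's "count char" into "char=count". Because the pipeline runs once per input line, a sequence wrapped over several lines would be reported line by line rather than as one total. A rough Ruby analogue, for illustration only (fold_sort_uniq is a hypothetical name; assumes Ruby 1.9.2+ for Enumerable#chunk):

```ruby
# Sketch of what fold -w 1 | sort | uniq -c | awk '{print $2"="$1}' does.
# fold_sort_uniq is a hypothetical name, not part of the original script.
def fold_sort_uniq(line)
  chars = line.chars                  # like fold -w 1: one character each
  runs  = chars.sort.chunk { |c| c }  # like sort | uniq: runs of equal chars
  runs.map { |c, run| "#{c}=#{run.size}" }  # like uniq -c, then awk swap
end

puts fold_sort_uniq("NNNTCGGTNNNNNCCACACACMYACACACCCACACCCACSCARCAC")
```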
# 3 (bartus11, 04-07-2011)
Try:
Code:
perl -aF// -lne 'if ($.==1){s/^>//;$n=$_}else{for $i (@F){$h{$i}++;$l++}}END{for $i (sort keys %h){print "$i=$h{$i}"}print "Length of $n=$l"}' file

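The Perl one-liner treats only line 1 ($.==1) as the header and adds every later character to one hash, so with several records in one file the later ">" header lines would be counted as sequence characters. If your files hold several records, or sequences wrapped over many lines, a sketch along these lines keeps one count table per header (Ruby, with the hypothetical helper count_records and made-up sample input):

```ruby
# Sketch: per-record letter counts for input holding several ">name"
# records, each sequence possibly wrapped over several lines.
# count_records is a hypothetical helper; lines before the first header
# and blank lines are ignored.
def count_records(text)
  records = {}                       # name => Hash of letter counts
  name = nil
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?(">")
      name = line[1..-1]             # header without the leading ">"
      records[name] = Hash.new(0)
    elsif name
      line.each_char { |ch| records[name][ch] += 1 }
    end
  end
  records
end

# Usage with made-up sample input
sample = ">rec1\nNNAC\nGT\n\n>rec2\nAAC\n"
count_records(sample).each do |name, counts|
  counts.keys.sort.each { |ch| puts "#{ch}=#{counts[ch]}" }
  puts "Length of #{name}=#{counts.values.inject(0, :+)}"
end
```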
# 4 (04-07-2011)

We could make the output more compact, with each record's letter counts on one line, by changing this line in the previous script:
Code:
fold -w 1 <(echo $a) | sort | uniq -c | awk '{print $2"="$1}' | xargs

Code:
# ksh sc
A=11 C=19 G=2 M=1 N=8 R=1 S=1 T=2 Y=1
Length of XYZ.abc01=46
A=23 C=37 G=2 M=3 N=8 R=1 S=1 T=2 Y=3
Length of XYYUIZ.abc01=80
A=18 C=31 G=2 M=2 N=4 R=1 S=1 T=2 Y=2
Length of XYYU.abc03=63
A=17 C=26 G=2 M=2 N=3 R=1 S=1 T=2 Y=2
Length of XYYUIZ.abc04=56
A=16 C=24 G=2 M=2 N=6 R=1 S=1 T=2 Y=2
Length of XYYUIZ.abc05=56
A=22 C=32 G=2 M=3 N=8 R=1 S=1 T=2 Y=3
Length of XYYUIZ.abc06=74
A=16 C=24 G=2 M=2 N=3 R=1 S=1 T=2 Y=2
Length of XYYUIZ.abc07=53

# 5 (04-07-2011)
Create an awk script named awk_cmd:
Code:
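#!/usr/bin/awk -f
{
        if ($0 ~ /^$/)          # skip blank lines
                next
        if ($0 ~ /^>/)          # header line: keep the name without ">"
        {
                label = substr($0, 2)
                next
        }
        for (x = 1; x <= length($0); x++)   # count each character
        {
                arr[substr($0, x, 1)]++
        }
        for (i in arr)          # for-in order is unspecified (not sorted)
        {
                print i "=" arr[i]
                sum += arr[i]
        }
        print "Length of " label "=" sum
        sum = 0
        split("", arr)          # empty the array before the next record
}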

Run it with:
Code:
awk -f awk_cmd inputfile
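One detail in the awk approach: the same array is reused for every record, so it has to be emptied between records. Resetting by setting each entry to 0 (e.g. arr[i]=0) keeps the keys, so a letter that occurred only in an earlier record would be printed again with a count of 0; emptying the array removes them. In the test data in this thread every record contains the same letters, so the difference would not show there. A small Ruby sketch contrasting the two resets (counts_for is a hypothetical name and the input is made up):

```ruby
# Sketch: two ways of resetting a count table that is reused per record.
# :zero keeps stale keys (like arr[i]=0 in awk); :clear removes them.
# counts_for is a hypothetical helper name.
def counts_for(records, reset)
  table = Hash.new(0)
  records.map do |seq|
    seq.each_char { |ch| table[ch] += 1 }
    snapshot = table.keys.sort.map { |ch| "#{ch}=#{table[ch]}" }
    if reset == :zero
      table.keys.each { |k| table[k] = 0 }  # keys survive with value 0
    else
      table.clear                           # table is empty again
    end
    snapshot
  end
end

p counts_for(["AAC", "GT"], :zero)
p counts_for(["AAC", "GT"], :clear)
```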

# 6 (04-07-2011)
Ruby (1.9+):
Code:
$ ruby -ne 'next if $.==1;$_.chomp.split(//).group_by{|x|x}.each{|x,y| puts "#{x}:#{y.count}"}' file
N:8
T:2
C:19
G:2
A:11
M:1
Y:1
S:1
R:1
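The Ruby one-liner groups identical characters with group_by and prints each group's size in order of first appearance; it skips only line 1 and prints no length line. On Ruby 2.7 or later, Enumerable#tally gives the same counts in one call. A small comparison, assuming Ruby 2.7+ for tally (the "Length=" line format is made up here):

```ruby
# Sketch: two ways to count characters in Ruby.
# group_by (as in the one-liner) works on Ruby 1.9+;
# Enumerable#tally needs Ruby 2.7 or later.
line = "NNNTCGGTNNNNNCCACACACMYACACACCCACACCCACSCARCAC"

# Group identical characters, then store each group's size
by_group = {}
line.chars.group_by { |x| x }.each { |x, y| by_group[x] = y.count }

# tally builds the same Hash of character => count directly
by_tally = line.chars.tally

puts by_group == by_tally            # prints "true"
puts "Length=#{line.length}"         # the one-liner has no length line
```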

# 7 (04-07-2011)
@ctsgnb
Thank you!

@Bartus
Thanks a lot for the Perl version. Could you please explain the use of the -F switch and the // after it?
Cheers
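On the -F question: in Perl, -a turns on autosplit mode, where each input line is split into the array @F, and -F sets the pattern used for that split. The slashes in -F// are only delimiters around an empty pattern, and splitting on an empty pattern breaks the line between every character, so @F holds the individual letters that the loop counts. Ruby's String#split behaves the same way with an empty pattern, which gives a quick way to see the effect (an analogy only, not Perl):

```ruby
# Sketch: splitting on an empty pattern yields one element per character,
# which is what @F contains per line under Perl's -a -F//.
fields = "NNNTCG".split(//)
p fields          # prints ["N", "N", "N", "T", "C", "G"]
```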
 