How to count the number of occurrences of a character in a string in UNIX


 
# 8  
Old 03-17-2006
Quote:
Probably the simplest way is:
But your command (solution) uses three commands and two kernel data structures.
# 9  
Old 03-17-2006
Well yes, it uses three very basic Unix commands, each with very low overhead, and it works no matter what the input is. Running all four command variations posted so far 1000 times in a row, I get the following timings on an unused machine (I basically surrounded each command with a while loop).

echo $((`echo "a|b|c" | sed 's/[^|]//g' | wc -c` - 1 )) > /dev/null
real 0m10.109s
user 0m2.880s
sys 0m12.590s

echo $(($(echo "a|b|c"|sed 's/[a-z]//g'|wc -c)-1)) > /dev/null
real 0m10.141s
user 0m2.910s
sys 0m11.950s

echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
real 0m10.838s
user 0m3.340s
sys 0m8.630s

echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
real 0m6.962s
user 0m2.770s
sys 0m8.960s

So I suppose it depends on how you define simplest :-)
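
As a quick check on the "works no matter what the input is" point, here is a minimal sketch that runs the four variations on a string containing digits instead of lowercase letters. The input and the variable names (`s`, `n_sed`, `n_az`, `n_awk`, `n_tr`) are chosen only for illustration, and `printf` stands in for `echo`; none of that comes from the posts above.

```shell
#!/bin/sh
# Illustrative input: digits instead of lowercase letters (an assumption, not from the thread).
s='1|2|3'

# Variation 1: delete everything that is not '|', then subtract the trailing newline.
n_sed=$(( $(printf '%s\n' "$s" | sed 's/[^|]//g' | wc -c) - 1 ))

# Variation 2: delete only lowercase letters; here the digits survive and are counted too.
n_az=$(( $(printf '%s\n' "$s" | sed 's/[a-z]//g' | wc -c) - 1 ))

# Variation 3: number of '|'-separated fields, minus one.
n_awk=$(( $(printf '%s\n' "$s" | awk -F'|' '{print NF}') - 1 ))

# Variation 4: delete everything except '|' (newline included) and count what is left.
n_tr=$(( $(printf '%s\n' "$s" | tr -dc '|' | wc -c) ))

echo "sed [^|]:  $n_sed"
echo "sed [a-z]: $n_az"
echo "awk NF-1:  $n_awk"
echo "tr -dc:    $n_tr"
```

With this input the `[a-z]` variation reports 5 while the other three report 2, so it is only correct when every non-separator character is a lowercase letter.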
# 10  
Old 03-17-2006
Quote:
So I suppose it depends on how you define simplest :-)
Of course, yes, it depends upon how we define simplest.

I just ran the following two commands in a loop 10,000 times each;
could you please verify the results?

Code:
#!/usr/bin/ksh
# Timing harness: uncomment exactly one of the two commands below per run.
i=1
while [ "$i" -le 10000 ]
do
    #echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
    #echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
    i=$((i + 1))
done
exit 0

The following is the time taken:
**********************************************
echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
real 6m23.95s
user 1m59.82s
sys 5m2.26s
**********************************************
echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
real 6m6.93s
user 1m23.27s
sys 3m18.99s
**********************************************

Only these two commands were considered, as an example.
When a particular sample is run for a longer time, the output may differ.

In particular, I don't see much use in just discussing the time taken by each of the commands when it is run.

It could be that sed and awk are complex programs compared to tr; hence they naturally take more time to execute.
This is just my suggestion, and the actual reason could be different.
# 11  
Old 03-17-2006
Quote:
Originally Posted by matrixmadhan
In particular, I don't see much use in just discussing the time taken by each of the commands when it is run.

It could be that sed and awk are complex programs compared to tr; hence they naturally take more time to execute.
This is just my suggestion, and the actual reason could be different.
We are not *just* discussing the times taken; I added them to a thread in which a larger discussion was already taking place. My presentation of timings was in response to your apparent questioning of whether my solution was 'simple'. My personal opinion is that for trivial operations, if the simplest commands are used to perform the job, then they probably form the simplest solution. Hence my own particular solution. It does not invalidate any other solution, it just presents a different one.

Other people often prefer to use the same tool, be it perl, awk, ruby etc to do just about everything. /shrug everyone has their own way.
# 12  
Old 03-17-2006
Actually, I think the timings are very interesting. Sometimes I want speed and I'm willing to tolerate a bit of complexity to achieve the speed. ksh can do this using just builtin functions. I have been fiddling with several techniques. There isn't a super obvious optimal choice. But I have settled on:
Code:
echo "a|b|c" | { read x
        x=${x}X
        n=0
        while ((${#x}>1)); do
                typeset -L1 c=$x ; typeset -R$((${#x}-1)) x
                [[ $c = \| ]] && ((n=n+1))
        done ; echo $n ; }
echo $n   # superfluous echo
exit 0

This won't work right in any shell except ksh, not even pdksh. The requirement is to read the string from a pipe. A fork() will happen to provide the echo process at the start of the pipeline. But with ksh, the last command in a pipeline is executed in the context of the parent shell if it is a builtin. This allows stuff like "echo foo | read bar" to work in (only) ksh. I was a little nervous that I could count all of the braced statements as "a builtin". That is the reason for the second "echo $n". The parent shell sees the new value for n, proving that the entire loop was processed inside ksh. Also remember that ksh will compile a loop and execute the compiled code. Lesser shells need to re-interpret the source code on each iteration of a loop.
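
The `echo foo | read bar` behaviour is easy to check. The sketch below is an illustration, not code from this post: it shows the pipeline form, whose result depends on the shell, followed by a here-document form that keeps `read` in the current shell in any POSIX shell. The variable names `bar` and `baz` are placeholders.

```shell
#!/bin/sh
# Pipeline form: whether bar keeps its value depends on the shell.
# In ksh it ends up as "foo"; in shells that run the last pipeline
# stage in a subshell (bash by default, for example) it stays empty.
bar=''
echo foo | read bar
echo "after pipeline:      bar='${bar}'"

# Here-document form: read runs in the current shell, so the
# assignment is visible afterwards regardless of the shell.
read baz <<EOF
foo
EOF
echo "after here-document: baz='${baz}'"
```

The here-document only helps when the string is already available to the script; it does not replace reading from a real pipe.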

I don't represent this as simple, only fast. But I didn't do any timings so I don't have any numbers.
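
For comparison, a builtin-only count can also be sketched with POSIX parameter expansion instead of `typeset -L`/`-R`. This is a different technique from the loop above, not a rewrite of it. The function name `count_char` and the variables `char_count` and `cc_rest` are made up for the example, and no timings were taken for it either.

```shell
#!/bin/sh
# count_char STRING CHAR
# Counts occurrences of the single character CHAR in STRING using only
# shell builtins, and leaves the result in the global variable char_count.
count_char() {
    # An empty CHAR would match everywhere and never shorten the string.
    [ -n "$2" ] || return 1
    cc_rest=$1
    char_count=0
    while :; do
        case $cc_rest in
            *"$2"*)
                # Drop everything up to and including the first CHAR.
                cc_rest=${cc_rest#*"$2"}
                char_count=$((char_count + 1))
                ;;
            *)
                break
                ;;
        esac
    done
}

count_char 'a|b|c' '|'
echo "$char_count"
```

The result is passed back through a variable rather than through command substitution, so the call itself needs no subshell. Reading the string from a pipe would still rely on the ksh behaviour described above.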