How to count the number of occurrences of a character in a string in UNIX

03-17-2006

Quote:

Probably the simplest way is:

But your command (solution) uses three commands and two kernel data structures (the pipes).

matrixmadhan

03-17-2006

Well, yes, it uses three very basic Unix commands, each with very low overhead, and it works no matter what the input is. Running all four command variations posted so far 1000 times in a row, I get the following timings on an otherwise idle machine (I basically wrapped each command in a while loop):

echo $((`echo "a|b|c" | sed 's/[^|]//g' | wc -c` - 1 )) > /dev/null
real 0m10.109s
user 0m2.880s
sys 0m12.590s

echo $(($(echo "a|b|c"|sed 's/[a-z]//g'|wc -c)-1)) > /dev/null
real 0m10.141s
user 0m2.910s
sys 0m11.950s

echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
real 0m10.838s
user 0m3.340s
sys 0m8.630s

echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
real 0m6.962s
user 0m2.770s
sys 0m8.960s

So I suppose it depends on how you define simplest :-)
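The "works no matter what the input is" point can be made concrete. Below is a minimal POSIX sh sketch (the function names are hypothetical) that wraps three of the one-liners above and feeds them an input containing digits. The `[a-z]` sed variant deletes only lowercase letters, so any other character leaks into the count, while the `[^|]` sed variant and the `tr -dc` variant delete everything except `|`. `LC_ALL=C` is set as an assumption, because bracket ranges such as `[a-z]` can match additional characters in some locales.

```shell
#!/bin/sh
# Hypothetical wrappers around three of the one-liners above.
# Each prints the number of '|' characters in its first argument.
LC_ALL=C; export LC_ALL   # make [a-z] mean exactly the 26 lowercase letters

count_sed_class() {       # deletes only lowercase letters; '- 1' drops the newline
    echo $(( $(printf '%s\n' "$1" | sed 's/[a-z]//g' | wc -c) - 1 ))
}

count_sed_negated() {     # deletes every character except '|'; '- 1' drops the newline
    echo $(( $(printf '%s\n' "$1" | sed 's/[^|]//g' | wc -c) - 1 ))
}

count_tr() {              # -c complements the set, -d deletes: only '|' survives
    echo $(( $(printf '%s' "$1" | tr -dc '|' | wc -c) ))
}

input='1|b|3'
count_sed_class   "$input"   # prints 4: '1', '3' and both '|' survive
count_sed_negated "$input"   # prints 2
count_tr          "$input"   # prints 2
```

The sed variants need the `- 1` because `wc -c` also counts the trailing newline; in the original `echo ... | tr -dc '|'` form the newline is deleted along with every other non-`|` character, so no correction is needed.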

Unbeliever

03-17-2006

Quote:

So I suppose it depends on how you define simplest :-)

Of course, yes, it depends on how we define simplest.

I just ran the following two commands in a loop 10,000 times; could you please verify this?

Code:

#!/usr/bin/ksh
# Benchmark loop: run the script once per command,
# with only one of the two commands below uncommented.
i=1
while [ $i -le 10000 ]
do
#echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
#echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
i=$(($i + 1))
done
exit 0

Following are the times taken:
**********************************************
echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null
real 6m23.95s
user 1m59.82s
sys 5m2.26s
**********************************************
echo $(($(echo 'a|b|c' |awk -F"|" '{print NF}') -1)) > /dev/null
real 6m6.93s
user 1m23.27s
sys 3m18.99s
**********************************************

Only these two commands were considered, as an example.
When a given sample is run for a longer time, the results may differ.
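Commenting and uncommenting lines between runs is easy to get wrong. A minimal sketch of an alternative, assuming a POSIX sh (the function name `bench` and its two arguments are hypothetical), selects the variant with an argument, so both measurements use identical loop code:

```shell
#!/bin/sh
# Hypothetical benchmark driver: runs one counting variant a given number
# of times, so each variant can be timed without editing the script.
# $1 = variant name (tr or awk), $2 = number of iterations.
bench() {
    variant=$1 count=$2 i=1
    while [ "$i" -le "$count" ]; do
        case $variant in
            tr)  echo 'a|b|c' | tr -dc '|' | wc -c > /dev/null ;;
            awk) echo $(($(echo 'a|b|c' | awk -F'|' '{print NF}') - 1)) > /dev/null ;;
            *)   echo "unknown variant: $variant" >&2; return 1 ;;
        esac
        i=$((i + 1))
    done
}
```

In ksh or bash, `time bench tr 10000` and `time bench awk 10000` then time each variant; there `time` is a reserved word and can time a function call, whereas an external `/usr/bin/time` cannot see shell functions.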

Personally, I don't see much use in only discussing the time each command takes to run.

It could be that sed and awk are complex programs compared to tr, so naturally they take more time to execute.
This is just my suggestion; the actual reason could be different.

matrixmadhan

03-17-2006

Quote:

Originally Posted by matrixmadhan

Personally, I don't see much use in only discussing the time each command takes to run.

It could be that sed and awk are complex programs compared to tr, so naturally they take more time to execute.
This is just my suggestion; the actual reason could be different.

We are not *just* discussing the times taken; I added them to a thread in which a larger discussion was already taking place. I presented the timings in response to your apparent questioning of whether my solution was 'simple'. My personal opinion is that, for trivial operations, if the simplest commands are used to do the job, then they probably form the simplest solution. Hence my own particular solution. It does not invalidate any other solution; it just presents a different one.

Other people often prefer to use a single tool, be it Perl, awk, Ruby, etc., to do just about everything. /shrug Everyone has their own way.

Unbeliever

03-17-2006

Actually, I think the timings are very interesting. Sometimes I want speed and I'm willing to tolerate a bit of complexity to achieve the speed. ksh can do this using just builtin functions. I have been fiddling with several techniques. There isn't a super obvious optimal choice. But I have settled on:

Code:

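echo "a|b|c" | { read x
        x=${x}X         # sentinel: lets the loop examine the last real character
        n=0
        while ((${#x}>1)); do
                # c = leftmost non-blank character of x; x = x minus its first character
                typeset -L1 c=$x ; typeset -R$((${#x}-1)) x
                [[ $c = \| ]] && ((n=n+1))
        done ; echo $n ; }
echo $n   # superfluous echo: shows the parent shell saw the loop's n
exit 0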

This won't work right in any shell except ksh, not even pdksh. The requirement is to read the string from a pipe. A fork() will happen to provide the echo process at the start of the pipeline. But with ksh, the last command in a pipeline is executed in the context of the parent shell if it is a builtin. This allows stuff like "echo foo | read bar" to work in (only) ksh. I was a little nervous about whether all of the braced statements would count as "a builtin". That is the reason for the second "echo $n": the parent shell sees the new value of n, proving that the entire loop was processed inside ksh. Also remember that ksh compiles a loop and executes the compiled code, while lesser shells need to re-interpret the source code on each iteration of a loop.

I don't claim this is simple, only fast. But I didn't do any timings, so I don't have any numbers.
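The speed of the ksh version comes from staying inside the shell, with no fork() per character. A portable sketch of the same goal, assuming a plain POSIX sh (no `typeset -L`/`-R`), swaps in a different technique: pattern removal with `${s#*"$2"}` strips everything up to the next occurrence, so the loop runs once per occurrence rather than once per character. `count_char` is a hypothetical name, and the string is passed as an argument, so no pipeline (and no question of which process runs the loop) is involved.

```shell
#!/bin/sh
# Count occurrences of a character using only POSIX sh builtins (no fork).
# count_char is a hypothetical helper: $1 = string, $2 = character to count.
# (A multi-character $2 counts non-overlapping occurrences of that substring.)
count_char() {
    [ -n "$2" ] || return 1        # an empty pattern would loop forever
    s=$1 n=0
    while :; do
        case $s in
            *"$2"*)                # quoted, so "$2" is matched literally
                s=${s#*"$2"}       # drop everything up to the first match
                n=$((n + 1)) ;;
            *)  break ;;
        esac
    done
    echo "$n"
}

count_char 'a|b|c' '|'     # prints 2
count_char 'a*b*c' '*'     # prints 2
```

Calling `count_char` directly runs entirely in the current shell; capturing its output with `$(...)` normally costs a subshell (ksh93 uses virtual subshells and can often avoid the fork). In ksh93 and bash, `y=${x//[!|]/}; echo ${#y}` gives the same count without an explicit loop.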

Perderabo
