How to find the number of occurrences of a particular word in a text file?

05-12-2014

Moderator

3,843, 841

Join Date: Jun 2007

Last Activity: 29 June 2020, 12:30 PM EDT

Location: Lancashire, UK

Posts: 3,843

Thanks Given: 2,004

Thanked 841 Times in 727 Posts

Give it a try with:-

Code:

tr " " "\n" < your_log_file_name | grep -i "^another$"

Does this get you what you need?

I suppose you might have to consider another., another,, another!, etc. too.

If this is a worry, try:-

Code:

tr "[:punct:]" " " < your_log_file_name | grep -i "^another$"

The plan with this one is to translate all punctuation to spaces, then translate all spaces to a new-line, then use grep to count the records (one word each by now) that contain the string from the first to the last character only.

It might be worth testing it out on a small section first, and thinking of as many variations as you can.

Robin

rbatte1


05-12-2014

Registered User

3,231, 978

Join Date: Dec 2009

Last Activity: 11 June 2014, 8:40 PM EDT

Posts: 3,231

Thanks Given: 179

Thanked 978 Times in 791 Posts

Quote:

Originally Posted by protocomm

Code:

awk '{n=split ($0,tab)}{for(i=0;i<=n;i++){if(tab[i]=="am") count++}};END{print count}' file

Your use of split() is redundant. AWK already split the line. The value that split() returns is the current value of NF. You can iterate through the fields using i<=NF and checking each $i.

Also, if there aren't any matches, count will be undefined and the END print statement will output an empty line. I would change its argument to count+0.
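The two points above (iterate over the fields AWK has already split, and print count+0) can be sketched as follows. The function name count_field_matches and the sample text are only illustrative, not from the thread, and a "word" here is simply a whitespace-separated field, so "am." with trailing punctuation is not counted.

```shell
# count_field_matches WORD [FILE...]
# Hypothetical helper: counts whitespace-separated fields equal to WORD.
# Reads standard input when no FILE is given.
count_field_matches() {
    word=$1
    shift
    awk -v w="$word" '
        # AWK has already split the record, so walk $1..$NF directly.
        { for (i = 1; i <= NF; i++) if ($i == w) count++ }
        # count + 0 prints a numeric 0 instead of an empty line when nothing matched.
        END { print count + 0 }
    ' "$@"
}

printf 'I am what I am\nspam is not am.\n' | count_field_matches am   # prints 2
```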

Quote:

Originally Posted by RudiC

With awk, try also

Code:

awk '{n+=gsub(/am/,"&")}END{print n}' file

Beware of matching substrings triggering false positives.
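A small illustration of that warning, using made-up sample text: the gsub() count also picks up the "am" inside "spam", while an exact field comparison does not. The helper names are hypothetical, and the whole-word variant assumes words consist of ASCII letters only.

```shell
# Counts every occurrence of the substring "am", including inside other words.
count_substrings() {
    awk '{ n += gsub(/am/, "&") } END { print n + 0 }'
}

# Counts only whole words: non-letters become spaces, then fields are compared exactly.
count_whole_words() {
    awk '
        { gsub(/[^A-Za-z]+/, " ")                      # modifying $0 re-splits the fields
          for (i = 1; i <= NF; i++) if ($i == "am") n++ }
        END { print n + 0 }'
}

sample='I am, I am not spam.'
printf '%s\n' "$sample" | count_substrings    # prints 3 ("spam" is a false positive)
printf '%s\n' "$sample" | count_whole_words   # prints 2
```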

Quote:

Originally Posted by rbatte1

Give it a try with:-

Code:

tr " " "\n" < your_log_file_name | grep -i "^another$"

...<snip>...

Code:

tr "[:punct:]" " " < your_log_file_name | grep -i "^another$"

The plan with this one is to translate all punctuation to spaces, then translate all spaces to a new-line, then use grep to count the records (one word each by now) that contain the string from the first to the last character only.

That seems to me to be a reasonable approach, but neither of those pipelines actually implements it. As described, the approach would require a pipeline with two tr invocations. However, it can be done with one if you convert punctuation and spaces directly to newlines, which would be equivalent. In that case, you can modify your latter suggestion to:

Code:

tr '[:punct:] ' '[\n*]' < your_log_file_name | grep -i "^another$"

Note the space after the punctuation character class. If one wanted to include any blank characters, the :blank: class could have been used instead.

Often times it's easier and safer to define what to include than what to exclude. Based on your approach, if we define a word as a sequence of [:alpha:] characters, the following portable solution can be used:

Code:

tr -sc '[:alpha:]' '[\n*]' | grep -Fixc word
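As a quick check of that pipeline, here is a sketch wrapped in a function and fed some invented sample text. The name count_word_tr is hypothetical. With -F the word is taken literally, -i ignores case, -x requires the whole line to match, and -c prints the count.

```shell
# count_word_tr WORD
# Hypothetical wrapper around the pipeline above; reads text from standard input.
count_word_tr() {
    # Every run of non-letters becomes a single newline, leaving one word per line.
    tr -sc '[:alpha:]' '[\n*]' | grep -Fixc -- "$1"
}

printf 'Another day. Yet another, and ANOTHER!\nanotherone\n' | count_word_tr another   # prints 3
```

For a file, redirect it into the function, for example count_word_tr another < your_log_file_name.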

Regards,
Alister

alister


05-12-2014


Thanks for pointing out my logical error.

Perhaps I should have gone with:-

Code:

tr "[:punct:]" " " < your_log_file_name | tr "[:blank:]" "\n" | grep -i "^another$"

I am a little confused by your suggestion to use the :alpha: class. Would this not act on the characters we want to preserve?

I got some odd output from a quick test:-

Code:

# echo "Hello world!" | tr "[:alpha:]" "\n*"
***** *****!
# echo "Hello world!" | tr "[:alpha:]" "\n"





 




!

Am I missing something?

I could bunch it up into a single tr too. A quick test shows this:-

Code:

# echo "This is what I am, I am not spam I hope." | tr "[:punct:][:blank:]" "\n"|grep -c "^am$"
2

That would consolidate my suggestion to:-

Code:

tr "[:punct:][:blank:]" "\n" < your_log_file_name | grep -i "^another$"
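A sketch of the same single-tr idea that prints a count rather than the matching lines. It adds -c to grep and brackets the newline as '[\n*]' for portability. The function name is hypothetical, and the word is assumed to contain no regular-expression metacharacters.

```shell
# count_split_words WORD
# Hypothetical helper: splits standard input on punctuation and blanks, then counts
# lines that consist of WORD only (case-insensitive).
count_split_words() {
    tr '[:punct:][:blank:]' '[\n*]' | grep -ic "^$1\$"
}

printf '%s\n' 'This is what I am, I am not spam I hope.' | count_split_words am   # prints 2
```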

Robin

Last edited by rbatte1; 05-12-2014 at 12:22 PM..

rbatte1


05-12-2014


Quote:

Originally Posted by rbatte1

I am a little confused by your suggestion to use the :alpha: class. Would this not act on the characters we want to preserve?

I got some odd output from a quick test:-

Code:

# echo "Hello world!" | tr "[:alpha:]" "\n*"
***** *****!
# echo "Hello world!" | tr "[:alpha:]" "\n"





 




!

Am I missing something?

Two things. First, the -c option, which complements [:alpha:], so what is matched is everything that is not a member of [:alpha:].

Second, \n* must be bracketed. With the brackets, it represents as many newlines as it takes to match the length of the class in the previous argument. Without the brackets, it means a single newline followed by as many asterisks as it takes. So, your erroneous version would replace the first character in [:alpha:] with a newline and every subsequent character with an asterisk.

In my current locale, A is the first member of [:alpha:]. Note how in the first example 'A' is converted to a newline while 'a' becomes an asterisk:

Code:

$ printf aAa | tr '[:alpha:]' '\n*' | od -c
0000000   *  \n   *
0000003
$ printf aAa | tr '[:alpha:]' '[\n*]' | od -c
0000000  \n  \n  \n
0000003

With modern tr implementations, you can probably get away with simply using \n:

Code:

$ printf aAa | tr '[:alpha:]' '\n' | od -c
0000000  \n  \n  \n
0000003

... but there is the possibility of a portability issue. From POSIX tr:

Quote:

When string2 is shorter than string1, a difference results between historical System V and BSD systems. A BSD system pads string2 with the last character found in string2. Thus, it is possible to do the following:

tr 0123456789 d

which would translate all digits to the letter 'd'. Since this area is specifically unspecified in this volume of POSIX.1-2008, both the BSD and System V behaviors are allowed, but a conforming application cannot rely on the BSD behavior. It would have to code the example in the following way:

tr 0123456789 '[d*]'
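The portable form from that excerpt can be tried directly. The sample text and the function name mask_digits are only illustrative.

```shell
# mask_digits: replace every digit on standard input with the letter d.
# '[d*]' repeats d as many times as needed to match the length of string1.
mask_digits() {
    tr 0123456789 '[d*]'
}

printf 'room 101, floor 7\n' | mask_digits   # prints: room ddd, floor d
```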

Regards,
Alister

Last edited by alister; 05-12-2014 at 02:27 PM..

alister


05-13-2014

Registered User

16, 0

Join Date: May 2014

Last Activity: 26 May 2014, 1:25 AM EDT

Posts: 16

Thanks Given: 3

Thanked 0 Times in 0 Posts

Hi alister,
Can you please tell me the exact command that I can use to find the particular word 'another' in a log file or text file of 17000 lines?

sheela


05-13-2014

Registered User

1,690, 205

Join Date: Jun 2007

Last Activity: 13 July 2020, 5:35 PM EDT

Location: Mumbai, India

Posts: 1,690

Thanks Given: 139

Thanked 205 Times in 199 Posts

Quote:

Originally Posted by sheela

Hi alister,
Can you please tell me the exact command that I can use to find the particular word 'another' in a log file or text file of 17000 lines?

Again, is it really that hard to try at least one of the solutions yourself?
If you cannot access your system right now, please come back to us once you have tried.

clx


05-14-2014


Thanks for your advice.

sheela
