How to find the number of occurrences of a particular word in a text file?


 
# 15  
Old 05-12-2014
Give it a try with:-
Code:
tr " " "\n" < your_log_file_name | grep -i "^another$"

Does this get you what you need?

I suppose you might have to consider another., another,, another!, etc. too.

If this is a worry, try:-
Code:
tr "[:punct:]" " " < your_log_file_name | grep -i "^another$"

The plan with this one is to translate all punctuation to spaces, then translate all spaces to a new-line, then use grep to count the records (one word each by now) that contain the string from the first to the last character only.


It might be worth testing it out on a small section first, with as many variations as you can think of.



Robin
# 16  
Old 05-12-2014
Quote:
Originally Posted by protocomm
Code:
awk '{n=split ($0,tab)}{for(i=0;i<=n;i++){if(tab[i]=="am") count++}};END{print count}' file

Your use of split() is redundant. AWK already split the line. The value that split() returns is the current value of NF. You can iterate through the fields using i<=NF and checking each $i.

Also, if there aren't any matches, count will be undefined and the END print statement will output an empty line. I would change its argument to count+0.
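
To illustrate both points, here is a minimal sketch rather than a definitive version. The function name count_field and the sample text are made up for the example; it compares each field against the word and prints count+0 so that a file with no matches still reports 0.

```shell
# count_field (hypothetical name): count whitespace-separated fields
# that are exactly equal to the word given as the first argument.
# Input is read from standard input.
count_field() {
    awk -v w="$1" '
        # awk has already split the record, so walk $1..$NF directly
        { for (i = 1; i <= NF; i++) if ($i == w) count++ }
        # +0 forces numeric output, so no matches prints 0, not an empty line
        END { print count + 0 }
    '
}

printf 'I am here\nam I not\n' | count_field am    # prints 2
printf 'nothing to see\n' | count_field am         # prints 0
```

Note that this only matches whole fields, so a word followed by punctuation (am, or am.) would not be counted.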

Quote:
Originally Posted by RudiC
With awk, try also
Code:
awk '{n+=gsub(/am/,"&")}END{print n}' file

Beware of matching substrings triggering false positives.
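
A small sketch of the problem, using a made-up sample sentence and hypothetical function names: the gsub() version counts the "am" inside "spam" as well, while a whole-field comparison does not.

```shell
# count_substr (hypothetical name): counts every occurrence of the
# string "am", including occurrences inside longer words.
count_substr() {
    awk '{ n += gsub(/am/, "&") } END { print n + 0 }'
}

# count_whole (hypothetical name): counts only fields that are exactly "am".
count_whole() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "am") n++ } END { print n + 0 }'
}

printf 'I am not spam\n' | count_substr   # prints 2 ("am" and the "am" in "spam")
printf 'I am not spam\n' | count_whole    # prints 1
```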


Quote:
Originally Posted by rbatte1
Give it a try with:-
Code:
tr " " "\n" < your_log_file_name | grep -i "^another$"

...<snip>...

Code:
tr "[:punct:]" " " < your_log_file_name | grep -i "^another$"

The plan with this one is to translate all punctuation to spaces, then translate all spaces to a new-line, then use grep to count the records (one word each by now) that contain the string from the first to the last character only.
That seems to me to be a reasonable approach, but neither of those pipelines actually implements it. As described, the approach would require a pipeline with two tr's. However, it can be done with one if you convert punctuation directly to newlines, which would be equivalent. In that case, you can modify your latter suggestion to:
Code:
tr '[:punct:] ' '[\n*]' < your_log_file_name | grep -i "^another$"

Note the space after the punctuation character class. To cover every blank character (tabs as well as spaces), the [:blank:] class could be used in place of the literal space.

Oftentimes it's easier and safer to define what to include than what to exclude. Based on your approach, if we define a word as a sequence of [:alpha:] characters, the following portable solution can be used:
Code:
tr -sc '[:alpha:]' '[\n*]' | grep -Fixc word
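
As a quick check of that pipeline on a made-up sentence, wrapped in a hypothetical helper called count_word (the name and the sample text are only for illustration):

```shell
# count_word (hypothetical name): count whole-word, case-insensitive
# occurrences of the first argument in the text read from standard input.
count_word() {
    # -c complements [:alpha:], so every non-letter becomes a newline;
    # -s squeezes runs of newlines, leaving one word per line.
    # grep -F: fixed string, -i: ignore case, -x: whole line, -c: count.
    tr -sc '[:alpha:]' '[\n*]' | grep -Fixc -- "$1"
}

printf 'Another day, another dollar. Not "another"! Brother?\n' | count_word another
# prints 3
```

"Brother" is not counted because -x requires the whole line to match. With a real file the input would be redirected instead, e.g. count_word another < your_log_file_name.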

Regards,
Alister
# 17  
Old 05-12-2014
Thanks for pointing out my logical error.

Perhaps I should have gone with:-
Code:
tr "[:punct:]" " " < your_log_file_name | tr "[:blank:]" "\n" | grep -i "^another$"

I am a little confused by your suggestion to use the :alpha: class. Would this not act on the characters we want to preserve?

I got some odd output from a quick test:-
Code:
# echo "Hello world!" | tr "[:alpha:]" "\n*"
***** *****!
# echo "Hello world!" | tr "[:alpha:]" "\n"





 




!

Am I missing something?

I could bunch it up into a single tr too. A quick test shows this:-
Code:
# echo "This is what I am, I am not spam I hope." | tr "[:punct:][:blank:]" "\n"|grep -c "^am$"
2

That would consolidate my suggestion to:-
Code:
tr "[:punct:][:blank:]" "\n" < your_log_file_name | grep -i "^another$"


Robin

Last edited by rbatte1; 05-12-2014 at 12:22 PM..
# 18  
Old 05-12-2014
Quote:
Originally Posted by rbatte1
I am a little confused by your suggestion to use the :alpha: class. Would this not act on the characters we want to preserve?

I got some odd output from a quick test:-
Code:
# echo "Hello world!" | tr "[:alpha:]" "\n*"
***** *****!
# echo "Hello world!" | tr "[:alpha:]" "\n"





 




!

Am I missing something?
Two things. First, the -c option, which complements [:alpha:], so what is matched is everything that is not a member of [:alpha:].

Second, \n* must be bracketed. With the brackets, it represents as many newlines as it takes to match the length of the class in the previous argument. Without the brackets, it means a single newline followed by as many asterisks as it takes. So, your erroneous version would replace the first character in [:alpha:] with a newline and every subsequent character with an asterisk.

In my current locale, A is the first member of [:alpha:]. Note how in the first example 'A' is converted to a newline while 'a' becomes an asterisk:
Code:
$ printf aAa | tr '[:alpha:]' '\n*' | od -c
0000000   *  \n   *
0000003
$ printf aAa | tr '[:alpha:]' '[\n*]' | od -c
0000000  \n  \n  \n
0000003

With modern tr implementations, you can probably get away with simply using \n:
Code:
$ printf aAa | tr '[:alpha:]' '\n' | od -c
0000000  \n  \n  \n
0000003

... but there is the possibility of a portability issue. From POSIX tr:
Quote:
When string2 is shorter than string1, a difference results between historical System V and BSD systems. A BSD system pads string2 with the last character found in string2. Thus, it is possible to do the following:

tr 0123456789 d

which would translate all digits to the letter 'd'. Since this area is specifically unspecified in this volume of POSIX.1-2008, both the BSD and System V behaviors are allowed, but a conforming application cannot rely on the BSD behavior. It would have to code the example in the following way:

tr 0123456789 '[d*]'
Regards,
Alister

Last edited by alister; 05-12-2014 at 02:27 PM..
# 19  
Old 05-13-2014
Hi alister,
can you please tell me the exact command that I can use to find the particular word 'another' in a log file or text file of 17000 lines?
# 20  
Old 05-13-2014
Quote:
Originally Posted by sheela
Hi alister,
can you please tell me the exact command that I can use to find the particular word 'another' in a log file or text file of 17000 lines?
Again, is it really that hard to try at least one of the solutions yourself?
If you cannot access your system right now, please come back to us once you have tried.
# 21  
Old 05-14-2014
Thanks for your advice.