Counting occurrences of all words in multiple files


 
# 1  
Old 12-23-2011

Hey Unix gurus,

I would like to count the number of occurrences of every word (regardless of case) across multiple files, preferably printing them in descending order of occurrence. This is well beyond my paltry shell scripting ability.

While researching, I found many scripts and commands that count occurrences of a single word (basic grep stuff), but nothing that counts every word.

Any help would be appreciated.
# 2  
Old 12-23-2011
Try:
Code:
awk '{for (i=1;i<=NF;i++) a[tolower($i)]++}END{for (i in a) print i" "a[i]}' * | sort -nrk2
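For anyone who finds the one-liner hard to read, here is the same idea spread over several lines with comments. This is only a sketch: the function name `count_words_awk` and the array name `count` are made up for illustration, and the final `sort` adds a tie-break on the word, which the original command does not have. Words are still whatever awk sees between whitespace, so "dog" and "dog." are counted separately.

```shell
# count_words_awk is a hypothetical helper name, not something from the thread.
count_words_awk() {
    awk '
    {
        # Visit every whitespace-separated field on the current line.
        for (i = 1; i <= NF; i++)
            count[tolower($i)]++    # fold to lower case, then tally
    }
    END {
        # Print "word count" pairs; awk does not define the order here.
        for (word in count)
            print word, count[word]
    }' "$@" | sort -k2,2nr -k1,1    # highest count first, ties by word
}
```

Called as `count_words_awk *.txt`, it prints one `word count` pair per line.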

# 3  
Old 12-23-2011
Similar idea:
Convert the file contents to lower case. Change every space or tab to a newline. Remove any blank lines. Sort the words alphabetically. Count the occurrences of each unique word. Sort by descending count.

Code:
cat *.txt|tr '[A-Z]' '[a-z]' | tr ' \t' '\n\n'|sed -e "/^$/d"| \
          sort|uniq -c|sort -nr

# 4  
Old 12-23-2011
There are many solutions, depending on your definition of a word.

A possible solution:
Code:
tr -cs '[:alnum:]' '[\n*]' < inputfile | sort | uniq -c | sort -nr

Jean-Pierre.
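The command above reads a single file and keeps upper and lower case apart. A possible way to combine it with the case folding and multi-file input from the earlier replies is sketched below. The function name `count_words_tr` is hypothetical, and a "word" is assumed to be any run of letters and digits, so punctuation is dropped.

```shell
# count_words_tr is a hypothetical helper name, not something from the thread.
count_words_tr() {
    cat -- "$@" |
        tr '[:upper:]' '[:lower:]' |     # fold everything to lower case
        tr -cs '[:alnum:]' '[\n*]' |     # each run of non-alphanumerics becomes one newline
        sed '/^$/d' |                    # drop the empty line left by leading punctuation
        sort | uniq -c | sort -nr        # count each word, highest count first
}
```

Called as `count_words_tr *.txt`, it prints the count first and then the word, as `uniq -c` does.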
# 5  
Old 12-23-2011
Thank you to all who replied.

I tried bartus's solution and it worked quite nicely. Awk programming is definitely over my head.

I shall try the others as time permits.

Many thanks!