Counting Word Appearance


 
# 1  
Old 08-10-2015

How do you write a script that counts the number of times each word appears in a file and outputs the counts?

Original:
Code:
ID1     SMARCB1;Adil;Jon
ID2     Jon;Annie;Mei
ID3     Adil;Spaghetti;NBA
ID4     Raptors;wethenorth;SMARCB1
ID5     SMARCB1;wethenorth

Objective:
Code:
SMARCB1: 3
Adil: 2
Jon: 2
wethenorth: 2
Annie: 1
Mei: 1
Spaghetti: 1
NBA: 1
Raptors: 1


Last edited by vgersh99; 08-10-2015 at 12:41 PM.. Reason: code tags, please!
# 2  
Old 08-10-2015
One approach might be to...
1) get rid of the ID markers on each line
perhaps with a cut or awk command
2) get each name on its own line
perhaps by translating each ; to a newline
3) sort and count
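
The three steps above can be sketched as a single pipeline. This is only one possible reading of the suggestion: the function name count_words is made up for illustration, and it assumes the ID and the names are separated by whitespace and that the names themselves contain no blanks.

```shell
# count_words FILE: print "name: count" lines, most frequent first.
# Assumes each line looks like "ID<whitespace>name;name;..." (hypothetical helper).
count_words() {
    awk '{ print $2 }' "$1" |      # 1) drop the ID column, keep the name list
        tr ';' '\n' |              # 2) put each name on its own line
        sort | uniq -c |           # 3) group identical names and count them
        sort -k1,1nr -k2,2 |       # highest count first, ties by name
        awk '{ print $2 ": " $1 }' # turn "count name" into "name: count"
}
```

Calling count_words infile on the sample data should list SMARCB1 first with a count of 3. The order of names with equal counts depends on the locale.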
# 3  
Old 08-10-2015
Quote:
Originally Posted by Lipidil
How do you write a script that counts the number of times each word appears in a file and outputs the counts?
The first thing you need to clarify is: what is a "word"? This is not as obvious as it looks: "SMARCB1" in your example is seemingly one, but is "SMARCB1-Jon" also one word or is it two words, connected by a dash?

Usually it is a matter of some characters continuing a word and others ending one. In your example, all letters (lower and upper case) as well as digits obviously continue a word, while the semicolon (and probably blanks) ends one.

Once you have this clearly defined, write a filter which inserts a line break at every "non-continuing character", sort the result, then simply count.
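
As a rough sketch of such a filter, assuming the definition "letters and digits continue a word, everything else ends it": the function name split_words is invented here. Under this definition "SMARCB1-Jon" counts as two words, and the ID column would be counted as well unless it is removed first.

```shell
# split_words: read text on stdin, write one word per line.
# Assumption: a word is a run of letters and digits; any other character ends it.
split_words() {
    # -c: act on everything that is NOT alphanumeric
    # -s: squeeze runs of the resulting newlines into a single one
    tr -cs '[:alnum:]' '[\n*]'
}

# Example: split, then sort and count.
printf 'SMARCB1-Jon;Adil Jon\n' | split_words | sort | uniq -c
```

Changing the character class passed to tr is all it takes to try a different definition of "word".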

You might wonder why we are a bit vague about what to do instead of providing a ready-to-use command line. First, this is the beginners' forum: we do not want to spoil the joy of learning the trade yourself, so we give pointers to guide you but won't spoon-feed you solutions. Second, what you present looks suspiciously like homework. If it is, there is a special forum for that, and your thread can be transferred there; ask any moderator and we will gladly assist you. Note that special rules apply in that forum, and you will have to hand in the required questionnaire afterwards.

I hope this helps.

bakunin
# 4  
Old 08-10-2015
try:

Code:
awk -F '[;[:space:]]+' '{for (i=2; i<=NF; i++) {arr[$(i)]++}}   # skip the ID in field 1, count every other field
                   END{for (i in arr){print i ":", arr[i]}}'  infile > outfile

to start with.
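
To get from there to the ordering shown in the objective (highest count first), one option is to pipe the awk output through sort. The following is a sketch only: the function name count_sorted is hypothetical, and the field separator assumes names are delimited by semicolons or whitespace.

```shell
# count_sorted FILE: count every field after the ID, then order by count (hypothetical helper).
count_sorted() {
    awk -F '[;[:space:]]+' '
        # Fields 2..NF are the names; skip empty fields left by trailing separators.
        { for (i = 2; i <= NF; i++) if ($i != "") count[$i]++ }
        # "for (w in count)" visits the names in no particular order.
        END { for (w in count) print w ": " count[w] }
    ' "$1" | sort -t ':' -k2,2nr -k1,1   # numeric count descending, then name
}
```

Usage would be count_sorted infile > outfile. The sort step is needed because awk does not guarantee any order when looping over an array.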
# 5  
Old 08-10-2015
The loop condition in the for statement needs <= instead of ,= (that is, i<=NF).
# 6  
Old 08-10-2015
Got Perl?
Alphabetically sorted.
Code:
perl -nlaF'[;\s]+' -e '
    map{$words{$_}++;} @F[1..$#F];
    END{for(sort keys %words){print "$_: $words{$_}"}}
' lipidil.file

 