Top Forums Shell Programming and Scripting Word Occurrences script using awk Post 302923175 by ksmarine1980 on Thursday 30th of October 2014 11:38:58 PM
Old 10-31-2014
Word Occurrences script using awk

I'm putting together a script that will count the occurrences of words in text documents. It works fine so far, but I'd like to make a couple of tweaks/additions:

1) I'm having a hard time displaying the array index number. I tried freq[$i], which just spat 0's back at me
2) Is there any way to eliminate the whitespace (spaces) from the word count?

I'm relatively new to Unix, so any help would be greatly appreciated. Thank you!
Code:
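# Print the column headings once, before any input is read
BEGIN { printf "%-20s %-6s\n", "Word", "Count" }

# Lower-case each line, then count every whitespace-separated field
{
        $0 = tolower($0)
        for (i = 1; i <= NF; i++)
                freq[$i]++
}

# Pipe the totals through sort so the highest counts come first
END {
        sort = "sort -k 2nr"
        for (word in freq)
                printf "%-20s %-6s\n", word, freq[word] | sort
        close(sort)
}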
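
One possible way to approach both points, offered as a sketch rather than a definitive answer. It assumes that "index number" means a running row number in the sorted output, and that the unwanted entries come from punctuation stuck to words (with the default field separator, awk already drops runs of blanks). Since `for (word in freq)` visits keys in no particular order and the sorting happens outside awk, the numbering is done by a second awk after `sort`. The function name `word_freq` and the "No." column are made up for illustration.

```shell
# Hypothetical helper: counts words in the named files (or stdin)
# and prints them numbered, most frequent first.
word_freq() {
    awk '
    {
        $0 = tolower($0)
        # Assumption: anything that is not a-z or 0-9 is a separator.
        # This also splits on apostrophes and drops non-ASCII letters.
        gsub(/[^a-z0-9]+/, " ")
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        for (word in freq)
            print word, freq[word]
    }' "$@" |
    sort -k2,2nr -k1,1 |
    awk '
    BEGIN { printf "%-4s %-20s %-6s\n", "No.", "Word", "Count" }
    # NR is the line number of the sorted list, so it serves as the index
    { printf "%-4d %-20s %-6d\n", NR, $1, $2 }'
}
```

It could be called as `word_freq file.txt`, or with text piped in on standard input. Ties in the count are broken alphabetically by the second sort key, which is a choice made here so the row numbers come out the same on every run.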

 
