Post 302501668 by irrevocabile on Friday 4th of March 2011 07:52:22 AM
word frequency counter - awk solution?

Dear all,

I need your help with this. I have a text file. For every word that occurs more than 40 times in the file, I need to count how often it occurs in each line, and write the result to another file with columns like this:

word1,word2,word3,...,wordn
0,0,1
1,2,0
3,2,0 etc. (each row holds the word counts for one line of the original text file)

The numbers are the frequencies of word1 to wordn in each line of the original file.

This awk script does the first part (it collects the list of words to count):
Code:
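{
    # Count every whitespace-separated word across the whole file
    for (i = 1; i <= NF; i++)
        words[$i]++
}

END {
    # Print each word that was seen more than 40 times
    for (i in words)
        if (words[i] > 40)
            print i
}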

This does the searching and counting:

Code:
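{
    # gsub() returns the number of substitutions it made,
    # so res is the number of matches of i in all
    res = gsub(i, " ", all)

    print res
}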

How do I put them together? Can it be done in awk? Sorry, I am a complete newbie.
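
One way the two pieces might be combined is a two-pass awk script that reads the same file twice: the first pass collects the totals, the second prints one row of counts per line. This is only a sketch under some assumptions. The wrapper name `wordmatrix` and its optional `MIN` argument are made up for illustration, words are split on whitespace exactly as in the first script (so case and punctuation matter), and columns appear in order of first appearance. It counts whole fields per line instead of using gsub(), because gsub() would also count matches inside longer words.

```shell
# wordmatrix FILE [MIN]
# Hypothetical helper: prints a CSV header of the words whose total count
# in FILE is greater than MIN (default 40), then one row of counts per line.
wordmatrix() {
    awk -v min="${2:-40}" '
    # Pass 1 (first reading of the file): total count per word,
    # remembering the order in which words first appear.
    NR == FNR {
        for (i = 1; i <= NF; i++) {
            if (!($i in total))
                order[++nwords] = $i
            total[$i]++
        }
        next
    }

    # Start of pass 2: choose the columns and print the header row.
    FNR == 1 {
        for (j = 1; j <= nwords; j++)
            if (total[order[j]] > min)
                cols[++n] = order[j]
        for (j = 1; j <= n; j++)
            printf "%s%s", (j > 1 ? "," : ""), cols[j]
        print ""
    }

    # Pass 2: count the words of the current line, print the chosen columns.
    {
        split("", cnt)                  # portable way to empty an array
        for (i = 1; i <= NF; i++)
            cnt[$i]++
        for (j = 1; j <= n; j++)
            printf "%s%s", (j > 1 ? "," : ""), cnt[cols[j]] + 0
        print ""
    }
    ' "$1" "$1"
}
```

It could then be called as `wordmatrix input.txt > counts.csv`, where both file names are placeholders. If no word passes the threshold, the header and every row come out empty.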
 
