03-18-2011
Word Frequency Sort
Hello,
Here is a program for creating a word-frequency list:
# wf.gk --- program to generate word frequencies from a file
{
    # Remove punctuation: delete every character that is not a letter,
    # digit, underscore, or blank
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
    # Start frequency analysis: count each whitespace-separated word
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
# Print output (the opening brace must sit on the same line as END)
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}
The program runs fine, but I cannot get the last part to print the frequency first and then sort the output from highest to lowest frequency.
Please help and, if it is not too much trouble, could the code be commented so that I and others like me can learn from it?
Many thanks in advance,
Gimley
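One possible approach, offered as a sketch rather than the definitive answer: awk's "for (word in freq)" loop visits the array in no particular order, so the usual trick is to print the count first and let sort(1) do the ordering. The function name wf_sorted below is hypothetical and used only for illustration; the counting logic is the same as in the program above.

```shell
# Sketch only: wf_sorted is a hypothetical helper name, not part of the original post.
wf_sorted() {
    awk '
    {
        # Strip punctuation, as in the original program
        gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
        # Count every whitespace-separated word on the line
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        # Print the count first, then the word, separated by a tab
        for (word in freq)
            printf "%d\t%s\n", freq[word], word
    }' "$@" |
    # -k1,1nr: numeric, descending on the count; -k2,2: break ties by word
    sort -k1,1nr -k2,2
}
```

It could be called as "wf_sorted myfile.txt", or fed from a pipe. Doing the sort outside awk keeps the script portable across awk implementations; gawk 4.0 and later can instead set PROCINFO["sorted_in"] = "@val_num_desc" before the loop in the END block, but that is gawk-specific.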