word frequency counter - awk solution? Post 302502096 by Chubler_XL on Sunday 6th of March 2011 11:05:33 PM
03-07-2011
Well spotted - my test data didn't have a line with zero count, fixed below.
It also matches words regardless of case and removes common punctuation (e.g. comma, full stop, semi-colon, colon, brackets, etc.):

Code:
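awk '{$0=tolower($0); gsub("[:;.,()!]"," "); t++      # fold case, blank punctuation, count lines
  for(w=1;w<=NF;w++){l[t,$w]++; g[$w]++}}             # l: per-line counts, g: whole-file totals
END {for(w in g) if(g[w]<2) delete g[w]; else printf "%s ", w; print ""   # header: words seen 2+ times
  for(i=1;i<=t;i++){ for(w in g) printf "%d ", l[i,w]; print ""}}' infile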
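
To see what the normalisation step does on its own, here is a minimal sketch that applies the same tolower() and gsub() calls to a single line and prints the resulting words one per line. The function name normalise and the sample sentence are assumptions made up for illustration; they are not from the thread.

```shell
# Hypothetical helper; only the tolower()/gsub() calls are taken from the post.
normalise() {
  awk '{ $0 = tolower($0)            # fold the whole line to lower case
         gsub("[:;.,()!]", " ")      # blank out the listed punctuation; $0 is re-split
         for (w = 1; w <= NF; w++) print $w }'
}

# Made-up sample line with mixed case and punctuation.
printf 'The cat, the (big) Cat!\n' | normalise
```

Because gsub() works on $0 by default, awk re-splits the fields after the substitution, so "Cat!" and "cat," both end up counted as the same word "cat".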


Last edited by Chubler_XL; 03-07-2011 at 12:33 AM.
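
One thing to keep in mind with the one-liner: awk does not specify the order in which "for (w in g)" visits the words, so the column order can differ between awk implementations. The following is a sketch of a variant, not the posted solution, that records each word the first time it is seen so the columns always come out in first-appearance order. The names wordfreq, order and n, and the three-line sample input, are assumptions introduced for this example.

```shell
# Sketch of a variant (not the posted one-liner): order[] remembers the
# first appearance of every word so the columns come out in a fixed order.
wordfreq() {
  awk '{ $0 = tolower($0); gsub("[:;.,()!]", " "); t++
         for (w = 1; w <= NF; w++) {
           if (!($w in g)) order[++n] = $w     # record first-seen position
           l[t,$w]++; g[$w]++                  # per-line and whole-file counts
         } }
       END {
         # header: only words seen at least twice in the whole input
         for (k = 1; k <= n; k++) if (g[order[k]] >= 2) printf "%s ", order[k]
         print ""
         # one row per input line; missing counts print as 0
         for (i = 1; i <= t; i++) {
           for (k = 1; k <= n; k++) if (g[order[k]] >= 2) printf "%d ", l[i,order[k]]
           print ""
         }
       }' "$@"
}

# Made-up sample; the middle line contains none of the repeated words.
printf 'The cat, the dog.\nA bird!\nDog and cat (again)\n' | wordfreq
```

For this sample the header is "the cat dog" and the three rows are "2 1 1", "0 0 0" and "0 1 1" (each line ends with a trailing space), which also covers the zero-count line mentioned above.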
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.