Full Discussion: Text statistics
Shell Programming and Scripting forum, post 302811639 by Chubler_XL on Thursday 23rd of May 2013 11:06:28 PM
05-24-2013
For 1 (the n most frequent words of two or more letters, here n=5) you could do:

Code:
grep -Ewo "[a-zA-Z]{2,}" file.txt | sort | uniq -c | sort -k1,1nr | head -5
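The same pipeline wrapped in a small function, as a sketch: `top_words` is a hypothetical name, the file and n are passed as arguments (n defaults to 5), and the counts are sorted numerically (`-n`) so the order does not depend on how `uniq -c` pads its numbers. Words that differ only in case ("The" vs "the") are still counted separately, just as in the one-liner.

```shell
#!/bin/sh
# Sketch: word-frequency pipeline as a function.
# top_words is a hypothetical helper: top_words FILE [N]   (N defaults to 5)
top_words() {
    file=$1
    n=${2:-5}
    # -E extended regex, -w whole words only, -o one match per output line
    grep -Ewo '[a-zA-Z]{2,}' "$file" |
        sort |          # bring identical words next to each other
        uniq -c |       # collapse each run to "count word"
        sort -k1,1nr |  # order by the count field, numerically, largest first
        head -n "$n"    # keep the top N lines
}
```

Usage: `top_words file.txt 10` lists the ten most frequent words.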

For 2 (the most frequent letters) try:

Code:
grep -Eo "[a-zA-Z]" file.txt | sort | uniq -c | sort -k1,1nr | head -5

For 3 and 4 (the most frequent two- and three-letter sequences):
Code:
grep -Eo "[a-zA-Z]{2}" file.txt | sort | uniq -c | sort -k1,1nr | head -5
grep -Eo "[a-zA-Z]{3}" file.txt | sort | uniq -c | sort -k1,1nr | head -5
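A quick check of why the `grep -o` version undercounts pairs and triples: `grep -o` resumes searching right after the end of each match, so the matches it prints never overlap. This sketch only demonstrates that behaviour on a couple of sample strings.

```shell
#!/bin/sh
# grep -o restarts after each match, so for "one" it prints "on" and then
# finds no two-letter match starting at the leftover "e".
printf 'one\n' | grep -Eo '[a-zA-Z]{2}'      # prints only: on
# For "abcde" the pairs found are "ab" and "cd"; "bc" and "de" are never seen.
printf 'abcde\n' | grep -Eo '[a-zA-Z]{2}'
```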

---------- Post updated at 01:06 PM ---------- Previous update was at 12:08 PM ----------

Improvement for 3 & 4 that counts overlapping sequences ("one" should give both "on" and "ne"):

Code:
# overlapping letter pairs; every non-letter is turned into a word separator
awk '{ gsub(/[^a-zA-Z]/, " "); for (i = 1; i <= NF; i++) for (j = 1; j < length($i); j++) print substr($i, j, 2) }' file.txt | sort | uniq -c | sort -k1,1nr | head -5
# overlapping letter triples
awk '{ gsub(/[^a-zA-Z]/, " "); for (i = 1; i <= NF; i++) for (j = 1; j < length($i) - 1; j++) print substr($i, j, 3) }' file.txt | sort | uniq -c | sort -k1,1nr | head -5
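The two awk commands differ only in the window size, so they can be folded into one function. This is a sketch with a hypothetical `ngrams` helper that takes the length n as its first argument and reads the remaining arguments as input files (or standard input). The bound `length($i) - n + 1` is the number of start positions for an n-letter window; for n=2 and n=3 it gives the same loop ranges as the pair and triple commands above.

```shell
#!/bin/sh
# Sketch: overlapping letter n-grams of any length.
# ngrams is a hypothetical helper: ngrams N [FILE...]
ngrams() {
    n=$1
    shift
    awk -v n="$n" '{
        gsub(/[^a-zA-Z]/, " ")                  # non-letters become word separators
        for (i = 1; i <= NF; i++)               # each remaining run of letters
            for (j = 1; j <= length($i) - n + 1; j++)
                print substr($i, j, n)          # sliding window: one n-gram per start position
    }' "$@"
}
```

Usage: `ngrams 2 file.txt | sort | uniq -c | sort -k1,1nr | head -5` reproduces the pair count; pass 3 for triples.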

 

8 More Discussions You Might Find Interesting

1. Programming

Server Statistics?

I'm trying to write a C program to view server statistics such as general server information, CPU usage, memory usage and running processes. Can anybody give me hints on those system calls? PS: I'm using Tru64 UNIX (6 Replies)
Discussion started by: Agent007

2. Solaris

how to get server statistics

Hello, what commands can give the following type of information about the server: Time: 20080331.12:10:39 Current CPU: 97.0% Current Memory: 3.7% Current Disk Space: 76% The resources on server is currently not available. Current CPU, Memory, or Disk Space is exceeding threshold Waiting for... (2 Replies)
Discussion started by: shalua

3. HP-UX

packet statistics

Hi there, are there any functions that can get packet statistics on UNIX? Thanks. (2 Replies)
Discussion started by: Frank2004

4. AIX

Statistics Aix

Hello, is there a way to get statistics from an AIX server over a month: CPU use, memory, disk use, etc.? Maybe via smitty, or do I need to write a script? The OS is AIX 5.3. Greetings (8 Replies)
Discussion started by: lo-lp-kl

5. Shell Programming and Scripting

statistics using awk

Hi, I have 3 columns in a file, listed below.
X    Y    X/(X+Y)
1    1    0.5
1    1    0.5
4    1    0.8
1    1    0.5
6    1    0.857142857
1    1    0.5
23   1    0.958333333
Now I want to find the confidence interval for each row using the formula (p-2 sqrt p(1-p)/(x+y), p+2... (7 Replies)
Discussion started by: Diya123

6. Solaris

Anyone help to interpretate os statistics

Hi, can anyone help me explain the following statistics from my Unix box?
/usr/sbin/swap -l
swapfile      dev      swaplo  blocks    free
/dev/dsk/c4   118,771  16      33560432  33319776
/dev/dsk/c4   118,763  16      33560432  33327184
/usr/sbin/swap -s
total: 13429368k bytes allocated + 9830880k reserved =... (9 Replies)
Discussion started by: giteshtrivedi

7. UNIX for Dummies Questions & Answers

Any way to get process statistics?

Hi, can someone advise what "generic" command I can use to show statistics of a process or a running script/process? For example, I want to know how many hours/minutes it's taken to run or has been running, how much CPU it used and how much memory it used or uses. I want to be able to... (2 Replies)
Discussion started by: newbie_01

8. Red Hat

CPU Usage statistics Dump in a text file over a period of time

I am facing an issue related to the performance of a customized application running on RHEL 5.9. The application stalls for some unknown reason that I need to track. For that I require some tool or shell scripts that can monitor the CPU usage statistics (what we get in top or in more detail by other... (6 Replies)
Discussion started by: Anjan Ganguly
WILDMAT(3)						     Library Functions Manual							WILDMAT(3)

NAME
wildmat - perform shell-style wildcard matching

SYNOPSIS
int
wildmat(text, pattern)
    char *text;
    char *pattern;

DESCRIPTION
Wildmat is part of libinn(3). Wildmat compares the text against the pattern and returns non-zero if the pattern matches the text. The pattern is interpreted according to rules similar to shell filename wildcards, and not as a full regular expression such as those handled by the grep(1) family of programs or the regex(3) or regexp(3) set of routines.

The pattern is interpreted as follows:

\x        Turns off the special meaning of x and matches it directly; this is used mostly before a question mark or asterisk, and is not special inside square brackets.

?         Matches any single character.

*         Matches any sequence of zero or more characters.

[x...y]   Matches any single character specified by the set x...y. A minus sign may be used to indicate a range of characters. That is, [0-5abc] is a shorthand for [012345abc]. More than one range may appear inside a character set; [0-9a-zA-Z._] matches almost all of the legal characters for a host name. The close bracket, ], may be used if it is the first character in the set. The minus sign, -, may be used if it is either the first or last character in the set.

[^x...y]  This matches any character not in the set x...y, which is interpreted as described above. For example, [^]-] matches any character other than a close bracket or minus sign.

HISTORY
Written by Rich $alz <rsalz@uunet.uu.net> in 1986, and posted to Usenet several times since then, most notably in comp.sources.misc in March, 1991. Lars Mathiesen <thorinn@diku.dk> enhanced the multi-asterisk failure mode in early 1991. Rich and Lars increased the efficiency of star patterns and reposted it to comp.sources.misc in April, 1991. Robert Elz <kre@munnari.oz.au> added minus sign and close bracket handling in June, 1991. This is revision 1.10, dated 1992/04/03.

SEE ALSO
grep(1), regex(3), regexp(3).
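The matching rules above are close to what the shell itself does with `case` patterns, so they can be tried out without building libinn. This sketch uses the shell's matcher, not wildmat(3), and `sh_match` is a hypothetical helper; note that POSIX sh writes the negated set as `[!...]` where wildmat writes `[^...]`.

```shell
#!/bin/sh
# Sketch only: the shell's case-pattern matcher, NOT wildmat(3).
# sh_match is a hypothetical helper: succeeds if TEXT ($1) matches PATTERN ($2).
sh_match() {
    # $2 is left unquoted on purpose so that *, ? and [...] keep their special meaning.
    case $1 in
        $2) return 0 ;;
    esac
    return 1
}

# ? matches exactly one character, * matches zero or more.
sh_match 'abc' 'a?c' && echo 'abc matches a?c'
sh_match 'ac' 'a?c' || echo 'ac does not match a?c'
sh_match 'ac' 'a*c' && echo 'ac matches a*c'

# A set with a range: [0-5abc] is shorthand for [012345abc].
sh_match '3' '[0-5abc]' && echo '3 matches [0-5abc]'

# Negated set, spelled [!...] in POSIX sh. A leading ] and a trailing - are
# literal, mirroring wildmat's [^]-] example.
sh_match ']' '[!]-]' || echo '] does not match [!]-]'
sh_match 'x' '[!]-]' && echo 'x matches [!]-]'
```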
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.