Sponsored Content
Top Forums Shell Programming and Scripting counts the number of distinct words Post 302225937 by Annihilannic on Sunday 17th of August 2008 09:10:53 PM
Old 08-17-2008
Difficult to "guide" you how to do it without just telling you how to do it.... and that way you won't learn anything!

Try using tr to strip out all punctuation (see the -d option), then using tr again to convert all spaces to carriage returns and all upper-case characters to lower-case. Then you can sort the output using the unique option (see the man page) so that you end up with only distinct words, and then count the number of lines produced using wc.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

finding no of counts the words occured

hi, cud u help me to find this. i hav 2 files. file1 has data as "ARUN ARUN is from Australia Arun likes America etc.. ARUN ARUN " file2 has "ARUN Australia America" i... (5 Replies)
Discussion started by: arunsubbhian
5 Replies

2. UNIX for Advanced & Expert Users

Number of days between two distinct dates

Hi I'm looking for a .ksh script/function that will calculate ONLY the number of days between two distinct dates. Further convert the number of days to weeks and display. I need this to be part of another larger script that checks the password expiry on several servers and notifies the... (1 Reply)
Discussion started by: radheymohan
1 Replies

3. Shell Programming and Scripting

how to find number of words

please help me for this "divide the file into multiple files containing no more than 50 lines each and find the number of words of length less than 5 characters" (3 Replies)
Discussion started by: annapurna konga
3 Replies

4. Shell Programming and Scripting

Counts a number of unique word contained in the file and print them in alphabetical order

What should be the Shell script that counts a number of unique word contained in a file and print them in alphabetical order line by line? (7 Replies)
Discussion started by: proactiveaditya
7 Replies

5. UNIX for Dummies Questions & Answers

how to get distinct counts in a column of a file

If i have a file sample.txt with more than 10 columns and 11th column as following data. would it be possible to get the distinct counts of values in single shot,Thank you. Y Y N N N P P o Expected Result: Value count Y 2 N 3 P 2 (2 Replies)
Discussion started by: Ariean
2 Replies

6. Shell Programming and Scripting

Shell script to search a pattern in a directory and output number of find counts

I need a Shell script which take two inputs which are 1) main directory where it has to search and 2) pattern to search within main directory all files (.c and .h files) It has to print number of pattern found in main directory & each sub directory. main dir --> Total pattern found = 5 |... (3 Replies)
Discussion started by: vivignesh
3 Replies

7. UNIX for Dummies Questions & Answers

count number of distinct values in each column with awk

Hi ! input: A|B|C|D A|F|C|E A|B|I|C A|T|I|B As the title of the thread says, I would need to get: 1|3|2|4 I tried different variants of this command, but I don't manage to obtain what I need: gawk 'BEGIN{FS=OFS="|"}{for(i=1; i<=NF; i++) a++} END {for (b in a) print b}' input ... (2 Replies)
Discussion started by: beca123456
2 Replies

8. Shell Programming and Scripting

How can I sort by n number is like words?

I want to sort a file with a list of words, in order of most occuring words to least occurring words as well as alphabetically. ex: file1: cat 3 cat 7 cat 1 dog 3 dog 5 dog 9 dog 1 ape 4 ape 2 I want the outcome to be: file1.sorted: dog 1 (12 Replies)
Discussion started by: castrojc
12 Replies

9. Shell Programming and Scripting

How count the number of two words associated with the two words occurring in the file?

Hi , I need to count the number of errors associated with the two words occurring in the file. It's about counting the occurrences of the word "error" for where is the word "index.js". As such the command should look like. Please kindly help. I was trying: grep "error" log.txt | wc -l (1 Reply)
Discussion started by: jmarx
1 Replies

10. Shell Programming and Scripting

Output counts of all matching strings lessthan a number using awk

The awk below is supposed to count all the matching $5 strings and count how many $7 values is less than 20. I don't think I need the portion in bold as I do not need any decimal point or format, but can not seem to get the correct counts. Thank you :). file chr5 77316500 77316628 ... (6 Replies)
Discussion started by: cmccabe
6 Replies
SORT(1) 						      General Commands Manual							   SORT(1)

NAME
sort - sort or merge files SYNOPSIS
sort [ -_________x ] [ +pos1 [ -pos2 ] ] ... [ -o name ] [ -T directory ] [ name ] ... DESCRIPTION
Sort sorts lines of all the named files together and writes the result on the standard output. The name `-' means the standard input. If no input files are named, the standard input is sorted. The default sort key is an entire line. Default ordering is lexicographic by bytes in machine collating sequence. The ordering is affected globally by the following options, one or more of which may appear. b Ignore leading blanks (spaces and tabs) in field comparisons. d `Dictionary' order: only letters, digits and blanks are significant in comparisons. f Fold upper case letters onto lower case. i Ignore characters outside the ASCII range 040-0176 in nonnumeric comparisons. n An initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal point, is sorted by arithmetic value. Option n implies option b. r Reverse the sense of comparisons. tx `Tab character' separating fields is x. The notation +pos1 -pos2 restricts a sort key to a field beginning at pos1 and ending just before pos2. Pos1 and pos2 each have the form m.n, optionally followed by one or more of the flags bdfinr, where m tells a number of fields to skip from the beginning of the line and n tells a number of characters to skip further. If any flags are present they override all the global ordering options for this key. If the b option is in effect n is counted from the first nonblank in the field; b is attached independently to pos2. A missing .n means .0; a missing -pos2 means the end of the line. Under the -tx option, fields are strings separated by x; otherwise fields are nonempty nonblank strings separated by blanks. When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal are ordered with all bytes significant. These option arguments are also understood: c Check that the input file is sorted according to the ordering rules; give no output unless the file is out of sort. m Merge only, the input files are already sorted. o The next argument is the name of an output file to use instead of the standard output. This file may be the same as one of the inputs. T The next argument is the name of a directory in which temporary files should be made. u Suppress all but one in each set of equal lines. Ignored bytes and bytes outside keys do not participate in this comparison. Examples. Print in alphabetical order all the unique spellings in a list of words. Capitalized words differ from uncapitalized. sort -u +0f +0 list Print the password file (passwd(5)) sorted by user id number (the 3rd colon-separated field). sort -t: +2n /etc/passwd Print the first instance of each month in an already sorted file of (month day) entries. The options -um with just one input file make the choice of a unique representative from a set of equal lines predictable. sort -um +0 -1 dates FILES
/usr/tmp/stm*, /tmp/*: first and second tries for temporary files SEE ALSO
uniq(1), comm(1), rev(1), join(1) DIAGNOSTICS
Comments and exits with nonzero status for various trouble conditions and for disorder discovered under option -c. BUGS
Very long lines are silently truncated. SORT(1)
All times are GMT -4. The time now is 11:25 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy