Shell Programming and Scripting: counts the number of distinct words. Post 302225929 by Net-Man on Sunday 17th of August 2008 08:29:40 PM
counts the number of distinct words

I'm looking to write a sample shell script that counts the number of distinct words in a text file given as an argument.
Remark: whitespace characters are spaces, tabs, form feeds, and newlines.

It should use ONLY these commands: tr, sort, grep, wc.

Thanks.
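
One possible approach, as a minimal sketch rather than a definitive answer: turn every whitespace character into a newline with tr, drop empty lines with grep, keep one copy of each word with sort -u, and count the remaining lines with wc -l. The function name count_distinct_words is made up for illustration. The sketch assumes words are case-sensitive ("The" and "the" count as two words) and that punctuation stays attached to the word it touches.

```shell
#!/bin/sh
# count_distinct_words FILE
# Hypothetical helper: prints the number of distinct whitespace-separated
# words in FILE, using only tr, sort, grep and wc.
count_distinct_words() {
    # 1. tr: map space, tab, form feed and newline to newline,
    #    squeezing runs of newlines into one (-s).
    # 2. grep: drop an empty first line left by leading whitespace.
    # 3. sort -u: keep a single copy of each word.
    # 4. wc -l: count the lines that remain.
    tr -s ' \t\f\n' '\n\n\n\n' < "$1" |
        grep -v '^$' |
        sort -u |
        wc -l
}

# When the script is run with a file argument, print its count.
if [ "$#" -ge 1 ]; then
    count_distinct_words "$1"
fi
```

Saved under a name of your choice, it would be run as `sh script.sh file.txt`. Some wc implementations pad the number with leading spaces, so strip them if the result is used in a comparison.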
 

SORT(1) 						      General Commands Manual							   SORT(1)

NAME
       sort - sort and/or merge files

SYNOPSIS
       sort [ -cmuMbdfinrwtx ] [ +pos1 [ -pos2 ] ... ] ... [ -k pos1 [ ,pos2 ] ] ... [ -o output ] [ -T dir ... ] [ option ... ] [ file ... ]

DESCRIPTION
       Sort sorts lines of all the files together and writes the result on the standard output. If no input files are named, the standard input is sorted.

       The default sort key is an entire line. Default ordering is lexicographic by runes. The ordering is affected globally by the following options, one or more of which may appear.

       -M     Compare as months. The first three non-white space characters of the field are folded to upper case and compared so that JAN precedes FEB, etc. Invalid fields compare low to JAN.

       -b     Ignore leading white space (spaces and tabs) in field comparisons.

       -d     `Phone directory' order: only letters, accented letters, digits and white space are significant in comparisons.

       -f     Fold lower case letters onto upper case. Accented characters are folded to their non-accented upper case form.

       -i     Ignore characters outside the ASCII range 040-0176 in non-numeric comparisons.

       -w     Like -i, but ignore only tabs and spaces.

       -n     An initial numeric string, consisting of optional white space, optional plus or minus sign, and zero or more digits with optional decimal point, is sorted by arithmetic value.

       -g     Numbers, like -n but with optional e-style exponents, are sorted by value.

       -r     Reverse the sense of comparisons.

       -tx    `Tab character' separating fields is x.

       The notation +pos1 -pos2 restricts a sort key to a field beginning at pos1 and ending just before pos2. Pos1 and pos2 each have the form m.n, optionally followed by one or more of the flags Mbdfginr, where m tells a number of fields to skip from the beginning of the line and n tells a number of characters to skip further. If any flags are present they override all the global ordering options for this key. A missing .n means .0; a missing -pos2 means the end of the line. Under the -tx option, fields are strings separated by x; otherwise fields are non-empty strings separated by white space. White space before a field is part of the field, except under option -b. A b flag may be attached independently to pos1 and pos2.

       The notation -k pos1[,pos2] is how POSIX sort defines fields: pos1 and pos2 have the same format but different meanings. The value of m is origin 1 instead of origin 0 and a missing .n in pos2 is the end of the field.

       When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal are ordered with all bytes significant.

       These option arguments are also understood:

       -c     Check that the single input file is sorted according to the ordering rules; give no output unless the file is out of sort.

       -m     Merge; assume the input files are already sorted.

       -u     Suppress all but one in each set of equal lines. Ignored bytes and bytes outside keys do not participate in this comparison.

       -o     The next argument is the name of an output file to use instead of the standard output. This file may be the same as one of the inputs.

       -Tdir  Put temporary files in dir rather than in /var/tmp.

EXAMPLES
       Print in alphabetical order all the unique spellings in a list of words where capitalized words differ from uncapitalized.

       Print the users file sorted by user name (the second colon-separated field).

       Print the first instance of each month in an already sorted file. Options -um with just one input file make the choice of a unique representative from a set of equal lines predictable.

       grep -n '^' input | sort -t: +1f +0n | sed 's/[0-9]*://'
              A stable sort: input lines that compare equal will come out in their original order.

FILES
       /var/tmp/sort.<pid>.<ordinal>

SOURCE
       /src/cmd/sort.c

SEE ALSO
       uniq(1), look(1)

DIAGNOSTICS
       Sort comments and exits with non-null status for various trouble conditions and for disorder discovered under option -c.

BUGS
       An external null character can be confused with an internally generated end-of-field character. The result can make a sub-field not sort less than a longer field.

       Some of the options, e.g. -i and -M, are hopelessly provincial.

SORT(1)
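
The -t, -k and -u options described in the manual page above can be tried on a few lines of input. The following is only a sketch: it assumes a POSIX-style sort such as the one in GNU coreutils, where the older +pos1 -pos2 notation and options like -w may not be accepted. The sample records and the function name sample_users are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: colon-separated "id:name:shell" records.
sample_users() {
    printf '%s\n' '3:carol:rc' '1:alice:rc' '2:bob:sh' '4:alice:sh'
}

# Sort on the second colon-separated field only.
# -t: sets the field separator; -k 2,2 starts and ends the key at field 2
# (fields are numbered from 1 in the -k notation).
sample_users | sort -t: -k 2,2

# Count distinct names: with -u, only the key takes part in the comparison,
# so one line per distinct second field survives.
sample_users | sort -t: -k 2,2 -u | wc -l
```

Restricting the key with -k 2,2 rather than -k 2 matters here: without the end position the key would run to the end of the line, and the two alice records would no longer compare equal under -u.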