08-17-2008
It's difficult to guide you through this without simply telling you how to do it, and that way you won't learn anything!
Try using tr to strip out all punctuation (see the -d option), then use tr again to convert all spaces to newlines and all upper-case characters to lower-case. Then sort the output using the unique option (see the man page) so that you end up with only distinct words, and count the number of lines produced using wc.
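For reference, the steps above can be strung together roughly as follows. This is only one possible arrangement, not necessarily the exact solution the exercise expects. The function name count_unique_words is made up for illustration, and the grep step is an extra that drops the empty line left behind when the input starts with whitespace.

```shell
# Count the distinct words on standard input, ignoring case and punctuation.
# count_unique_words is a hypothetical name, not a standard utility.
count_unique_words() {
    tr -d '[:punct:]' |            # strip punctuation
    tr -s '[:space:]' '\n' |       # put one word per line, squeezing runs of whitespace
    tr '[:upper:]' '[:lower:]' |   # fold upper case to lower case
    grep -v '^$' |                 # drop any empty line (an addition, see above)
    sort -u |                      # keep a single copy of each word
    wc -l                          # count the lines that remain
}
```

It reads standard input, so it would be called as `count_unique_words < file.txt`. Note that stripping punctuation turns a word like "don't" into "dont".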
SORT(1) General Commands Manual SORT(1)
NAME
sort - sort or merge files
SYNOPSIS
sort [ -mubdfinrtx ] [ +pos1 [ -pos2 ] ] ... [ -o name ] [ -T directory ] [ name ] ...
DESCRIPTION
Sort sorts lines of all the named files together and writes the result on the standard output. The name `-' means the standard input. If
no input files are named, the standard input is sorted.
The default sort key is an entire line. Default ordering is lexicographic by bytes in machine collating sequence. The ordering is
affected globally by the following options, one or more of which may appear.
b Ignore leading blanks (spaces and tabs) in field comparisons.
d `Dictionary' order: only letters, digits and blanks are significant in comparisons.
f Fold upper case letters onto lower case.
i Ignore characters outside the ASCII range 040-0176 in nonnumeric comparisons.
n An initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal point, is
sorted by arithmetic value. Option n implies option b.
r Reverse the sense of comparisons.
tx `Tab character' separating fields is x.
The notation +pos1 -pos2 restricts a sort key to a field beginning at pos1 and ending just before pos2. Pos1 and pos2 each have the form
m.n, optionally followed by one or more of the flags bdfinr, where m tells a number of fields to skip from the beginning of the line and n
tells a number of characters to skip further. If any flags are present they override all the global ordering options for this key. If the
b option is in effect n is counted from the first nonblank in the field; b is attached independently to pos2. A missing .n means .0; a
missing -pos2 means the end of the line. Under the -tx option, fields are strings separated by x; otherwise fields are nonempty nonblank
strings separated by blanks.
When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal
are ordered with all bytes significant.
These option arguments are also understood:
c Check that the input file is sorted according to the ordering rules; give no output unless the file is out of sort.
m Merge only, the input files are already sorted.
o The next argument is the name of an output file to use instead of the standard output. This file may be the same as one of the
inputs.
T The next argument is the name of a directory in which temporary files should be made.
u Suppress all but one in each set of equal lines. Ignored bytes and bytes outside keys do not participate in this comparison.
Examples. Print in alphabetical order all the unique spellings in a list of words. Capitalized words differ from uncapitalized.
sort -u +0f +0 list
Print the password file (passwd(5)) sorted by user id number (the 3rd colon-separated field).
sort -t: +2n /etc/passwd
Print the first instance of each month in an already sorted file of (month day) entries. The options -um with just one input file make the
choice of a unique representative from a set of equal lines predictable.
sort -um +0 -1 dates
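The +pos1 -pos2 notation used in these examples is obsolete. POSIX sort expresses keys with -k, which counts fields from 1 instead of counting fields to skip. A rough translation of the three examples follows, assuming a sort that accepts -k (GNU and BSD sort both do). The function names are hypothetical wrappers added for illustration.

```shell
# Unique spellings in alphabetical order; capitalized and uncapitalized differ.
# Old form: sort -u +0f +0 list
unique_spellings() { sort -u -k1f -k1 "$@"; }

# Password file ordered by numeric user id (3rd colon-separated field).
# Old form: sort -t: +2n /etc/passwd
by_user_id() { sort -t: -k3n "$@"; }

# One line per month from an already sorted file of (month day) entries.
# Old form: sort -um +0 -1 dates
first_per_month() { sort -u -m -k1,1 "$@"; }
```

Each wrapper passes its arguments through to sort, so `by_user_id /etc/passwd` corresponds to the second example; with no arguments the standard input is read.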
FILES
/usr/tmp/stm*, /tmp/*: first and second tries for temporary files
SEE ALSO
uniq(1), comm(1), rev(1), join(1)
DIAGNOSTICS
Comments and exits with nonzero status for various trouble conditions and for disorder discovered under option -c.
BUGS
Very long lines are silently truncated.
SORT(1)