Average word length


 
# 1  
12-27-2013

If I use
Code:
wc -m filename 

it reports the number of characters, and
Code:
 
wc -w filename

it reports the number of words.
If I divide the number of characters by the number of words, the result is misleading, because the character count includes spaces, newlines and punctuation as well as the letters of the words themselves.
Any advice?
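To make the problem concrete, here is a minimal sketch with a made-up one-line file (sample.txt is a hypothetical name). It assumes a POSIX shell and uses awk for the division, because shell arithmetic is integer-only:

```shell
#!/bin/sh
# Hypothetical sample file: two 5-letter words plus ",", " ", "!" and a newline
printf 'Hello, world!\n' > sample.txt

chars=$(wc -m < sample.txt)   # 14: every character, including the newline
words=$(wc -w < sample.txt)   # 2

# Shell arithmetic is integer-only, so let awk do the division.
naive=$(awk -v c="$chars" -v w="$words" 'BEGIN { printf "%.2f", c / w }')
echo "naive average: $naive"   # 7.00, although both words have 5 letters
```

The comma, space, exclamation mark and newline inflate the ratio from 5 to 7.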
# 2  
12-27-2013
You could try something like this to count only the alphanumeric characters:
Code:
# delete whitespace (including newlines) and punctuation, then count what is left
tr -d '[:space:][:punct:]' < filename | wc -c



Does that help?



Robin
Liverpool/Blackburn
UK
# 3  
12-27-2013
See whether this helps.

Code:
$ cat test
     ~!@#$%^&*()_+{}[];:'\/.,<>`|ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz1234567890

$ od -c test
0000000      \t   ~   !   @   #   $   %   ^   &   *   (   )   _   +   {
0000020   }   [   ]   ;   :   '   \   /   .   ,   <   >   `   |   A   B
0000040   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R
0000060   S   T   U   V   W   X   Y   Z  \n   a   b   c   d   e   f   g
0000100   h   i   j   k   l   m   n   o   p   q   r   s   t   u   v   w
0000120   x   y   z   1   2   3   4   5   6   7   8   9   0  \n
0000136

$ wc -c test
94 test

# Only alphanumeric
$ tr -cd '[:alnum:]' <test 
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890

$ tr -cd '[:alnum:]' <test | wc -c
62

# Except alphanumeric
$ tr -d '[:alnum:]' <test 
     ~!@#$%^&*()_+{}[];:'\/.,<>`|

$ tr -d '[:alnum:]' <test | wc -c
32
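A different technique from the tr approach above is to let awk split the text into whitespace-separated words and strip the non-alphanumeric characters from each one. This is a sketch under a few assumptions: words.txt is a hypothetical file, your awk supports POSIX bracket expressions such as [:alnum:] (gawk and current mawk do), and a token made only of punctuation, such as "--", is not counted as a word:

```shell
#!/bin/sh
# Hypothetical sample file; "--" is a token with no letters or digits
printf 'Hello, world!\nIt is -- 42 degrees.\n' > words.txt

result=$(awk '
{
    for (i = 1; i <= NF; i++) {        # fields are whitespace-separated tokens
        w = $i
        gsub(/[^[:alnum:]]/, "", w)    # strip everything except letters/digits
        if (length(w) > 0) {           # ignore tokens that were all punctuation
            letters += length(w)
            words++
        }
    }
}
END {
    if (words > 0)
        printf "words: %d, average length: %.2f\n", words, letters / words
    else
        print "no words found"
}' words.txt)
echo "$result"   # words: 6, average length: 3.83
```

Doing it per word in one pass means punctuation is never counted as letters, and you also get to decide what counts as a word.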

Akshay Hegde
# 4  
12-27-2013
Note that wc -c counts bytes. In the ASCII and EBCDIC codesets the number of bytes equals the number of characters, but in UTF-8 and several other codesets a character can take more than one byte, so you need to use wc -m to count characters.
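A quick way to see the difference, assuming a C.UTF-8 locale is installed (check with locale -a; if it is missing, wc -m silently falls back to counting bytes). The file name cafe.txt is hypothetical:

```shell
#!/bin/sh
# "café" written out byte by byte: "é" is the two UTF-8 bytes 0xC3 0xA9,
# given as octal escapes \303 \251 so this script itself stays plain ASCII.
printf 'caf\303\251\n' > cafe.txt

bytes=$(wc -c < cafe.txt)                  # 6 in any locale
chars=$(LC_ALL=C.UTF-8 wc -m < cafe.txt)   # 5, if the C.UTF-8 locale exists
echo "bytes: $bytes, characters: $chars"
```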

A codeset independent way to count only alphabetic and numeric characters in a file is:
Code:
tr -cd '[:alnum:]' < file | wc -m

as long as your LC_CTYPE locale setting indicates a locale based on the codeset used to encode characters in file.
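Putting that together with wc -w gives the average the original question asked for. This is a sketch, not a tested utility: avg_word_len and demo.txt are hypothetical names, and it relies on LC_CTYPE matching the file's encoding as described above. One caveat worth checking: as far as I know, GNU coreutils tr only fully supports single-byte characters, so with UTF-8 text it may drop accented letters, and the count is only as codeset-independent as your tr.

```shell
#!/bin/sh
# Average word length = alphanumeric characters / words.
# avg_word_len is a hypothetical helper name; pass it a file name.
avg_word_len() {
    # Assumes LC_CTYPE matches the encoding of the file.
    letters=$(tr -cd '[:alnum:]' < "$1" | wc -m)
    words=$(wc -w < "$1")
    # awk does the floating-point division and guards against zero words
    awk -v l="$letters" -v w="$words" \
        'BEGIN { if (w > 0) printf "%.2f\n", l / w; else print "no words" }'
}

printf 'Hello, world!\nIt is -- 42 degrees.\n' > demo.txt
avg=$(avg_word_len demo.txt)
echo "average word length: $avg"   # 23 letters / 7 words = 3.29
```

Note that wc -w counts the "--" as a word, so the demo text gives 23/7 = 3.29 rather than 23/6 = 3.83. Whether punctuation-only tokens should count as words is up to you.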
Don Cragun


10 More Discussions You Might Find Interesting

1. What is on Your Mind?

Updated Forum Search Index Min Word Length to 2 Chars and Added Quick Search Bar

Today I changed the forum mysql database to permit 2 letter searches: ft_min_word_len=2 I rebuilt the mysql search indexes as well. Then, I added a "quick search bar" at the top of each page. I have tested this and two letter searches are working; but it's not perfect,... (1 Reply)
Discussion started by: Neo

2. What is on Your Mind?

Updated Forum Search Index Min Word Length (Again)

For some reason, three char word lengths were not showing up in search results, even though the minimum is set to three and has been for a long time. After monkeying around with this, I turned off full page search, dumped all search indexes, and re-enabled full text search and it's working... (1 Reply)
Discussion started by: Neo

3. Shell Programming and Scripting

Deleting lines in a fixed length file where there is a word at specific location

I have a big file with 100K lines. I have to read each line and check whether there is a word "W" at character position 356. If it is there, keep the line; otherwise delete it. There are two lines, one header and one trailer, which should remain the same. Can somebody... (5 Replies)
Discussion started by: mohit kanoongo

4. UNIX for Dummies Questions & Answers

Find EXACT word in files, just the word: no prefix, no suffix, no 'similar', just the word

I have a file that has the words I want to find in other files (but lets say I just want to find my words in a single file). Those words are IDs, so if my word is ZZZ4, outputs like aaZZZ4, ZZZ4bb, aaZZZ4bb, ZZ4, ZZZ, ZyZ4, ZZZ4.8 (or anything like that) WON'T BE USEFUL. I need the whole word... (6 Replies)
Discussion started by: chicchan

5. Shell Programming and Scripting

Flat file-make field length equal to header length

Hello Everyone, I am stuck with one issue while working on an abstract flat file which I have to use as input and load data to a table. Input Data- ------ ------------------------ ---- ----------------- WFI001 Xxxxxx Control Work Item A Number of Records ------ ------------------------... (5 Replies)
Discussion started by: sonali.s.more

6. UNIX for Dummies Questions & Answers

Display all the words whose length is equal to the longest word in the text

Hi Guys, I was doing some trial and error to see if I can find the longest word in a text. I was using pipes because they are easier to use in this case. I was stuck on this for a while so I thought I'd get some help with it. I tried this code to separate all the words in a text in... (4 Replies)
Discussion started by: bawse.c

7. Shell Programming and Scripting

Parse a line which has different word length

Hi All, Please let me know a command to parse the below line and find the words. I have a line like this: 40609 39930 In this line the two words are separated by a space. The length of these two words may differ. I want to put 40609 in var_one and 39930 in var_two. Eg. Input line is ... (1 Reply)
Discussion started by: girish.raos

8. Shell Programming and Scripting

How to grep the max length word from a file?

Hi, I have a requirement to extract the max length word from a file. Please reply. (2 Replies)
Discussion started by: vankireddy

9. UNIX for Dummies Questions & Answers

Sed working on lines of small length and not large length

Hi , I have a peculiar case, where my sed command is working on a file which contains lines of small length. sed "s/XYZ:1/XYZ:3/g" abc.txt > xyz.txt when abc.txt contains lines of small length(currently around 80 chars) , this sed command is working fine. when abc.txt contains lines of... (3 Replies)
Discussion started by: thanuman

10. Shell Programming and Scripting

creating a fixed length output from a variable length input

Is there a command that sets a variable length? I have a input of a variable length field but my output for that field needs to be set to 32 char. Is there such a command? I am on a sun box running ksh Thanks (2 Replies)
Discussion started by: r1500