

Frequency of Words in a File, sed script from 1980


 
# 1  

Code:
tr -cs A-Za-z\' '\n' | tr A-Z a-z | sort | uniq -c | sort -k1,1nr -k2 | sed ${1:-25} < book7.txt

This is not my script; it dates back to 1980, and it once worked fine for giving me the most used words in a text file.
Now the shell is complaining about an error in sed:

Code:
sed: -e expression #1, Character 2: missing command

The instructions for this one-liner say to put it into an executable script, but lazy people ask, because in my former configuration it worked fine for finding the most used words in a large text file. So can anyone give me a hint about the sed error and its missing command? I am running this in the very directory where book7.txt is located.
Thanks in advance.
# 2  
One might guess that a current sed would work if you change:
Code:
sed ${1:-25}

in that pipeline to:
Code:
sed -n "1,${1:-25}p"

which would print the 1st 25 lines if no command line arguments are given to your script or the top X lines if the 1st argument to your script is X.
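To make the difference concrete, here is a minimal sketch of that replacement in isolation. The function name top_lines and the sample input are made up for illustration; only the sed command itself comes from the suggestion above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the first N lines of
# standard input, where N is the first argument and defaults to 25.
top_lines() {
    sed -n "1,${1:-25}p"
}

# Five sample lines; ask for the first three.
first_three=$(printf '%s\n' one two three four five | top_lines 3)
printf '%s\n' "$first_three"
```

With -n, sed prints nothing by default, and the address range 1,N combined with the p command prints only lines 1 through N.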
This User Gave Thanks to Don Cragun For This Post:
# 3  
Did you follow the instructions and run the executable script with an adequate parameter? In Bourne-compatible shells, ${1:-25} expands to the contents of the first positional parameter or, if that is missing, to 25; cf. man bash:
Quote:
${parameter:-word} Use Default Values.
With no parameter given, I get the same error message as you do, because sed can't cope with 25 as its sole "command". With a first positional parameter of, e.g., 1,15!d, the above script will print the 15 topmost words in the text presented.


I'm a bit surprised that script should have ever run with no parameters given.
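The expansion can be checked without running sed at all. This is only a sketch: set -- is used here to simulate the script's positional parameters, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Simulate calling the script with no arguments: $1 is unset.
set --
no_arg="sed ${1:-25}"

# Simulate calling it with the sed command mentioned above as first argument.
set -- '1,15!d'
with_arg="sed ${1:-25}"

# Show the command line the shell would build in each case.
printf '%s\n' "$no_arg" "$with_arg"
```

The first line printed is "sed 25", which is the command sed rejects; the second is "sed 1,15!d".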
# 4  
Where do you think tr is getting its input?
# 5  
Quote:
Originally Posted by cfajohnson
Where do you think tr is getting its input?
Good point. A better chance at a working script might be any one of the following three commands:
Code:
{ tr -cs A-Za-z\' '\n' | tr A-Z a-z | sort | uniq -c | sort -k1,1nr -k2 | head -n ${1:-25}
} < book7.txt

or:
Code:
(tr -cs A-Za-z\' '\n' | tr A-Z a-z | sort | uniq -c | sort -k1,1nr -k2 | head -n ${1:-25}) < book7.txt

or:
Code:
tr -cs A-Za-z\' '\n' < book7.txt | tr A-Z a-z | sort | uniq -c | sort -k1,1nr -k2 | head -n ${1:-25}
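As a quick sanity check, the third form can be tried on a tiny throwaway file instead of book7.txt. The sample text, the temporary file, the fixed count of 3 in place of ${1:-25}, and the trailing awk (added only to strip the padding uniq -c puts in front of the counts) are all assumptions for this sketch.

```shell
#!/bin/sh
# Hypothetical two-line sample standing in for book7.txt.
sample=$(mktemp)
printf '%s\n' "The cat saw the dog." "The dog didn't see the cat!" > "$sample"

# Third form from above: the first tr reads the file via its own redirection.
# The count is fixed at 3 here instead of ${1:-25}.
top_words=$(tr -cs A-Za-z\' '\n' < "$sample" | tr A-Z a-z | sort | uniq -c |
    sort -k1,1nr -k2 | head -n 3 | awk '{print $1, $2}')
rm -f "$sample"

printf '%s\n' "$top_words"
```

On this sample the result should be "4 the", "2 cat", "2 dog": the highest count comes first, and ties are ordered alphabetically by the second sort key.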

This User Gave Thanks to Don Cragun For This Post:
# 6  
@Rudi C It worked under Debian squeeze.
@Don Cragun I will try the given options, many thanks, really.
@cfajohnson I do not know; I thought it would be supplied by the < character.
While moving to another living space, I will try the given hints and reply with which one solved the problem. It will take some days...
@Don Cragun

All I get as an answer, in all three cases, is that there is a wrong modifier:

Code:
$ (-)

So I guess it would be better to make it an executable script to test it. Taking out the " - " character did not work either.
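One possible explanation, not confirmed anywhere in this thread: csh and tcsh report "Bad : modifier in $ (-)." when they meet ${1:-25}, because they do not support that Bourne-style expansion. If the commands were typed into such a shell, putting them into a script with a #!/bin/sh first line would let a Bourne-compatible shell interpret them. A minimal sketch, in which the file name wordfreq.sh and the sample sentence are made up:

```shell
#!/bin/sh
# Hypothetical script name; the thread does not give one.
script=./wordfreq.sh

# Write the pipeline to a file whose first line selects a Bourne-compatible shell.
cat > "$script" <<'EOF'
#!/bin/sh
# Usage: wordfreq.sh [N] < textfile   (N defaults to 25)
tr -cs A-Za-z\' '\n' | tr A-Z a-z | sort | uniq -c | sort -k1,1nr -k2 | head -n "${1:-25}"
EOF
chmod +x "$script"

# Run it on a one-line sample and keep only the most frequent word.
result=$(printf 'to be or not to be\n' | "$script" 1 | awk '{print $1, $2}')
printf '%s\n' "$result"
```

With the real file, the call would be ./wordfreq.sh < book7.txt for the top 25 words, or ./wordfreq.sh 10 < book7.txt for the top 10.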




Well, I am probably unable to just apply a script. The following is from 2009 and should work as well for counting the frequency of words in a given text file.


Code:
  cat test.file | tr -d '[:punct:]' | tr ' ' '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn

I put in another .txt-file and it works fine.

Last edited by 1in10; 08-14-2016 at 06:20 PM.. Reason: solved
