Full Discussion: words sort
Top Forums UNIX for Dummies Questions & Answers words sort Post 302578842 by CarloM on Friday 2nd of December 2011 12:31:43 PM
Old 12-02-2011
It already extracts words - where a word is a sequence of non-whitespace characters separated by whitespace.

You need to define exactly what you mean by 'word'. Just alphabetic characters? (That would mean no words with hyphens or apostrophes, no punctuation, no acronyms with numbers, no actual numbers in the text, etc.)
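
For instance, if hyphens and apostrophes inside a word should count as part of it, one possible sketch is below. The function name `extract_words` is made up for illustration, and it assumes a grep that supports `-o` (GNU and BSD grep do; POSIX does not require it).

```shell
# Hypothetical helper: list unique lowercase words, keeping internal
# hyphens and apostrophes (e.g. "well-known", "don't").
# Assumes grep supports -o (GNU/BSD extension, not POSIX).
extract_words() {
    # One or more letters, optionally followed by groups of
    # (apostrophe or hyphen) + letters; digits and punctuation are dropped.
    grep -oE "[[:alpha:]]+(['-][[:alpha:]]+)*" "$@" |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# Reads the named files, or stdin when no file is given.
printf '%s\n' "Don't panic: the well-known answer is 42." | extract_words
# answer
# don't
# is
# panic
# the
# well-known
```

Whether "42" or a trailing apostrophe should survive is exactly the kind of decision the definition of 'word' has to settle; this pattern drops both.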

If you want to strip out everything but alphabetic and whitespace, try:
Code:
sed 's/[^[:alpha:][:space:]]//g' inputfiles | awk 'BEGIN{OFS="\n"} NF{$0=tolower($0);$1=$1;print}' | sort -u

or
Code:
sed 's/[^[:alpha:][:space:]]//g' inputfiles | tr '[:upper:]' '[:lower:]' | tr -s '[:space:]' '\n' | sort -u
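
As a quick check of what stripping everything but letters and whitespace does to real text, here is a sketch of the sed/tr approach wrapped in a function and run on one sample line. The name `unique_words` is made up for illustration.

```shell
# Hypothetical wrapper around the sed/tr approach: strip everything except
# letters and whitespace, lowercase, put one word per line, then deduplicate.
unique_words() {
    sed 's/[^[:alpha:][:space:]]//g' "$@" |
        tr '[:upper:]' '[:lower:]' |
        tr -s '[:space:]' '\n' |
        sort -u
}

# Reads the named files, or stdin when no file is given.
printf '%s\n' "The cat's well-known hat; the CAT sat." | unique_words
# cat
# cats
# hat
# sat
# the
# wellknown
```

Note that "cat's" becomes "cats" and "well-known" becomes "wellknown", because the punctuation is deleted rather than treated as a separator.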


Last edited by CarloM; 12-02-2011 at 01:38 PM..
FOO(1)                              Commands                              FOO(1)

NAME
       wordplay - anagram finder

SYNOPSIS
       wordplay string [-slxavnmd] [-w word] [-f wordfile]

DESCRIPTION
       wordplay is an anagram finder. What is an anagram? Well, let's turn to Merriam-Webster's Collegiate Dictionary, Tenth Edition:

       anagram: a word or phrase made by transposing the letters of another word or phrase.

       Each letter in the anagram must appear with the same frequency as in the original string. For example, the letters in the word "stop" can be rearranged to spell "tops" or "pots" or "sotp". "sotp" is not a word and is not of interest when generating anagrams. "stop" has four letters, so there are 24 ways to rearrange its letters. However, very few of the rearrangements actually spell words.

       Wordplay, by using a list of words, takes a specified string of letters and uses the list of words to find anagrams of the string.

       By the way, "Wordplay" anagrams to "Rowdy Pal", and the program really can live up to that particular anagram. I have been able to come up with anagrams of most of my coworkers' names that are humorous, descriptive, satirical, or, occasionally, quite vulgar.

OPTIONS
       string
              String to be anagrammed. The program should see this as a single argument. If you feel you must put spaces in the string, under UNIX, you will have to put backslashes in front of the spaces or just put the entire string in double quotes. Just leave the spaces out because the program throws them out anyway.

       -s
              Silent operation. If this option is used, the header and line numbers are not printed. This is useful if you want the output to contain only the anagrams. Use this option with the l (and x) option to generate a wordlist which can be piped or redirected. This option does not suppress error messages that are printed to stderr. Finding zero anagrams is not an error.

       -l
              Print list of candidate words before anagramming. This is the list of words that can be spelled with the letters from the specified string, with no letters being used more often than they appear in the input string.

       -x
              Do not perform anagramming. Use with l if you just want the candidate word list without anagrams.

       -a
              Allow anagrams containing two or more occurrences of a word.

       -v
              Consider strings with no vowels as candidate words and do not give up when there are no vowels remaining after extractions.

       -m
              Limit candidate word length to a maximum number of letters. Follow by an integer. m12 means limit words to 12 letters. m5 means limit them to 5 letters.

       -n
              Limit candidate word length to a minimum number of letters. Follow by an integer. n2 means discard words shorter than 2 letters. n11 means discard words shorter than 11 letters.

       -d
              Limit number of words in anagrams to a maximum number. Follow by an integer. d3 means no anagrams should contain more than 3 words. d12 means limit anagrams to 12 words. This is currently the option that I recommend to limit output, since an optimization has been added to speed execution in some cases when this option is used.

       -w
              Specify a word which should appear in all anagrams. This is useful if you already have a word in mind that you want in the anagrams. This option should be specified at the end of the command, followed by a space and the word to use.

       -f
              Specify which word list to use. See example! This option should be specified at the end of the command, followed by a space and the alternate wordfile name. This is useful if you have other word lists to try or if you are interested in making your own customized word list. New feature: Use a hyphen as the filename if the wordlist should be read from stdin.

EXAMPLES
       wordplay persiangulf
              Anagram the string "persiangulf".

       wordplay anagramming -lx
              Print the list of words from the wordlist that can be spelled by using the letters from the word "anagramming". A letter may not be used more often than the number of times it occurs in the word "anagramming". No anagrams are generated.

       wordplay tomservocrow -n3m8
              Anagram the string "tomservocrow". Do not use words shorter than 3 letters or longer than 8 letters.

       wordplay persiangulf -ld3m10 -f /usr/share/dict/words
              Print the candidate words for the string "persiangulf". Print anagrams containing up to 3 words, without considering any words longer than 10 characters. Use the file "/usr/share/dict/words" rather than "words721.txt".

       wordplay soylentgreen -n3w stolen -f w2
              Print anagrams of "soylentgreen" containing the word "stolen" and use the file "w2" as the wordlist file. Discard candidate words shorter than 3 characters.

       wordplay university -slx
              Print the candidate word list for the string "university". The output will consist of just the words. This output is more useful for redirecting to a file or for piping to another program.

       wordplay trymeout -s
              Anagram the string "trymeout" and print the anagrams with no line numbers. The header will not be printed. This is useful for piping the output to another process (or saving it to a file to be used by another program) without having to parse the output to remove the numbers and header.

       wordplay trymeout -v
              Anagram "trymeout" as usual, but in case vowel-free strings are in the wordlist, consider them as possible candidate words.

       cat wordlist1 wordlist2 wordlist3 | sort -u | wordplay trymeout -f -
              Anagram "trymeout" and read the wordlist from stdin, so that, in this case, the three wordlists "wordlist1", "wordlist2", and "wordlist3" will be concatenated and piped into wordplay as the wordlist. The "sort -u" is there to remove duplicate words from the combined wordlist.

NOTES
       If the option specifiers are combined, as in "an7m7d5f" or "d3n5f", the f should come last, followed by a space and the word list file. The "w" option is used in the same manner.

       Limit the number of words to consider, if desired, using the n and m options, or better yet, use the d option to limit depth, when anagramming certain time-consuming strings. The program is currently optimized to speed execution in some cases when the d option is used.

       It is highly recommended that the "words721.txt" file distributed with the program be used, since many nonsense two and three-letter combinations that are not words have been eliminated. This makes the quality of the output slightly better and speeds execution of the program slightly.

       Any word list may be used, as long as there is one word per line. Feel free to create your own custom word list and use it instead. The word list does not have to be sorted in any particular way.

FILES
       /usr/share/games/wordplay/words721.txt
              Default word list file.

DISTRIBUTION
       This program was written for fun and is free. Distribute it as you please, but please distribute the entire package, with the original words721.txt and the readme file. If you modify the code, please mention my name in it as the original author. Please send me a copy of improvements you make, because I may include them in a future version.

AUTHOR
       Wordplay was written by Evans A Criswell <criswell@cs.uah.edu>
       This man page was written by Joey Hess <joeyh@debian.org>

DECEMBER 1996                                                             FOO(1)
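
The candidate word rule described under the -l option (a word qualifies only if it uses no letter more often than the input string contains it) can be approximated with a short awk filter. This is only a sketch of that rule, not the program's actual code. The function name `candidates` is made up for illustration, and it ignores the vowel and length checks that the other options control.

```shell
# Hypothetical sketch of the candidate-word rule, not the program itself.
# Usage: candidates STRING [wordlist...]   (reads stdin when no file is given)
candidates() {
    letters=$1
    shift
    awk -v s="$letters" '
        BEGIN {
            # Count how often each letter occurs in the input string.
            s = tolower(s)
            for (i = 1; i <= length(s); i++) have[substr(s, i, 1)]++
        }
        {
            w = tolower($0)
            ok = (length(w) > 0)
            split("", used)              # portable way to clear the array
            for (i = 1; ok && i <= length(w); i++) {
                c = substr(w, i, 1)
                # Reject the word once a letter is used too often.
                if (++used[c] > have[c]) ok = 0
            }
            if (ok) print w
        }' "$@"
}

printf '%s\n' stop tops pots stoop top step | candidates stop
# stop
# tops
# pots
# top
```

Here "stoop" is rejected because it needs two o's and "step" because "stop" has no e. The word list order is preserved, since the list does not have to be sorted.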