Randomize letters


 
# 15  
Old 06-19-2012
I concocted this awk, you could perhaps give it a try:
Code:
echo "word" | awk 'NR==1{w=$0; l=split(w,W)} length==l{for(i=1;i<=l;i++)if(gsub(W[i],"&",w)!=gsub(W[i],"&"))next;print}' FS= - wordlist

It could be made a bit more efficient still.
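For readers who find the one-liner dense, here is the same idea spelled out as a shell function. This is only a sketch, not the code above: the name find_anagrams is made up, and it counts letters with substr() instead of relying on an empty FS (which not every awk is guaranteed to split on) or on gsub() with each letter used as a regex.

```shell
# find_anagrams WORD WORDLIST   (hypothetical helper name)
# Prints every line of WORDLIST that uses exactly the same letters as WORD,
# including WORD itself if it is in the list.
find_anagrams() {
    awk -v word="$1" '
        # Count how often each character occurs in s; results go into array cnt.
        function count_letters(s, cnt,    i) {
            for (i = 1; i <= length(s); i++)
                cnt[substr(s, i, 1)]++
        }
        BEGIN {
            len = length(word)
            count_letters(word, want)
        }
        # Only words of the same length can be anagrams.
        length($0) == len {
            split("", have)              # portable way to empty the array
            count_letters($0, have)
            for (c in want)
                if (want[c] != have[c])
                    next                 # a letter count differs: not an anagram
            print
        }
    ' "$2"
}
```

It would be called as find_anagrams word wordlist. Because both words have the same length, matching every letter count of the input word is enough to know the candidate has no extra letters.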
This User Gave Thanks to Scrutinizer For This Post:
# 16  
Old 06-19-2012
Perfect! Thanks.

So far I have this, which produces many anagrams fast.

Code:
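#!/bin/bash
# anagramfinder: keep picking random words and print those that have anagrams
while :
do
	#WORD=$1
	WORD=$(shuf -n 1 /data/korpus2k/ordliste)
	# awk prints every list word with the same letters as WORD; sed blanks out WORD itself
	AG=$(echo "$WORD" | awk 'NR==1{w=$0; l=split(w,W)} length==l{for(i=1;i<=l;i++)if(gsub(W[i],"&",w)!=gsub(W[i],"&"))next;print}' FS= - /data/korpus2k/ordliste | sed "s/$WORD//g"; echo)

	# count the lines in AG ("echo -w" is not a valid option; bash just printed "-w")
	CHECK=$(printf '%s\n' "$AG" | wc -l)
	if [ "$CHECK" -gt 1 ]
	then
		# $AG is deliberately left unquoted so the anagrams end up on one line
		echo "$WORD" && echo $AG; echo
	fi
done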

It produces output like this:
Code:
besat
baste bates beast beats besta stabe tabes

klimadebat
debatklima

yderste
dyreste rystede styrede syredet

Which is very nice.

I'm still in the dark as to how to make it work through the list in order (take the first word, check for anagrams, then take the next, and so on) instead of choosing a random word with shuf -n 1.

Last edited by jeppe83; 06-19-2012 at 09:38 PM..
# 17  
Old 06-19-2012
Hi.

For comparison:
Code:
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 

Description: anagram generator
 Wordplay generates anagrams of words or phrases. For example,
 "Debian GNU/Linux" = "laud benign unix", "nubian lug index",
 "dang nubile unix", or "I debug in lax nun".

wordplay Scrutinizer

Anagrams found:
     1.  ERIC RITZ SUN
     2.  RICE RITZ SUN
...
    48.  CRUZ REST I IN

In the Debian repositories ... cheers, drl
# 18  
Old 06-20-2012
Quote:
Originally Posted by jeppe83
I'm still in the dark as to how it can work on the first word, check for anagrams, then the next etc..
Not anymore.
Code:
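#!/bin/bash
# Read the word list top to bottom and print each word that has anagrams
while IFS= read -r WORD
do
	# awk prints every list word with the same letters as WORD; sed blanks out WORD itself
	AG=$(echo "$WORD" | awk 'NR==1{w=$0; l=split(w,W)} length==l{for(i=1;i<=l;i++)if(gsub(W[i],"&",w)!=gsub(W[i],"&"))next;print}' FS= - /data/korpus2k/ordliste | sed "s/$WORD//g"; echo)
	# count the lines in AG ("echo -w" is not a valid option; bash just printed "-w")
	CHECK=$(printf '%s\n' "$AG" | wc -l)
	if [ "$CHECK" -gt 1 ]
	then
		echo "$WORD" && echo "$AG"; echo
	fi
done < /data/korpus2k/ordliste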

Output:
Code:
aaben
aabne

aaberen
aaberne

aaberne
aaberen

aabne
aaben

aabner
barena

aabrinken
karabinen

I'll not pursue any further refinements other than obtaining a more pure wordlist. Thank you for all the help!
# 19  
Old 06-20-2012
Hi.

As a linguistics guy, you may be interested in the Google results for "dan melamed perl", which include a brief intro, Dan Melamed's NLP Research Software Library (General Processing Section), and a host of (generally) short Perl codes, Index of /~melamed/ftp/tools/genproc

I taught a few classes at West Publishing (now part of Thomson Reuters) and ran into his work at that time.

Best wishes ... cheers, drl
# 20  
Old 06-20-2012
Quote:
I'll not pursue any further refinements other than obtaining a more pure wordlist.
I understand what you mean. I cracked the Wordstar Thesaurus file format and extracted 60,000 unique words to my own crossword cracker database. Then I spent just a few minutes a day updating the list against my preferred reference dictionary and adding the corresponding definition and Thesaurus cross-reference. After only 30+ years and several changes of PC and software, the current list of about 320,000 words (and over one million Thesaurus entries) is nearly perfect. I have deliberately avoided writing automatic conjugation processes because they are nigh on impossible to get right (which is why so many spell-checkers allow dubious agent nouns and dubious Latin conjugations).
I have a separate database of proper nouns organised by category (e.g. Capital Cities; Characters in Shakespeare plays) which has become rather big over the years, but the number of updates is now relatively low because I now only update what is relevant to the puzzle in front of me.


All this because I like to complete crosswords. It's a hobby.

Last edited by methyl; 06-20-2012 at 09:00 PM..
# 21  
Old 06-20-2012
My word list is derived from a corpus (a large collection of various real-world language: newspaper articles, novels etc.)

Many such corpora are morphologically marked, so it's easy to extract, for instance, active verbs in 3rd person present tense and make lists corresponding to a morphological category. It's useful for high-level sentence parsers. I've tried experimenting with such, but I haven't yet come across a perfectly marked corpus, and many possible markings are left unmarked (argument structure, semantic roles etc.). The ways words can be marked are many, and even the nature of linguistic categories is heavily debated in academic circles.

Here is an example from a Danish corpus with morphological marking.
Code:
 
eller	 [eller] KC 
som	 [som] INDP nG nN 
specialiserer	 [specialisere] V PR AKT 
sig	 [sig] PERS nG 3S/P ACC 
i	 [i] PRP 
nogle	 [nogen] DET nG P NOM 
af	 [af] PRP 
funktionerne	 [funktion] N UTR P DEF NOM
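To illustrate the kind of extraction mentioned above, here is a rough sketch that pulls out the tokens tagged V, PR and AKT (verb, present, active, going by the sample). It assumes the layout shown: the token, a tab, then the lemma in square brackets followed by space-separated tags. The function name is made up, and a real corpus may well use other tags or a different layout.

```shell
# verbs_present_active CORPUS   (hypothetical helper name)
# Prints "token<TAB>lemma" for every line tagged V, PR and AKT.
verbs_present_active() {
    awk -F '\t' '
        {
            # part[1] is "[lemma]", the remaining parts are the tags
            n = split($2, part, " ")
            v = pr = akt = 0
            for (i = 2; i <= n; i++) {
                if (part[i] == "V")   v = 1
                if (part[i] == "PR")  pr = 1
                if (part[i] == "AKT") akt = 1
            }
            if (v && pr && akt) {
                lemma = part[1]
                gsub(/^\[|\]$/, "", lemma)   # strip the square brackets
                print $1 "\t" lemma
            }
        }
    ' "$1"
}
```

On the sample above this would keep only the specialiserer line. Swapping the tag tests would give lists for other morphological categories.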
