Grepping a list of words from one file in a master database of homophones
Hello,
I am sorry if the title is confusing. I need a script that takes each name in a Source file and looks it up in a Master database, in which all the homophonic variants of a name are listed along with a single indexing key, and then stores all the matching variants in an output file. I need this because I am testing the accuracy of a homophone algorithm which I have written.
An example will make this clear. Let us assume that the source file has the following entries:
and the Master file has the following:
Each indexed entry is separated by a space.
The output file would store, for each word found in the master file, all of its homophones, which are linked to it by the common index.
The source file has around 30,00+ entries. At present I have to open both files in a text editor, select a word in the source file, search for it in the master database, copy the result to the clipboard and store it in the output file. Since this is a long and tedious operation, I was wondering if there is a Perl or AWK script which could do the job.
My OS is Windows, so all the wonderful UNIX tools don't help.
Many thanks for your help.
So sorry, I should have specified the format of the output. The structure should be as under:
The desired output would be as under:
Many thanks for your interest and prompt response.
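The lookup described above can be sketched in Python, which runs on Windows without any UNIX tools. This is only a minimal sketch under stated assumptions: the sample data did not survive in the post, so the code assumes each master line has the form "KEY NAME" separated by a space, and the keys, names and output layout shown here are made up for illustration.

```python
from collections import defaultdict


def load_master(lines):
    """Build two lookups from master lines of the ASSUMED form 'KEY NAME'."""
    key_of = {}                    # name -> index key
    names_of = defaultdict(list)   # index key -> every name sharing that key
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue               # skip blank or malformed lines
        key, name = parts[0], parts[1]
        key_of[name] = key
        names_of[key].append(name)
    return key_of, names_of


def find_homophones(source_names, key_of, names_of):
    """Yield (name, key, variants); key is None when the name is not in the master."""
    for raw in source_names:
        name = raw.strip()
        if not name:
            continue
        key = key_of.get(name)
        variants = list(names_of[key]) if key is not None else []
        yield name, key, variants


# Hypothetical sample data, invented for this sketch only.
master = ["K001 Smith", "K001 Smyth", "K001 Smythe", "K002 Jon", "K002 John"]
source = ["Smith", "John", "Zed"]

key_of, names_of = load_master(master)
for name, key, variants in find_homophones(source, key_of, names_of):
    # Assumed output layout: source name, index key, then all variants.
    print(name, key, " ".join(variants))
```

Because both functions accept any iterable of lines, real files can be passed in directly, for example open("master.txt", encoding="utf-8"), where the file name is a placeholder. The dictionary lookup replaces the manual search-and-copy step, so each source name costs one lookup rather than one scan of the master file.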
Hello,
I have a list of words, separated by spaces, that I am trying to delete from a text file, and I could not figure out the best way to do this.
What I tried (does not work):
delete="password key number verify"
arr=($delete)
for i in arr
{
sed "s/\<${arr}\>]*//g" in.txt
}
>... (5 Replies)
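For comparison, a hedged sketch of whole-word deletion in Python, assuming the goal is to remove every standalone occurrence of each listed word. The function name delete_words is hypothetical, and the sketch leaves the surrounding whitespace untouched.

```python
import re


def delete_words(text, words):
    """Remove every whole-word occurrence of the given words from text."""
    if not words:
        return text
    # \b anchors each alternative at word boundaries, so "key" does not match "keys".
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.sub("", text)


delete = "password key number verify".split()
print(delete_words("enter password and key number to verify keys", delete))
```

Building one alternation from the whole list means the text is scanned once, rather than once per word as in a loop of separate substitutions.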
Hello,
I have a dictionary which I am building for the Open Source Community. The data structure is as under
HEADWORD=PARTOFSPEECH=ENGLISH MEANING
as shown in the example below
अ=m=Prefix signifying negation.
अँहँ=ind=Interjection expressing disapprobation.
अं=int=An interjection... (2 Replies)
Being new to the forum, I tried finding a solution to find files containing 2 words not necessarily on the same line.
This thread
"List all file names that contain two specific words."
answered it in part, but I was looking for a more concise solution.
Here's a one-line suggestion... (8 Replies)
I am reworking a Marathi-English dictionary to be released as open source. My dictionary has the headword in Marathi, followed by its part of speech and then by its English glosses, as in the examples below:
अकरसणें v i To contract, shrink.
अकरा a Eleven.
अकराळ a Frightful, terrible.
विकराळ... (2 Replies)
I have an application designed in PHP and MySQL, running on an Apache web server on an Amazon EC2 CentOS server. I want to implement master-master and master-slave replication, plus high availability and disaster recovery, for this application's database.
For this I have created two... (0 Replies)
Hi!
I was trying to grep all the words in a wordlist, (twl), with no vowels. I had a hard time figuring out how to do it, but I finally lit on the -v flag. Here's my question:
Why does this work:
grep -v '' twl
But this doesn't:
grep '' twl
In the second example, we're asking for lines... (6 Replies)
Hi All,
I need help finding the exact command to grep a large list of files, using either the ls or the find command. However, I do not want to search the subdirectories, as the number of subdirectories is not fixed. How do I achieve that?
I want something like this:
find ./ -name "MYFILE*.txt"... (2 Replies)
Hey all,
I'm currently doing a project and want to index the words in a webpage.
So there would be a file with the webpage content and a file with a list of words, and I want an output file of true and false values showing which words exist in the webpage.
example:
Webpage content data.html
... (2 Replies)
Hello,
I have a complex problem. I have a file in which words have been joined together:
Theboy ranslowly
I want to be able to correctly split the words using a lookup file in which all the words occur:
the
boy
ran
slowly
slow
put
child
ly
The lookup file which is meant for look up... (21 Replies)
Hi, all:
I would like to search all files under "./" and its subfolders recursively to find
the files that contain both word "A" and word "B", and finally list the filenames.
How can I do that?
Cheers
JIA (18 Replies)
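One possible approach, sketched in Python under a few assumptions: the two words may sit on different lines, matching is plain substring matching rather than whole-word matching, and files are small enough to read into memory. The function name files_with_both is hypothetical.

```python
import os


def files_with_both(root, word_a, word_b):
    """Return sorted paths under root whose contents contain both substrings."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors="ignore" lets binary or oddly encoded files be scanned too.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            if word_a in text and word_b in text:
                matches.append(path)
    return sorted(matches)
```

Calling files_with_both(".", "A", "B") walks the current folder and all of its subfolders and returns the matching file names as a list.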