Grepping a list of words from one file in a master database of homophones
Hello,
I am sorry if the title is confusing. I need a script that takes a list of names from a source file and looks each one up in a master database, in which all the homophonic variants of a name are listed along with a single indexing key, and then stores all of the matches in an output file. I need this because I am testing the accuracy of a homophone algorithm which I have written.
An example will make this clear. Let us assume that the source file has the following entries:
and the Master file has the following
Each indexed entry is separated by a Space.
The output file would list, for each word found in the master file, all of its homophones, that is, all entries linked to it by the common index.
The source file has around 30,00+ entries. At present I have to open both files in a text editor, select a word in the source file, search for it in the master database, copy the result to the clipboard and paste it into the output file. Since this is a long and tedious operation, I was wondering if there is a Perl or AWK script which could do the job.
My OS is Windows and all the wonderful UNIX tools don't help.
Many thanks for your help.
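One way to approach the lookup described above is to index the master file once and then look up each source name by its key. The sketch below is in Python rather than Perl or AWK, since it runs unchanged on Windows. The master file layout is an assumption, because the sample data is not shown in the post: each line is taken to hold a name and its index key separated by a space. The function names `load_master` and `find_homophones` are made up for illustration.

```python
from collections import defaultdict


def load_master(lines):
    """Index the master data.

    ASSUMPTION: each line is "<name> <key>", separated by whitespace.
    Lines that do not have exactly two fields are skipped.
    """
    key_of = {}                       # name -> index key
    names_by_key = defaultdict(list)  # index key -> all names sharing it
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        name, key = parts
        key_of[name] = key
        names_by_key[key].append(name)
    return key_of, names_by_key


def find_homophones(source_names, key_of, names_by_key):
    """Return {source name: [all master names with the same key]}.

    Names missing from the master map to an empty list.
    """
    result = {}
    for raw in source_names:
        name = raw.strip()
        if not name:
            continue
        key = key_of.get(name)
        result[name] = list(names_by_key[key]) if key is not None else []
    return result
```

Both functions accept any iterable of lines, so open file objects can be passed directly, and the resulting dictionary can be written to the output file one line per source name. If the real master file uses a different layout, only the parsing in `load_master` should need to change.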
Hi, all:
I would like to search all files under "./" and its subfolders recursively, find
the files that contain both word "A" and word "B", and list their filenames.
How can I do that?
Cheers
JIA (18 Replies)
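For the question above, a minimal Python sketch might look like the following. It assumes plain substring matching (so "A" also matches inside longer words) and that the files are small enough to read into memory; `files_with_both` is a hypothetical helper name, not an existing tool.

```python
from pathlib import Path


def files_with_both(root, word_a, word_b):
    """Return paths under root (recursively) whose text contains both words.

    ASSUMPTION: substring matching, not whole-word matching.
    """
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are dropped so binary files do not abort the scan.
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        if word_a in text and word_b in text:
            matches.append(path)
    return matches
```

Calling `files_with_both(".", "A", "B")` returns a list of `Path` objects that can be printed one per line. The two words do not have to appear on the same line.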
Hello,
I have a complex problem. I have a file in which words have been joined together:
Theboy ranslowly
I want to be able to correctly split the words using a lookup file in which all the words occur:
the
boy
ran
slowly
slow
put
child
ly
The lookup file which is meant for look up... (21 Replies)
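A common way to split joined words against a lookup list is recursive matching with backtracking, trying the longest candidate first. The sketch below illustrates that idea only; it is not taken from the thread. It assumes matching is case-insensitive and that any one valid split is acceptable, and `split_joined` is a hypothetical name.

```python
def split_joined(token, lexicon):
    """Split token into words from lexicon, or return None if impossible.

    ASSUMPTIONS: the token is lowercased before matching, and the longest
    matching word is tried first at each position.
    """
    text = token.lower()
    memo = {}  # start index -> segmentation of text[start:], or None

    def solve(start):
        if start == len(text):
            return []
        if start in memo:
            return memo[start]
        result = None
        # Try the longest piece first, backing off to shorter ones.
        for end in range(len(text), start, -1):
            piece = text[start:end]
            if piece in lexicon:
                rest = solve(end)
                if rest is not None:
                    result = [piece] + rest
                    break
        memo[start] = result
        return result

    return solve(0)
```

With the lookup words listed above held in a set, `split_joined("ranslowly", words)` picks "ran" followed by "slowly", because "slowly" is tried before "slow" plus "ly".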
Hey all,
I'm doing a project currently and want to index words in a webpage.
So there would be a file with the webpage content and a file with a list of words. I want an output file with true and false values showing which words exist in the webpage.
example:
Webpage content data.html
... (2 Replies)
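A rough Python sketch of the true/false lookup described above follows. It assumes case-insensitive whole-word matching and does not strip HTML markup, so tag names such as "html" would also count as words; `word_presence` is a hypothetical name.

```python
import re


def word_presence(page_text, words):
    """Map each word to True/False depending on whether it occurs in the page.

    ASSUMPTIONS: case-insensitive, whole-word matching; markup is not removed.
    """
    found = set(re.findall(r"\w+", page_text.lower()))
    return {word: word.lower() in found for word in words}
```

The returned dictionary can be written out one word per line together with its flag.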
Hi All,
I need help finding the exact command to grep a large list of files, using either the ls or the find command. However, I do not want to search the subdirectories, as the number of subdirectories is not fixed. How do I achieve that?
I want something like this:
find ./ -name "MYFILE*.txt"... (2 Replies)
Hi!
I was trying to grep all the words in a wordlist, (twl), with no vowels. I had a hard time figuring out how to do it, but I finally lit on the -v flag. Here's my question:
Why does this work:
grep -v '' twl
But this doesn't:
grep '' twl
In the second example, we're asking for lines... (6 Replies)
I have an application designed in PHP and MySQL, running on an Apache web server on an Amazon EC2 CentOS server. I want to implement master-master and master-slave replication and high-availability disaster recovery for this application's database.
For this I have created two... (0 Replies)
I am reworking a Marathi-English dictionary to be put out as open source. My dictionary has the headword in Marathi, followed by its part of speech and then by its English glosses, as in the examples below:
अकरसणें v i To contract, shrink.
अकरा a Eleven.
अकराळ a Frightful, terrible.
विकराळ... (2 Replies)
Being new to the forum, I tried finding a solution to find files containing 2 words not necessarily on the same line.
This thread
"List all file names that contain two specific words."
answered it in part, but I was looking for a more concise solution.
Here's a one-line suggestion... (8 Replies)
Hello,
I have a dictionary which I am building for the Open Source Community. The data structure is as follows:
HEADWORD=PARTOFSPEECH=ENGLISH MEANING
as shown in the example below
अ=m=Prefix signifying negation.
अँहँ=ind=Interjection expressing disapprobation.
अं=int=An interjection... (2 Replies)
Hello,
I have a list of words separated by spaces I am trying to delete from a text file, and I could not figure out what is the best way to do this.
what I tried (does not work) :
delete="password key number verify"
arr=($delete)
for i in arr
{
sed "s/\<${arr}\>]*//g" in.txt
}
>... (5 Replies)
Discussion started by: Hawk4520
5 Replies
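For the word-deletion question above, one option is a single regular expression with word boundaries. The sketch below is in Python rather than sed and is illustrative only. It assumes whole-word, case-sensitive matching and leaves the surrounding whitespace untouched; `delete_words` is a hypothetical name.

```python
import re


def delete_words(text, words):
    """Remove every whole-word occurrence of the given words from text.

    ASSUMPTION: case-sensitive matching; spaces around removed words remain.
    """
    # re.escape guards against words containing regex metacharacters.
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.sub("", text)
```

The space-separated list from the post can be passed as `"password key number verify".split()`. Because of the `\b` anchors, a word such as "keyboard" is left alone even though it starts with "key".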
LEARN ABOUT DEBIAN
prophet::test
Prophet::Test(3pm) User Contributed Perl Documentation Prophet::Test(3pm)
set_editor($code)
Sets the subroutine that Prophet should use instead of "Prophet::CLI::Command::edit_text" (as this routine invokes an interactive editor) to $code.
set_editor_script SCRIPT
Sets the editor that Proc::InvokeEditor uses.
This should be a non-interactive script found in t/scripts.
import_extra($class, $args)
in_gladiator($code)
Run the given code using Devel::Gladiator.
repo_path_for($username)
Returns a path on disk for where $username's replica is stored.
repo_uri_for($username)
Returns a file:// URI for $username's replica (with the correct replica type prefix).
replica_uuid
Returns the UUID of the test replica.
database_uuid
Returns the UUID of the test database.
replica_last_rev
Returns the sequence number of the last change in the test replica.
as_user($username, $coderef)
Run this code block as $username. This routine sets up the %ENV hash so that when we go looking for a repository, we get the user's repo.
replica_uuid_for($username)
Returns the UUID of the given user's test replica.
database_uuid_for($username)
Returns the UUID of the given user's test database.
ok_added_revisions( { CODE }, $numbers_of_new_revisions, $msg)
Checks that the given code block adds the given number of changes to the test replica. $msg is optional and will be printed with the test
if given.
serialize_conflict($conflict_obj)
Returns a simple, serialized version of a Prophet::Conflict object suitable for comparison in tests.
The serialized version is a hash reference containing the following keys:
    meta    => { original_source_uuid => 'source_replica_uuid' }
    records => { 'record_uuid' =>
                   { change_type => 'type',
                     props => { propchange_name => { source_old => 'old_val',
                                                     source_new => 'new_val',
                                                     target_old => 'target_val',
                                                   }
                              }
                   },
                 'another_record_uuid' =>
                   { change_type => 'type',
                     props => { propchange_name => { source_old => 'old_val',
                                                     source_new => 'new_val',
                                                     target_old => 'target_val',
                                                   }
                              }
                   },
               }
serialize_changeset($changeset_obj)
Returns a simple, serialized version of a Prophet::ChangeSet object suitable for comparison in tests (a hash).
run_command($command, @args)
Run the given command with (optionally) the given args using a new Prophet::CLI object. Returns the standard output of that command in
scalar form or, in array context, the STDOUT in scalar form *and* the STDERR in scalar form.
Examples:
run_command('create', '--type=Foo');
load_record($type, $uuid)
Loads and returns a record object for the record with the given type and uuid.
as_alice CODE, as_bob CODE, as_charlie CODE, as_david CODE
Runs CODE as alice, bob, charlie or david.
perl v5.10.1 2009-09-02 Prophet::Test(3pm)