Shell Programming and Scripting: Post 302879318 by gimley on Wednesday 11th of December 2013 08:37:25 PM
Grepping a list of words from one file in a master database of homophones

Hello,
I am sorry if the title is confusing. I need a script that takes a list of names from a Source file and greps each one in a Master database, in which all the homophonic variants of a name are listed along with a single indexing key, and stores all the matching variants in an output file. I need this because I am testing the accuracy of a homophone algorithm which I have written.

An example will make this clear. Let us assume that the source file has the following entries:
Code:
John
Mary

and the Master file has the following entries:
Code:
Jon<Tab>2003
Jean<Tab>2003
John<Tab>2003
Johan<Tab>2003
Johann<Tab>2003

Mary<Tab>21978
Marie<Tab>21978
Mariam<Tab>21978
Marium<Tab>21978

Each indexed group is separated from the next by a blank line.

For each word from the source file that is found in the master file, the output file would store all of its homophones, that is, all the master entries linked to it by the common index.
The source file has around 30,00+ entries. At present I have to open both files in a text editor, select a word in the source file, search for it in the master database, copy the matching entries to the clipboard and paste them into the output file. Since this is a long and tedious operation, I was wondering if there is a Perl or AWK script which could do the job.
My OS is Windows, so all the wonderful UNIX tools are of no help to me.
Many thanks for your help.
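
A minimal sketch of one way the lookup described above could be done. It is written in Python rather than the Perl or AWK asked for, only because Python runs natively on Windows; the same two-pass idea carries over to either language. It assumes that `<Tab>` in the example stands for a real tab character, that each master line is `name<Tab>key`, that matching is exact and case-sensitive, and that both files are UTF-8. The function names and the command-line arguments are hypothetical.

Code:
```python
import sys
from collections import defaultdict


def load_master(lines):
    """Build two lookups from the master lines: key -> names, name -> key."""
    names_by_key = defaultdict(list)
    key_by_name = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # blank separator lines and malformed lines are skipped
        name, _, key = line.partition("\t")
        name, key = name.strip(), key.strip()
        names_by_key[key].append(name)
        # Assumption: if a name occurs under several keys, the first one wins.
        key_by_name.setdefault(name, key)
    return names_by_key, key_by_name


def find_homophones(source_lines, master_lines):
    """Return output lines: every master entry sharing a key with a source word."""
    names_by_key, key_by_name = load_master(master_lines)
    out = []
    for word in source_lines:
        word = word.strip()
        key = key_by_name.get(word)
        if key is None:
            continue  # source word not present in the master database
        out.extend(f"{name}\t{key}" for name in names_by_key[key])
        out.append("")  # blank line between groups, as in the master file
    return out


def main(source_path, master_path, output_path):
    with open(source_path, encoding="utf-8") as f:
        source_lines = f.readlines()
    with open(master_path, encoding="utf-8") as f:
        master_lines = f.readlines()
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n".join(find_homophones(source_lines, master_lines)))


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(*sys.argv[1:])
    else:
        print("usage: homophones.py SOURCE MASTER OUTPUT", file=sys.stderr)
```

The master file is read once into two dictionaries, so each source word costs a single lookup instead of a scan of the whole database. Source words that are not in the master file are silently skipped here; writing them to a separate list might be more useful when testing the accuracy of an algorithm.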
 
