Post 302879318 by gimley on Wednesday 11th of December 2013 08:37:25 PM
Grepping a list of words from one file in a master database of homophones

Hello,
I am sorry if the title is confusing. I need a script that takes a list of names from a source file and looks each one up in a master database, in which all the homophonic variants of a name are listed together with a single indexing key, and then stores all the matching variants in an output file. I need this because I am testing the accuracy of a homophone algorithm that I have written.

An example will make this clear. Let us assume that the source file has the following entries:
Code:
John
Mary

and the Master file has the following:
Code:
Jon<Tab>2003
Jean<Tab>2003
John<Tab>2003
Johan<Tab>2003
Johann<Tab>2003

Mary<Tab>21978
Marie<Tab>21978
Mariam<Tab>21978
Marium<Tab>21978

Each indexed group of entries is separated from the next by a blank line, as in the example above.

The output file would store, for each word in the source file, all of its homophones found in the master file, that is, all the entries linked to it by the common index.
The source file has around 30,00+ entries. At present I have to open both files in a text editor, select a word in the source file, search for it in the master database, copy the matches to the clipboard and store them in the output file. Since this is a long and tedious operation, I was wondering if there is a Perl or AWK script which could do the job.
My OS is Windows, so the wonderful UNIX tools are not available to me.
Many thanks for your help.
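
A minimal sketch of one way to approach the lookup described above, written in Python rather than Perl or AWK because it runs unchanged on Windows. It is only an illustration under several assumptions that are not stated in the post: the `<Tab>` in the example stands for a literal tab character, matching is exact and case-sensitive, both files are UTF-8, and the output repeats the master layout (one `name<Tab>key` line per variant, with a blank line between groups). The function names and the `NOT FOUND` marker are hypothetical choices made for this example.

```python
import sys
from collections import defaultdict


def load_master(lines):
    """Build two lookups from tab-separated master lines:
    name -> key, and key -> list of names sharing that key."""
    key_of = {}
    names_of = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip the blank line between groups
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip malformed lines with no key
        name, key = parts[0].strip(), parts[1].strip()
        key_of.setdefault(name, key)  # first key wins if a name repeats
        names_of[key].append(name)
    return key_of, names_of


def find_homophones(source_lines, key_of, names_of):
    """For each source name, return (name, key, variants).
    key is None and variants is empty when the name is not in the master."""
    results = []
    for line in source_lines:
        name = line.strip()
        if not name:
            continue
        key = key_of.get(name)
        if key is None:
            results.append((name, None, []))
        else:
            results.append((name, key, list(names_of[key])))
    return results


def main(source_path, master_path, output_path):
    with open(master_path, encoding="utf-8") as f:
        key_of, names_of = load_master(f)
    with open(source_path, encoding="utf-8") as f:
        results = find_homophones(f, key_of, names_of)
    with open(output_path, "w", encoding="utf-8") as out:
        for name, key, variants in results:
            if key is None:
                out.write(f"{name}\tNOT FOUND\n\n")  # assumed marker
                continue
            for variant in variants:
                out.write(f"{variant}\t{key}\n")
            out.write("\n")  # blank line between groups


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: python homophones.py SOURCE MASTER OUTPUT")
```

Saved under a hypothetical name such as `homophones.py`, it could be run as `python homophones.py source.txt master.txt output.txt`. The master file is read once into dictionaries, so each source name is a single lookup rather than a fresh search through the whole database.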
 
