Creating verbal structures from a dictionary and a template


 
# 1  
Old 06-11-2018

My main aim is to create a database of verb forms that maps one language (in this case English) to Hindi. If it works well, the output will be put up on a university site for researchers to use in machine translation (MT), because verbs are one of the main weaknesses of MT.
Sorry for the long post, but the problem needs clarity, which I have tried to provide.
I have two files: the first is a dictionary mapper and the second is a template. A sample of each is given below.

The dictionary mapper has the following structure; a small sample is given below:
Code:
English word=Hindi word
ache=टीस
acquire=मिल
do=कर
go=चल

The template has the following structure:

Code:
A set of phrases is provided in English with the corresponding Hindi gloss.
Each phrase contains a slot.
The slot for English is indicated by the variable | [pipe].
The slot for Hindi is indicated by the variable # [hash].

As shown in the sample below:
Code:
|=#
|=#ो
Please |=#िए
Please |=#िएगा
I will |=मैं #ूँगा
We will |=हम #ेंगे  
I will |=मैं #ूँगी
We will |=हम #ेंगी 
You will |=तू #ेगा
You will |=तुम #ोगे
You will |=तू #ेगी
You will |=तुम #ोंगी
He will |=वह #ेगा
They will |=वे #ेंगे 
She will |=वह #ेगी
They will |=वे #ेंगी

What I need is a Perl/Awk script that reads each line of the dictionary file and, for every template line, replaces the English variable with the English verb and the Hindi variable with the corresponding Hindi gloss, generating the verbal structures as shown below:
Code:
go=चल
go=चलो
Please go=चलिए
Please go=चलिएगा
I will go=मैं चलूँगा
We will go=हम चलेंगे  
I will go=मैं चलूँगी
We will go=हम चलेंगी 
You will go=तू चलेगा
You will go=तुम चलोगे
You will go=तू चलेगी
You will go=तुम चलोंगी
He will go=वह चलेगा
They will go=वे चलेंगे 
She will go=वह चलेगी
They will go=वे चलेंगी

I know that some post-editing will be needed for the English verbs, especially in the past tense, but I can handle that with macros I have written.
I work in a Windows environment. Many thanks for your kind help. If the script is put up along with the data, it will be duly acknowledged.
# 2  
Old 06-11-2018
How about
Code:
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file1 file2

?
For "go", try
Code:
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file[12] | grep go
go=चल
go=चलो
Please go=चलिए
Please go=चलिएगा
I will go=मैं चलूँगा
We will go=हम चलेंगे  
I will go=मैं चलूँगी
We will go=हम चलेंगी 
You will go=तू चलेगा
You will go=तुम चलोगे
You will go=तू चलेगी
You will go=तुम चलोंगी
He will go=वह चलेगा
They will go=वे चलेंगे 
She will go=वह चलेगी
They will go=वे चलेंगी

This User Gave Thanks to RudiC For This Post:
# 3  
Old 06-11-2018
I am replying from my phone. Thanks very much. I am out at present and should be back in a couple of hours. I will get back to you on this ASAP.

---------- Post updated at 03:28 AM ---------- Previous update was at 03:00 AM ----------

Hello,
I ran it on a computer at a friend's place. I copied the awk script as you provided it and saved it as template.gk:
Code:
FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}

I used the files provided in the sample above and ran the script on the command line:
Code:
gawk32 -f template.gk dictionary.txt template.txt>out

I used a 32-bit version of awk, since both this computer and mine run Windows 10. This is what I got as output. Where did I go wrong in the implementation?
Code:
acquire=मिल|=
go=चल|=
do=कर|=
acquire=मिल|=ो
go=चल|=ो
do=कर|=ो
acquire=मिलPlease |=िए
go=चलPlease |=िए
do=करPlease |=िए
acquire=मिलPlease |=िएगा
go=चलPlease |=िएगा
do=करPlease |=िएगा
acquire=मिलI will |=मैं ूँगा
go=चलI will |=मैं ूँगा
do=करI will |=मैं ूँगा
acquire=मिलWe will |=हम ेंगे  
go=चलWe will |=हम ेंगे  
do=करWe will |=हम ेंगे  
acquire=मिलI will |=मैं ूँगी
go=चलI will |=मैं ूँगी
do=करI will |=मैं ूँगी
acquire=मिलWe will |=हम ेंगी 
go=चलWe will |=हम ेंगी 
do=करWe will |=हम ेंगी 
acquire=मिलYou will |=तू ेगा
go=चलYou will |=तू ेगा
do=करYou will |=तू ेगा
acquire=मिलYou will |=तुम ोगे
go=चलYou will |=तुम ोगे
do=करYou will |=तुम ोगे
acquire=मिलYou will |=तू ेगी
go=चलYou will |=तू ेगी
do=करYou will |=तू ेगी
acquire=मिलYou will |=तुम ोंगी
go=चलYou will |=तुम ोंगी
do=करYou will |=तुम ोंगी
acquire=मिलHe will |=वह ेगा
go=चलHe will |=वह ेगा
do=करHe will |=वह ेगा
acquire=मिलThey will |=वे ेंगे 
go=चलThey will |=वे ेंगे 
do=करThey will |=वे ेंगे 
acquire=मिलShe will |=वह ेगी
go=चलShe will |=वह ेगी
do=करShe will |=वह ेगी
acquire=मिलThey will |=वे ेंगी
go=चलThey will |=वे ेंगी
do=करThey will |=वे ेंगी

Sorry to hassle you, but a hint from you would help. Many thanks for your kind help.
# 4  
Old 06-11-2018
You forgot one essential thing: setting the field separator to = (the -F= option in the command I posted).
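
To illustrate what goes wrong without it: with the default separator awk splits on whitespace, so a dictionary line such as go=चल is a single field and $2 is empty. That is why the whole dictionary line shows up in the output above and # is replaced by nothing. A minimal check, assuming a POSIX shell and any awk (the variable names are only for illustration):

```shell
line='go=चल'

# Default separator (whitespace): the whole line is field 1, field 2 is empty.
no_fs=$(printf '%s\n' "$line" | awk '{ print "[" $1 "] [" $2 "]" }')

# With -F= the line splits into verb and gloss.
with_fs=$(printf '%s\n' "$line" | awk -F= '{ print "[" $1 "] [" $2 "]" }')

printf '%s\n%s\n' "$no_fs" "$with_fs"
# [go=चल] []
# [go] [चल]
```

With the script saved as template.gk, that means running `gawk32 -F= -f template.gk dictionary.txt template.txt > out`, or adding `BEGIN { FS = "=" }` as the first line of template.gk so the separator cannot be forgotten.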
# 5  
Old 06-11-2018
Many thanks for pointing out the blooper. I guess that in my excitement I forgot the field separator.
I tried it out and it works perfectly. Many thanks.