Creating a master file of conjugated verbs by concatenating root and inflection from separate files


 
# 1  
01-16-2018

Apologies for the long, descriptive title.
I am working with Sindhi and developing a database of all verbal conjugations in that language.
I have generated two files:
Verbs.dic contains all the verbs, one verb per line.
Inflections.dic contains the inflectional suffixes, one per line, which need to be appended to each verb.
An example will make this clear. I have chosen English for clarity and kept the set very simple, given the complexity of English verbs.
The input files are as follows:
Verbs.dic

Code:
walk
talk
seat
pick
laugh

Inflections.dic
Code:
s
ing
ed

What I need is a Perl or Awk script that takes the list of inflections from Inflections.dic and appends each inflection to each verb listed in Verbs.dic. The resulting output would be as follows:
Output
Code:
walks
walking
walked
talks
talking
talked
seats
seating
seated
picks
picking
picked
laughs
laughing
laughed

In English the list of inflections is quite limited; in Sindhi there are 35 to 40 inflections, so generating the forms manually is impractical.
Please note: unfortunately, I work in a Windows environment.
All good wishes for the New Year, and many thanks in advance.
# 2  
01-16-2018
With well over 250 posts, we would hope that you could adapt what has been done in lots of other threads in this forum that perform more complex tasks than what is being requested here.

What have you tried to solve this problem on your own?

Is Cygwin installed on your system?

Do you have awk?

Do you have bash or ksh?
# 3  
01-16-2018
Dear Don Cragun,
Thanks for taking the time to reply. As usual, I searched for this specific issue but could not find an answer. I hope I did not miss the solution.
I have awk/sed and Perl on my machine. I am waiting for the Fall Creators Update to be able to use Linux on my computer, which will make life easier for me.
Many thanks

---------- Post updated at 08:47 AM ---------- Previous update was at 08:43 AM ----------

Hello,
I found the answer. Thanks for alerting me:
Code:
awk 'FNR==NR {S[$1];next} {printf "%s", $1; for (s in S) printf ",%s%s", $1, s; printf "\n"}' suffs root

It was very stupid of me. Sorry for the bother. Thanks once again.
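The one-liner above prints each root followed by its inflected forms on one comma-separated line. One caveat: for (s in S) visits array subscripts in an unspecified, implementation-dependent order, so the forms may not come out in the order of the suffix file. Below is a minimal sketch of the same comma-separated layout that keeps the file order by storing each suffix under an integer index. The function name conjugate_csv and the temporary sample files are illustrative assumptions, not part of the original post.

```shell
#!/bin/sh
# conjugate_csv is a hypothetical helper name used only for this sketch.
# Usage: conjugate_csv SUFFIX_FILE ROOT_FILE
# Prints one line per root: the root, then root+suffix for every suffix,
# comma-separated, in the order the suffixes appear in SUFFIX_FILE.
conjugate_csv() {
    awk '
    FNR == NR { suf[++n] = $1; next }      # 1st file: keep suffixes in file order
    {
        line = $1                          # start with the bare root
        for (i = 1; i <= n; i++)
            line = line "," $1 suf[i]      # append root+suffix, in order
        print line
    }' "$1" "$2"
}

# Demo with the sample data from post #1, written to a temporary directory.
tmp=$(mktemp -d)
printf 's\ning\ned\n' > "$tmp/suffs"
printf 'walk\ntalk\nseat\npick\nlaugh\n' > "$tmp/root"
conjugate_csv "$tmp/suffs" "$tmp/root"
rm -rf "$tmp"
```

With the sample files, the first line printed is walk,walks,walking,walked, and every later line follows the same s, ing, ed order.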
# 4  
01-16-2018
You don't need to apologize. Just think before posting. You're capable of doing more than you seem to realize. I'm glad you were able to do what you needed to do.

Always show us what you have tried when posting a question. It helps us understand where you are stuck and helps us provide better guidance.
# 5  
01-17-2018
Thanks for your kind words. I am 70 years old and accustomed to C programming, and I guess in my hurry I forgot to check what was already available.

---------- Post updated 01-17-18 at 02:06 AM ---------- Previous update was 01-16-18 at 10:01 PM ----------

Dear Don,
Sorry to bother you.
I implemented the following awk script to handle the problem of concatenating two files, where the first is the suffix file and the second is the root file:
Code:
awk 'FNR==NR {S[$1];next} {printf "%s", $1; for (s in S) printf "\n%s%s", $1, s; printf "\n"}' suffix root > root.out

The sample root file contains the following:
Code:
walk
talk
seat
pick
laugh

The suffix file contains three suffixes in this order (the order is important):
Code:
s
ing
ed

However, when the file is generated, the order changes and a peculiar sort order is imposed. The output is as follows:
Code:
walk
walked
walks
walking
talk
talked
talks
talking

I have only pasted the output for the first two verbs. As you can see, the order has changed and no longer matches the suffix file. I have gone through the script and cannot detect which part modifies the order. Is it because the files are in UTF-8? I need this encoding to handle complex scripts such as Devanagari or Arabic.
I really need the order of the suffix file to be retained.
If it is not too much trouble, could you please comment on the part of the script that changes the order of the output?
Many thanks for your kind help
# 6  
01-17-2018
In awk, the loop for (var in array) visits the array subscripts in an unspecified, implementation-dependent order, not in the order in which they were stored. If the output order is important, you need to store the values as array elements under integer indices (instead of storing them as the array subscripts) and then loop over those indices. For example:
Code:
awk '
FNR == NR {			# While reading the 1st file...
	suf[++c] = $0		# gather and count suffixes.
	next
}
{				# While reading the 2nd file...
	print			# print verb by itself and...
	for(i = 1; i <= c; i++)
		print $0 suf[i]	# print verb with suffixes in order.
}' Inflections.dic Verbs.dic

With the sample files you provided in post #1, this produces the following output (each verb is printed by itself first, followed by its inflected forms):
Code:
walk
walks
walking
walked
talk
talks
talking
talked
seat
seats
seating
seated
pick
picks
picking
picked
laugh
laughs
laughing
laughed

Does this help?
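Since the files are prepared on Windows (post #1), two practical details may matter. Files saved with Windows CRLF line endings leave a carriage return at the end of each line, so $0 suf[i] would carry a stray carriage return in the middle of every generated form. Also, cmd.exe does not treat single quotes as quoting, so an awk program is usually saved to a file and run with awk -f there. Below is a minimal sketch of the ordered script above with carriage-return stripping and blank-line skipping added, wrapped in a POSIX shell function for testing. The name conjugate and the sample data are assumptions for illustration only.

```shell
#!/bin/sh
# conjugate is a hypothetical helper name used only for this sketch.
# Usage: conjugate SUFFIX_FILE ROOT_FILE
conjugate() {
    awk '
    { sub(/\r$/, "") }                  # strip a trailing CR (Windows CRLF line endings)
    NF == 0 { next }                    # skip blank lines in either file
    FNR == NR { suf[++c] = $0; next }   # 1st file: store suffixes by position
    {
        print                           # the verb by itself
        for (i = 1; i <= c; i++)
            print $0 suf[i]             # verb + each suffix, in file order
    }' "$1" "$2"
}

# Demo: sample files deliberately written with CRLF endings and a blank line.
tmp=$(mktemp -d)
printf 's\r\ning\r\n\r\ned\r\n' > "$tmp/Inflections.dic"
printf 'walk\r\ntalk\r\n' > "$tmp/Verbs.dic"
conjugate "$tmp/Inflections.dic" "$tmp/Verbs.dic"
rm -rf "$tmp"
```

Saved on its own as, say, conj.awk (just the text between the single quotes), the same program could be run from cmd.exe as awk -f conj.awk Inflections.dic Verbs.dic > out.txt. The script only concatenates strings and does not re-encode anything, so UTF-8 text in Devanagari or Arabic script should pass through unchanged. If a file was saved with a UTF-8 byte-order mark, however, that mark would stick to its first line; saving without a BOM avoids this.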
# 7  
01-17-2018
Thanks a lot, especially for the code and the valuable comments. I had always assumed that awk preserved the order of the lines in the file.
You made my day.