Creating lemmatised forms by concatenating two files


 
# 1  
Old 10-23-2015

Dear all,
I am working on a noun, adjective and verb lemmatiser for Sindhi, which will eventually be released as open source for general use. The tool will take a word and provide all possible forms of that word.
To achieve this I have identified the root forms and the eventual suffixes which could be added on to the root. This implies two files:
the first called
Code:
root

contains all the root forms and the second called
Code:
Suffs

contains the suffixes that can be added on to them.
An example from English will make this clear:
The file called
Code:
root

contains the following
Code:
snow
row
call
fill
shout

The file called
Code:
Suffs

contains the following
Code:
s
ed
ing

In the desired output, each string from the root file is concatenated with every suffix to generate the forms. The root and all its forms are written to a single output file, one root per line, separated by commas, as shown below
Code:
snow,snows,snowed,snowing
row,rows,rowed,rowing
call,calls,called,calling
fill,fills,filled,filling
shout,shouts,shouted,shouting

Unlike English, Sindhi morphology is complex, and a single verb can take up to thirty forms.
I would appreciate it if somebody could supply a script in Perl or Awk that combines the two files in this way and writes out the result. I work in a Windows environment.
Many thanks in advance on behalf of the open-source community.
# 2  
Old 10-23-2015
Any attempt from your side?

---------- Post updated at 11:10 ---------- Previous update was at 11:10 ----------

Anyway, try
Code:
awk 'FNR==NR {S[$1];next} {printf "%s", $1; for (s in S) printf ",%s%s", $1, s; printf "\n"}' suffs root
snow,snowed,snows,snowing
row,rowed,rows,rowing
call,called,calls,calling
fill,filled,fills,filling
shout,shouted,shouts,shouting
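Note that the output above lists the suffixes as ed, s, ing rather than in file order: awk does not define the traversal order of `for (s in S)`. If the order of the forms matters, one possible variant (a sketch, not part of the original reply) stores each suffix under a running index and loops over that index. The file names `root` and `Suffs` are taken from the first post; `forms.txt` is an assumed output name.

```shell
# Sample data from the first post.
printf '%s\n' snow row call fill shout > root
printf '%s\n' s ed ing > Suffs

# First file (Suffs): store each suffix under a running index n.
# Second file (root): print the root, then root+suffix in index order.
awk 'FNR==NR {S[++n]=$1; next}
     {printf "%s", $1; for (i=1; i<=n; i++) printf ",%s%s", $1, S[i]; printf "\n"}' Suffs root > forms.txt

cat forms.txt
```

With the sample data this should give `snow,snows,snowed,snowing` on the first line, matching the order requested in the first post.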

# 3  
Old 10-23-2015
Hello,
Many thanks. I did try in Perl but could not open handles to two files. I have done this operation before with a macro, but it was slow and cumbersome when there were a large number of suffixes. With two or three suffixes, the macro works just fine.
I will check out how you managed to open 2 file handles and cat the suffixes to the root. I can use the same technique for other similar operations.

---------- Post updated at 06:50 AM ---------- Previous update was at 06:38 AM ----------

Thanks a lot. I checked it on a small file with around 20 roots and 40 suffixes, and it works just fine.
I also studied the code and saw how easy it was to open 2 files and cat the input of one to the input of the other. Thanks for the script, but more for the lesson.
I would tell you that I am 66 years old, but I believe that one is never too old to learn. Many thanks once again.

Last edited by Don Cragun; 10-23-2015 at 11:56 AM.. Reason: Get rid of duplicated updated.
# 4  
Old 10-23-2015
You are welcome. You can still edit the post to remove the double/triple text.
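On the question of two files raised in post 3: the one-liner never opens a file handle itself. awk reads the files named on the command line one after another, and `FNR==NR` is true only while the first of them is being read. For comparison, here is a sketch that opens the suffix file explicitly with `getline`. The names `lemma.awk`, `sfile` and `forms2.txt` are assumptions made for this illustration. Keeping the program in a file may also help on Windows, where cmd.exe does not treat single quotes as quoting characters.

```shell
# Sample data (shortened from the first post).
printf '%s\n' snow row > root
printf '%s\n' s ed ing > Suffs

# lemma.awk is a hypothetical file name; keeping the program in a file
# avoids shell-specific quoting of the script text.
cat > lemma.awk <<'EOF'
BEGIN {
    # Open the suffix file explicitly and read it line by line.
    # sfile is passed in with -v on the command line.
    while ((getline s < sfile) > 0) suff[++n] = s
    close(sfile)
}
{
    # Main input: one root per line.
    line = $1
    for (i = 1; i <= n; i++) line = line "," $1 suff[i]
    print line
}
EOF

awk -v sfile=Suffs -f lemma.awk root > forms2.txt
cat forms2.txt
```

Unlike the `FNR==NR` test, this form does not depend on the suffix file being non-empty, at the cost of a slightly longer program.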