Creating lemmatised forms by concatenating two files
Dear all,
I am working on a noun, adjective and verb lemmatiser for Sindhi, which will eventually be released as open source for general use. The tool will take a word and provide all possible forms of that word.
To achieve this I have identified the root forms and the suffixes that can be added to each root. This implies two files: the first contains all the root forms, and the second contains the suffixes that can be added to them.
An example from English will make this clear:
The file called
contains the following
The file called
contains the following
In the desired output, each string from the root file is concatenated with each of the suffixes to generate the forms, which are written to a single file, delimited by commas, as shown below
Unlike English, Sindhi morphology is complex, and a single verb can admit up to thirty forms.
I would appreciate it if somebody could supply a script in Perl or Awk that concatenates the two files and writes out the result. I work in a Windows environment.
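A minimal sketch of the concatenation described above, written in Python rather than the Perl or Awk requested, and not the script that was actually supplied in this thread. The file names (roots.txt, suffixes.txt, forms.txt), the function names and the English sample words are all assumptions made for illustration. It also assumes one root or suffix per line and one comma-delimited output line per root.

```python
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    "utf-8-sig" reads plain UTF-8 and also drops the byte-order mark that
    some Windows editors write at the start of a file.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]


def generate_forms(roots, suffixes):
    """Build one comma-delimited line per root: root+suffix for every suffix."""
    return [",".join(root + suffix for suffix in suffixes) for root in roots]


def write_forms(root_file, suffix_file, out_file):
    """Read both input files and write the generated forms to out_file."""
    roots = read_lines(root_file)
    suffixes = read_lines(suffix_file)
    lines = generate_forms(roots, suffixes)
    Path(out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Illustrative English data only (hypothetical, not the original example):
sample = generate_forms(["walk", "talk"], ["s", "ed", "ing"])
print("\n".join(sample))
```

With real data the call would look something like write_forms("roots.txt", "suffixes.txt", "forms.txt"). Both files are read fully into memory first, which should be fine for word lists of this size and avoids juggling two open file handles at once.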
Many thanks in advance on behalf of the open community.
Hello,
Many thanks. I did try in Perl but could not open handles to two files. I have done this operation before with a macro, but it was slow and cumbersome with a large number of suffixes. With only two or three suffixes, the macro works just fine.
I will check out how you managed to open 2 file handles and cat the suffixes to the root. I can use the same technique for other similar operations.
---------- Post updated at 06:50 AM ---------- Previous update was at 06:38 AM ----------
Thanks a lot. I checked it on a small file with around 20 roots and 40 suffixes and it works just fine.
I also studied the code and saw how easy it was to open 2 files and cat the input of one to the input of the other. Thanks for the script but more for the lesson.
I should tell you that I am 66 years old, but I am of the opinion that one is never too old to learn. Many thanks once again.
Last edited by Don Cragun; 10-23-2015 at 11:56 AM..
Reason: Get rid of duplicated updated.