Creating a master file of conjugated verbs by concatenating root and inflection from separate files
Apologies for the long descriptive title.
I am working with Sindhi and developing a database of all verbal conjugations in that language.
I have generated 2 files:
Verbs.dic contains all the verbs, one verb per line
Inflections.dic contains the verbal conjugations which need to be appended to each verb.
An example will make this clear. I am choosing English for clarity and have chosen a very simple set, given the complexity of English verbs.
The input files are as follows:
Verbs.dic
Inflections.dic
What I need is a Perl or Awk script that takes the list of inflections from Inflections.dic and appends each inflection to each verb in Verbs.dic. The resulting output would be as follows:
Output
In English the list of inflections is fairly limited, but in Sindhi the number of inflections ranges from 35 to 40, so generating the forms manually is impossible. Please note: unfortunately, I work in a Windows environment.
All good wishes for the New Year, and many thanks in advance.
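One way to build the master file described above is a nested loop over the two lists: for every verb, emit the verb followed by each inflection. The following is only a minimal sketch, written in Python rather than the requested Perl or Awk (Python also runs under Windows). The function names, the default encodings, and the sample words are assumptions for illustration, since the sample data of the post is not shown here.

```python
from pathlib import Path


def read_entries(path, encoding="utf-8-sig"):
    """Return the non-empty lines of a word list, stripped of whitespace.

    "utf-8-sig" is an assumption: it reads plain UTF-8 and also drops the
    byte order mark that some Windows editors write.
    """
    text = Path(path).read_text(encoding=encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]


def conjugate(verbs, inflections):
    """Yield every verb followed by every inflection, grouped by verb."""
    for verb in verbs:
        for inflection in inflections:
            yield verb + inflection


def build_master(verbs_path, inflections_path, out_path):
    """Write one conjugated form per line and return how many were written."""
    verbs = read_entries(verbs_path)
    inflections = read_entries(inflections_path)
    forms = list(conjugate(verbs, inflections))
    Path(out_path).write_text("\n".join(forms) + "\n", encoding="utf-8")
    return len(forms)


# Illustrative data only, not the sample from the post.
print(list(conjugate(["walk", "talk"], ["s", "ed", "ing"])))
```

With real files, a call such as build_master("Verbs.dic", "Inflections.dic", "Master.dic") would write the combined list; Master.dic is a hypothetical output name. Plain concatenation is assumed here, so any sandhi or stem changes at the join would need extra rules.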
Hello I am facing a scenario where I have a file with XML content and I am running shell script over it. But the problem is the XML is getting updated with new services. In the below scenario, my script takes values from the xml file from one service name say ABCD. Since there are multiple, it is... (8 Replies)
Hi, I'm trying to concatenate a specific file from each day in a year/month/day folder structure using Bash or equivalent. The file structure ends up like this:
2009/01/01/products
2009/01/02/products
....
2009/12/31/products
The file I need is in products everyday and I need the script to... (3 Replies)
For example:
File 1:
abc def ghi
jkl mno pqr
File 2:
stu vwx yza
bcd efg hij
klm nop qrs
I want the result to be:
abc def ghistu vwx yza
jkl mno pqrbcd efg hij
klm nop qrs (4 Replies)
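The line-by-line join in this example can be sketched as follows. This is a minimal illustration in Python, assuming that corresponding lines are joined with no separator and that the shorter file is padded with empty lines; the name join_lines is hypothetical.

```python
from itertools import zip_longest


def join_lines(first, second):
    """Join corresponding lines of two files; pad the shorter one with ''."""
    return [a + b for a, b in zip_longest(first, second, fillvalue="")]


file1 = ["abc def ghi", "jkl mno pqr"]
file2 = ["stu vwx yza", "bcd efg hij", "klm nop qrs"]

for line in join_lines(file1, file2):
    print(line)
```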
A Unix program to which a directory name will be passed as a parameter. This directory will contain files with various extensions. The script will create directories named after the file extensions and then put the files in the corresponding folders. All files which do not have any... (2 Replies)
Hi,
I'm the root user on my computer, but I'm writing a script that does a lot of file handling. Every time I create a file or directory it automatically requires root privileges. Is there a way I can just create a file that the user can access without a password?
For example in my script I... (20 Replies)
I have 3 files
File1
C1 C2 c3
File 2
C1 c2 c3
File 3
C1 c2 c3
Now i want to have
File1 as C1 c2 c3 I
File2 as C1 c2 c3 O
File3 as c1 c2 c3 D
and these 3 files should be concatenated into a single file.
How can it be done in a Unix script? (3 Replies)
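The tagging and concatenation asked about here can be sketched as below. This is a minimal illustration in Python rather than a shell script, assuming each line simply gets its file's tag appended as a final space-separated column; the name tag_and_concatenate is hypothetical.

```python
def tag_and_concatenate(tagged_sources):
    """Append each source's tag to its lines and merge them in order.

    tagged_sources is a list of (lines, tag) pairs; blank lines are skipped.
    """
    merged = []
    for lines, tag in tagged_sources:
        merged.extend(f"{line.rstrip()} {tag}" for line in lines if line.strip())
    return merged


sources = [
    (["C1 C2 c3"], "I"),  # File1
    (["C1 c2 c3"], "O"),  # File2
    (["C1 c2 c3"], "D"),  # File3
]

for line in tag_and_concatenate(sources):
    print(line)
```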
I have an application designed in PHP and MySQL running on an Apache web server on an Amazon EC2 CentOS server. I want to implement master-master and master-slave replication, along with high availability and disaster recovery, for this application's database.
For this I have created two... (0 Replies)
Dear all,
I am working on a noun, adjective and verb lemmatiser for Sindhi which will eventually be put up as open source for generic use. The tool will take a word and provide all possible forms of the word.
To achieve this I have identified the root forms and the eventual suffixes which could... (3 Replies)
Experts,
Need your help for this. Please support
My aim is to create a separate output file for each input file (File 1 and File 2) in another folder, say /tmp/finaloutput.
Input files
File 1(1.1.1.1.csv)
a,b,c
43,17104773,3
45,17104234,4
File 2(2.2.2.2.csv)
a,b,c
43,17104773,1... (2 Replies)
Discussion started by: as7951
2 Replies
atok12wordlist(4) File Formats atok12wordlist(4)
NAME
atok12wordlist - Text word file for ATOK12 dictionary utility
DESCRIPTION
atok12wordlist is a text-format word file for ATOK12 dictionary maintenance. It is used by some functions of the atok12(1) dictionary utility for input and output.
The format of the word file is defined as follows:
First line of the file: The first line must begin with:
!ATOK12
In this case, ! must be a half-size character. ATOK12 can be written in half-size characters or the corresponding full-size characters.
Comment line: Lines beginning with ! (half-size or full-size) are ignored as comments, except for the first line.
Specifying words using a part_of_speech: Words are specified by the following notation.
Reading,notation,part_of_speech
Either a comma or a touten (Japanese comma) can be used as a delimiter. When the notation contains a delimiter, enclose the notation in double or single quotation marks. Either half-size or full-size characters can be used for delimiters and quotation marks. For the part_of_speech type, see the description of "Possible part_of_speech". The part_of_speech entry must be in Japanese.
Specifying a word with a part_of_speech_number: Words can also be specified by the following notation.
Reading,notation,part_of_speech_number
This format is the same as above, except that the part_of_speech is specified by a number. For the part_of_speech_number, see the description of "Possible part_of_speech".
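The line rules above (header line, comment lines, three delimited fields with optional quoting) can be illustrated with a short reader. This is a minimal sketch in Python and not part of ATOK12; the function names are hypothetical, and the exact code points accepted as delimiters, quotation marks, and the full-size ! are assumptions, as is the choice to skip blank lines.

```python
import unicodedata

# Assumed code points: comma, full-size comma, touten, half-size touten.
DELIMITERS = {",", "\uFF0C", "\u3001", "\uFF64"}
# Assumed code points: half-size and full-size double and single quotes.
QUOTES = {'"', "'", "\uFF02", "\uFF07"}
# Half-size and full-size exclamation marks.
COMMENT_MARKS = ("!", "\uFF01")


def split_fields(line):
    """Split one entry on delimiters that are outside quotation marks."""
    fields, current, quote = [], [], None
    for ch in line:
        if quote is not None:
            if ch == quote:
                quote = None          # closing quotation mark
            else:
                current.append(ch)    # delimiters inside quotes are kept
        elif ch in QUOTES:
            quote = ch                # opening quotation mark
        elif ch in DELIMITERS:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_word_file(lines):
    """Return (reading, notation, part_of_speech) tuples from a word file."""
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines or not lines[0].startswith("!"):
        raise ValueError("first line must begin with a half-size '!'")
    # NFKC folds full-size letters and digits to their half-size forms.
    if not unicodedata.normalize("NFKC", lines[0][1:]).startswith("ATOK12"):
        raise ValueError("first line must begin with !ATOK12")
    entries = []
    for line in lines[1:]:
        if not line.strip() or line.startswith(COMMENT_MARKS):
            continue                  # blank line (assumption) or comment
        fields = split_fields(line)
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
        entries.append(tuple(fields))
    return entries


# Made-up entries using the part_of_speech_number form.
sample = ["!ATOK12", "!a comment", "abc,'a,b',1", "def\u3001xyz\u300114"]
print(parse_word_file(sample))
```

The third field is returned as text in both forms; deciding whether it is a number or a Japanese part_of_speech name is left to the caller.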
Possible reading
Length: Up to 16 characters. Dakuten (sonant sound mark) and han-dakuten (p-sound mark) are counted as a single character.
Character: The following half-size and full-size characters can be used. However, when stored in the dictionary, the characters for the reading are converted to the corresponding half-size characters.
o Full-size Hiragana
o The following half-size and full-size characters
Katakana
Alphabet
Numerals
Dakuten
Han-dakuten
-
+
*
/
_
#
$
%
&
=
@
:
;
<
>
The following characters cannot be used as the first character of the reading.
o The following half-size and full-size characters
Hiragana/Katakana "wo"
Hiragana/Katakana "n"
Chouon (prolonged sound mark)
Hiragana/Katakana Youon small "a", "i",
"u", "e", "o", "ya", "yu", "yo"
Hiragana/Katakana Sokuon "small tsu"
Dakuten (sonant sound mark)
Han-dakuten (p-sound mark)
Order of reading: In the dictionary, the reading is stored after conversion to the corresponding half-size characters. The order of the reading is the same as the order of the JIS-X0201 character set definition code. When a list display or reading range specification is made using the dictionary maintenance tool, this order is used.
Possible notation
Length: Up to 50 characters. A half-size character is counted as 1 and a full-size character 2.
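The notation length rule can be approximated with a short check. This is a minimal sketch in Python and not part of ATOK12: it assumes that the Unicode East Asian Width classes F and W correspond to full-size characters and that everything else counts as half-size, which may differ from the utility's own encoding-based count. The function names are hypothetical.

```python
import unicodedata

MAX_NOTATION_LENGTH = 50


def notation_length(text):
    """Count full-size characters as 2 and all other characters as 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
        for ch in text
    )


def is_valid_notation(text):
    """Check the notation against the 50-unit limit."""
    return 0 < notation_length(text) <= MAX_NOTATION_LENGTH


print(notation_length("abc"))    # three half-size letters
print(notation_length("漢字"))    # two full-size Kanji
```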
Character: All half-size and full-size characters
Possible part_of_speech
There are a total of 33 part_of_speech types. The part_of_speech entries must be in Japanese.
+--------------------------------------------------------------------+
|Part_of_speech_number  Part_of_speech                               |
|1                      General nouns                                |
|4                      Proper nouns (person name)                   |
|5                      Proper nouns (location name)                 |
|6                      Proper nouns (organization name)             |
|8                      Proper nouns (general)                       |
|9                      Nouns (sa-gyou irregular)                    |
|10                     Nouns (za-gyou irregular)                    |
|11                     Nouns (adjective verbs)                      |
|13                     Numerals                                     |
|14                     Adverbs                                      |
|15                     Rentaishi (non-conjugative adjectives)       |
|16                     Conjunctions                                 |
|17                     Interjections                                |
|18                     Independent words                            |
|19                     Prefixes                                     |
|21                     Noun suffixes                                |
|23                     Ka-gyou godan katsuyou (consonant-stem) verbs|
|24                     Ga-gyou godan katsuyou (consonant-stem) verbs|
|25                     Sa-gyou godan katsuyou (consonant-stem) verbs|
|26                     Ta-gyou godan katsuyou (consonant-stem) verbs|
|27                     Na-gyou godan katsuyou (consonant-stem) verbs|
|32                     Ha-gyou godan katsuyou (consonant-stem) verbs|
|28                     Ba-gyou godan katsuyou (consonant-stem) verbs|
|29                     Ma-gyou godan katsuyou (consonant-stem) verbs|
|30                     Ra-gyou godan katsuyou (consonant-stem) verbs|
|31                     Wa-gyou godan katsuyou (consonant-stem) verbs|
|33                     Ichidan katsuyou verbs                       |
|34                     Ka-gyou irregular verbs                      |
|35                     Sa-gyou irregular verbs                      |
|36                     Za-gyou irregular verbs                      |
|37                     Adjectives                                   |
|39                     Adjective verbs                              |
|41                     Tankanji (single Kanji)                      |
+--------------------------------------------------------------------+
USAGE
See Japanese locale man pages for examples of word file entries.
SEE ALSO
atok12(1)
ATOK12 User's Guide
SunOS 5.10 10 Jan 2003 atok12wordlist(4)