Hello,
I have two files. The first file contains specific syllables of a language (Hindi) and the second file contains a large database from which these syllables have been culled.
The syllable file has one Hindi syllable per line,
and the corpus file has a structure where each word is given in English followed by its Hindi equivalent, with an equals sign (=) as the delimiter.
What I am trying to get is a structure where each syllable is listed with a corresponding example from the corpus file.
Basically, this amounts to a concordance of syllables. I tried grepping the syllables from the corpus file, but the output is too voluminous and the process is pretty slow.
I would really appreciate it if a script in AWK or Perl could do the job.
I work in Windows under DOS so the facility of piping is denied to me under AWK.
A pseudo-data file (in English) is provided as a zip.
Many thanks in advance for the help.
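For readers who want to see the shape of the task, here is a minimal sketch of the concordance described above. It is written in Python rather than AWK or Perl, and it rests on several assumptions that are not in the post: a syllable "matches" a word when it occurs anywhere in the Hindi side as a plain substring, only the first matching corpus entry is kept per syllable (which keeps the output small), and both files are UTF-8. The function names and the file names `syllables.txt`, `corpus.txt` and `concordance.txt` are hypothetical. The corpus is read once and scanning stops as soon as every syllable has an example, which should help with the speed problem.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]  # (english, hindi)


def parse_corpus(lines: Iterable[str]) -> List[Pair]:
    """Parse 'english=hindi' lines; blank lines and lines without '=' are skipped."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue
        english, hindi = line.split("=", 1)
        pairs.append((english.strip(), hindi.strip()))
    return pairs


def build_concordance(syllables: Iterable[str],
                      pairs: Iterable[Pair]) -> Dict[str, Optional[Pair]]:
    """Map each syllable to the first corpus entry whose Hindi side contains it.

    Assumption: a plain substring test stands in for real syllable matching.
    """
    wanted = [s.strip() for s in syllables if s.strip()]
    result: Dict[str, Optional[Pair]] = {s: None for s in wanted}
    remaining = set(wanted)
    for english, hindi in pairs:
        if not remaining:
            break  # every syllable already has an example
        for s in [s for s in remaining if s in hindi]:
            result[s] = (english, hindi)
            remaining.discard(s)
    return result


def format_concordance(result: Dict[str, Optional[Pair]]) -> List[str]:
    """Render one 'syllable<TAB>english=hindi' line per syllable."""
    out = []
    for syllable, match in result.items():
        if match is None:
            out.append(f"{syllable}\t(no example found)")
        else:
            out.append(f"{syllable}\t{match[0]}={match[1]}")
    return out


def run(syllable_path: str, corpus_path: str, out_path: str) -> None:
    """File-based wrapper; the paths are supplied by the caller."""
    with open(syllable_path, encoding="utf-8") as f:
        syllables = f.read().splitlines()
    with open(corpus_path, encoding="utf-8") as f:
        pairs = parse_corpus(f)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(format_concordance(build_concordance(syllables, pairs))) + "\n")


# Tiny in-memory demo with made-up pseudo-English data.
demo_pairs = parse_corpus(["lotus=kamal", "friend=mitra"])
print("\n".join(format_concordance(build_concordance(["ka", "mi", "zz"], demo_pairs))))
```

With real files this would be called as `run("syllables.txt", "corpus.txt", "concordance.txt")`, with no piping involved. A substring test can match in the middle of a Devanagari cluster, so a stricter match (for example against the pipe-delimited syllables of a pre-split corpus) may be needed for real data.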
Hello,
Some time back I had posted a request for a syllable concordance in which if a syllable was provided in a file, the program would extract a word from a file entitled "Corpus" matching that syllable. The program was
The following script was provided which did the job and for which I am... (3 Replies)
Hello,
I am a relative newbie and want to split names in English into syllables. Does anyone know of a Perl script which does that? Since my main area is linguistics, I would be happy to add rules to it and post the Perl script back for other users. I tried the CPAN Perl modules but they don't... (6 Replies)
I am working on a database of a language using Arabic Script. One of the major issues is that the shape of the characters changes according to their initial, medial or final positioning. Another major issue is that of the clustering of vowels within the word: the clustering changes totally the... (9 Replies)
I have found this syllable splitter in awk. The code is given below. Basically the script cuts words and names into syllables. However, it fails when the word contains two consonants which constitute a single syllable. An example is given below:
ashford
raphael
The output is as under:
... (4 Replies)
Hello,
I have written a syllable splitter for Pseudo English and Indic.
I have a large database with the following structure
Syllables in Pseudo English delimited by |=Syllables in Devanagari delimited by |
The tool produces syllables in both scripts. An example is given below:
... (2 Replies)
Discussion started by: gimley
English(3pm) Perl Programmers Reference Guide English(3pm)
NAME
English - use nice English (or awk) names for ugly punctuation variables
SYNOPSIS
    use English;
    use English qw( -no_match_vars ) ;  # Avoids regex performance penalty
                                        # in perl 5.16 and earlier
    ...
    if ($ERRNO =~ /denied/) { ... }
DESCRIPTION
This module provides aliases for the built-in variables whose names no one seems to like to read. Variables with side-effects which get
triggered just by accessing them (like $0) will still be affected.
For those variables that have an awk version, both long and short English alternatives are provided. For example, the $/ variable can be
referred to as either $RS or $INPUT_RECORD_SEPARATOR if you are using the English module.
See perlvar for a complete list of these.
PERFORMANCE
NOTE: This was fixed in perl 5.20. Mentioning these three variables no longer makes a speed difference. This section still applies if
your code is to run on perl 5.18 or earlier.
This module can provoke sizeable inefficiencies for regular expressions, due to unfortunate implementation details. If performance matters
in your application and you don't need $PREMATCH, $MATCH, or $POSTMATCH, try doing
    use English qw( -no_match_vars ) ;
It is especially important to do this in modules to avoid penalizing all applications which use them.
perl v5.18.2 2014-01-06 English(3pm)