Creating a syllable concordance

Forum: Shell Programming and Scripting
Post 302522237 by gimley (Registered User) on Friday 13th of May 2011 11:16:22 PM (page date: 05-14-2011)

Hello,
I have two files. The first contains specific syllables of a language (Hindi); the second contains a large database from which these syllables were culled.
The syllable file lists one Hindi syllable per line.
In the corpus file, each line gives a word in English and its Hindi equivalent, with an equals sign (=) as the delimiter.
What I am trying to get is a listing in which each syllable is followed by a corresponding example from the corpus file.
Basically this amounts to a concordance of syllables. I tried to grep the syllables from the file and did get results, but the output is too voluminous and the process is pretty slow.
I would really appreciate a script in AWK or Perl that could do the job.
I work in Windows under DOS, so piping is not available to me with AWK.
A pseudo-data file (in English) is attached as a zip.
Many thanks in advance for the help.

Data.zip (254 Bytes)

gimley
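A minimal Perl sketch of the concordance requested above, offered as an illustration rather than a solution tested on the real data. It assumes the syllable file has one syllable per line, the corpus has one `english=hindi` entry per line, and both files are UTF-8. The function names `build_concordance` and `read_lines` and the file names in the usage comment are hypothetical. It keeps only the first matching corpus entry for each syllable, which limits the output to one line per syllable, and it stops scanning once every syllable has an example.

```perl
#!/usr/bin/perl
# Sketch: one corpus example per syllable (file layout and names are assumptions).
use strict;
use warnings;

# Return a hash ref mapping each syllable to the first corpus line
# ("english=hindi") whose Hindi side contains that syllable.
sub build_concordance {
    my ($syllables, $corpus_lines) = @_;
    my %pending = map { $_ => 1 } grep { length } @$syllables;
    my %example;
    for my $line (@$corpus_lines) {
        last unless %pending;                  # every syllable already has an example
        my ($english, $hindi) = split /=/, $line, 2;
        next unless defined $hindi;            # skip lines without the "=" delimiter
        for my $syl (keys %pending) {
            if (index($hindi, $syl) >= 0) {    # plain substring test, no regex needed
                $example{$syl} = $line;
                delete $pending{$syl};
            }
        }
    }
    return \%example;
}

# Read a UTF-8 text file into a list of lines, tolerating DOS line endings and a BOM.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    s/\r?\n\z// for @lines;
    $lines[0] =~ s/\A\x{FEFF}// if @lines;
    return @lines;
}

# Usage (hypothetical file names): perl concord.pl syllables.txt corpus.txt
if (@ARGV == 2) {
    my @syllables = read_lines($ARGV[0]);
    my @corpus    = read_lines($ARGV[1]);
    my $example   = build_concordance(\@syllables, \@corpus);
    binmode STDOUT, ':encoding(UTF-8)';
    for my $syl (@syllables) {
        next unless length $syl;
        my $hit = exists $example->{$syl} ? $example->{$syl} : '(no example found)';
        print "$syl\t$hit\n";
    }
}
```

Everything runs inside a single Perl process, so no pipe is involved. The substring test via `index` matches a syllable anywhere in the Hindi word; restricting matches to a particular position would need a different test.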


5 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS

Hello, Some time back I had posted a request for a syllable concordance in which, if a syllable was provided in a file, the program would extract a word matching that syllable from a file entitled "Corpus". The following script was provided which did the job and for which I am...

2. Shell Programming and Scripting

Syllable splitter in Perl

Hello, I am a relative newbie and want to split names in English into syllables. Does anyone know of a Perl script which does that? Since my main area is linguistics, I would be happy to add rules to it and post the Perl script back for other users. I tried the CPAN Perl modules but they don't...

3. Shell Programming and Scripting

Writing a clustering concordance for a Perso-Arabic script

I am working on a database of a language using Arabic Script. One of the major issues is that the shape of the characters changes according to their initial, medial or final positioning. Another major issue is that of the clustering of vowels within the word: the clustering changes totally the...

4. Shell Programming and Scripting

Modifying an awk script for syllable splitting

I have found this syllable splitter in awk. The code is given below. Basically the script cuts words and names into syllables. However it fails when the word contains 2 consonants which constitute a single syllable. An example is given below ashford raphael The output is as under: ...

5. Shell Programming and Scripting

Find Syllable count mismatch

Hello, I have written a syllable splitter for Pseudo English and Indic. I have a large database with the following structure Syllables in Pseudo English delimited by |=Syllables in Devanagari delimited by | The tool produces syllables in both scripts. An example is given below: ...

LEARN ABOUT MOJAVE

english5.18

English(3pm)						 Perl Programmers Reference Guide					      English(3pm)

NAME

       English - use nice English (or awk) names for ugly punctuation variables

SYNOPSIS

	   use English;
	   use English qw( -no_match_vars ) ;  # Avoids regex performance penalty
					       # in perl 5.16 and earlier
	   ...
	   if ($ERRNO =~ /denied/) { ... }

DESCRIPTION

       This module provides aliases for the built-in variables whose names no one seems to like to read.  Variables with side-effects which get
       triggered just by accessing them (like $0) will still be affected.

       For those variables that have an awk version, both long and short English alternatives are provided.  For example, the $/ variable can be
       referred to as either $RS or $INPUT_RECORD_SEPARATOR if you are using the English module.

       See perlvar for a complete list of these.
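
A short sketch of the aliasing described above, assuming a hypothetical helper named `read_records`: it sets the input record separator through its long English name and reads from an in-memory file, so `$INPUT_RECORD_SEPARATOR`, `$RS` and `$/` all refer to the same variable. The sample string is invented for illustration.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # aliases without $PREMATCH/$MATCH/$POSTMATCH

# Split a string into records on the given separator by temporarily
# changing the input record separator through its English alias.
sub read_records {
    my ($text, $separator) = @_;
    local $INPUT_RECORD_SEPARATOR = $separator;   # same effect as: local $/ = $separator
    open my $fh, '<', \$text or die "Cannot open in-memory file: $OS_ERROR";
    my @records = <$fh>;
    close $fh;
    chomp @records;                               # chomp strips the current $/
    return @records;
}

my @fields = read_records("ka=kamal;ma=mata;", ';');
print join(',', @fields), "\n";
```

Using `local` restores the previous separator when the subroutine returns, so code elsewhere that reads line by line is unaffected.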

PERFORMANCE

       NOTE: This was fixed in perl 5.20.  Mentioning these three variables no longer makes a speed difference.  This section still applies if
       your code is to run on perl 5.18 or earlier.

       This module can provoke sizeable inefficiencies for regular expressions, due to unfortunate implementation details.  If performance matters
       in your application and you don't need $PREMATCH, $MATCH, or $POSTMATCH, try doing

	  use English qw( -no_match_vars ) ;

       It is especially important to do this in modules to avoid penalizing all applications which use them.

perl v5.18.2							    2014-01-06							      English(3pm)
