CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS

07-29-2011

Registered User

303, 4

Join Date: Feb 2011

Last Activity: 3 March 2020, 10:05 PM EST

Posts: 303

Thanks Given: 145

Thanked 4 Times in 4 Posts

CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS

Hello,
Some time back I had posted a request for a syllable concordance in which if a syllable was provided in a file, the program would extract a word from a file entitled "Corpus" matching that syllable. The program was
The following script was provided which did the job and for which I am really thankful:

Code:

#! /usr/bin/perl

use strict;   # These two lines save you endless trouble 
use warnings; # without them typos and such errors get missed

open (my $corpus_file, '<', 'Corpus'); # Created a test corpus with just the contained lines
$/="\r\n"; # Again with the DOS files
chomp(my @corpus = (<$corpus_file>));  # Load the corpus file into an array for faster access
open (my $syllables_file, '<', 'Syllables');
while(<$syllables_file>){
    chomp(my $syllable = $_);
    my $found = 0;
    for my $word (@corpus){
        if ( $word =~ /$syllable/){  # use a regular expression to find a match for the syllable
            print "$syllable=$word\n";
            $found = 1;
            last; #Stop processing the array of words as we have an example
        }
    }
    print "$syllable wasn't matched in the supplied corpus\n" if (! $found);
}

However I need one more refinement
I need to modify the program such that it finds the syllable in three different environents Initial medial Final Standalone(whole word)
example (theoretical: I know somebody will say "a" here is not a syllable. But I am working with Indian languages).
Syllable "a"
Intial Medial Final Standalone
ago bare gonna a
It could be that the syllable may not appear in all environments as in the case of stri
Intial Medial Final Standalone
strip Astrid NONE NONE
I have tried to factor in the environmental constraints using regexes but the results are disastrous
Please help. I have spent quite a few hours and the results get more ludicrous each time.
Many thanks and my gratitutde to the generous people on the forum who give their time and energy to helping out tyros like me.

Last edited by radoulov; 07-30-2011 at 04:07 AM.. Reason: Code tags!

gimley

View Public Profile for gimley

Find all posts by gimley

08-01-2011

Registered User

4,673, 588

Join Date: Oct 2010

Last Activity: 1 February 2016, 3:35 PM EST

Location: Southern NJ, USA (Nord)

Posts: 4,673

Thanks Given: 8

Thanked 588 Times in 561 Posts

Well, regex for white space vary: Regex Tutorial - \b Word Boundaries

I used to say \< and \> for word boundary, but the PERL guys got to the POSIX and changed it after decades, so both may be \b!

So, you need to check for

standalone \<a\>
initial \<a[a-z]
final [a-z]a\>
medial [a-z]a[a-z]

but since the [a-z] check is more expensive, you might be able to check in this order, since if not \<a\> then \<a is initial and a\> is final, and medial is none of the above.

DGPickett

View Public Profile for DGPickett

Find all posts by DGPickett

08-01-2011

Registered User

303, 4

Join Date: Feb 2011

Last Activity: 3 March 2020, 10:05 PM EST

Posts: 303

Thanks Given: 145

Thanked 4 Times in 4 Posts

Hello,
With a little help from colleagues, I finally managed to get the concordance going. Here is the code in case someone else would like to use it:

Code:

#! /usr/bin/perl

use strict;  # These two lines save you endless trouble
use warnings; # without them typos and such errors get missed

open (my $corpus_file, '<', 'Corpus'); # Created a test corpus with just the contained lines
# $/="\r\n"; # Again with the DOS files
chomp(my @corpus = (<$corpus_file>)); # Load the corpus file into an array for faster access
open (my $syllables_file, '<', 'Syllables');
while(<$syllables_file>){
    chomp(my $syllable = $_);
    my $count = 0;
    my $init = my $med = my $fin = my $stdalone = "NONE";
    for my $word (@corpus) {
        if ( $word =~ /^$syllable.+/) {
            if ($init eq "NONE") {
                $init = $word;
                $count++;
            }
        }
        elsif ($word =~ /.+$syllable.+/) {
            if ($med eq "NONE") {
                $med = $word;
                $count++;
            }
        }
        elsif ($word =~ /.+$syllable$/) {
            if ($fin eq "NONE") {
                $fin = $word;
                $count++;
            }
        }
        elsif ($word =~ /^$syllable$/) {
            if ($stdalone eq "NONE") {
                $stdalone = $word;
                $count++;
            }
        }
        last if $count == 4;
    }
    print "$syllable\nInitial $init\nMedial $med\nFinal $fin\nStandalone $stdalone\n";
    #print "$init\t$med\t$fin\t$stdalone\n";
}

Many thanks for the information re. Regex.

Last edited by Scott; 08-02-2011 at 01:14 AM.. Reason: Code tags, please...

gimley

View Public Profile for gimley

Find all posts by gimley

08-02-2011

Registered User

4,673, 588

Join Date: Oct 2010

Last Activity: 1 February 2016, 3:35 PM EST

Location: Southern NJ, USA (Nord)

Posts: 4,673

Thanks Given: 8

Thanked 588 Times in 561 Posts

A logic tree and removing redundant tests save time. If it has a prefix char, it is medial or final else it is initial or standalone, and for prefix'ed, if not medial it is always final, no test needed.

This User Gave Thanks to DGPickett For This Post:

DGPickett

View Public Profile for DGPickett

Find all posts by DGPickett

Shell Programming and Scripting

CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS

5 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Merge 4 bim files by keeping only the overlapping variants (unique rs values )

Discussion started by: fondan

2. Shell Programming and Scripting

[All variants] Change settings

Discussion started by: useretail

3. Shell Programming and Scripting

Writing a clustering concordance for a Perso-Arabic script

Discussion started by: gimley

4. Shell Programming and Scripting

[All variants] remove first pair of parentheses

Discussion started by: useretail

5. Shell Programming and Scripting

Creating a syllable concordance

Discussion started by: gimley