CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS Post: 302543169


Post 302543169 by gimley on Friday 29th of July 2011 08:21:12 PM


Hello,
Some time back I posted a request for a syllable concordance: given a file of syllables, the program would extract, for each syllable, a word matching it from a file entitled "Corpus".
The following script was provided, which did the job and for which I am really thankful:

Code:

#! /usr/bin/perl

use strict;   # These two lines save you endless trouble 
use warnings; # without them typos and such errors get missed

open (my $corpus_file, '<', 'Corpus') or die "Cannot open Corpus: $!"; # Test corpus containing just the sample lines
$/="\r\n"; # Again with the DOS files
chomp(my @corpus = (<$corpus_file>));  # Load the corpus file into an array for faster access
open (my $syllables_file, '<', 'Syllables') or die "Cannot open Syllables: $!";
while(<$syllables_file>){
    chomp(my $syllable = $_);
    my $found = 0;
    for my $word (@corpus){
        if ( $word =~ /$syllable/){  # use a regular expression to find a match for the syllable
            print "$syllable=$word\n";
            $found = 1;
            last; #Stop processing the array of words as we have an example
        }
    }
    print "$syllable wasn't matched in the supplied corpus\n" if (! $found);
}

However, I need one more refinement.
I need to modify the program so that it finds each syllable in four different environments: Initial, Medial, Final and Standalone (whole word).
Example (theoretical: I know somebody will say "a" here is not a syllable, but I am working with Indian languages):
Syllable "a"
Initial  Medial  Final  Standalone
ago      bare    gonna  a
A syllable may not appear in every environment, as in the case of "stri":
Initial  Medial  Final  Standalone
strip    Astrid  NONE   NONE
I have tried to factor in the environmental constraints using regexes, but the results are disastrous.
Please help. I have spent quite a few hours on this and the results get more ludicrous each time.
Many thanks, and my gratitude to the generous people on the forum who give their time and energy to helping out tyros like me.
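
One possible way to express the four environments is sketched below. It is only an illustration under stated assumptions, not the solution given in the thread. The helper names environments and first_examples are hypothetical, and the corpus and syllable lists are hard-coded from the examples above rather than read from the Corpus and Syllables files. The syllable is matched as literal text (via quotemeta), and "more text follows" or "text precedes" is tested with a single dot, so the sketch does not know about syllable boundaries inside a word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @columns = qw(Initial Medial Final Standalone);

# Return the environments in which $syllable occurs in $word.
sub environments {
    my ($syllable, $word) = @_;
    return ('Standalone') if $word eq $syllable;   # the whole word is the syllable
    my $s = quotemeta $syllable;                   # match the syllable literally
    my @env;
    push @env, 'Initial' if $word =~ /\A$s./s;     # at the start, more text follows
    push @env, 'Medial'  if $word =~ /.$s./s;      # text on both sides
    push @env, 'Final'   if $word =~ /.$s\z/s;     # at the end, text precedes
    return @env;
}

# Return one example word per column (first match wins), or NONE.
sub first_examples {
    my ($syllable, @words) = @_;
    my %example;
    for my $word (@words) {
        for my $env (environments($syllable, $word)) {
            $example{$env} = $word unless exists $example{$env};
        }
        last if scalar(keys %example) == scalar(@columns);  # every column filled
    }
    return map { exists $example{$_} ? $example{$_} : 'NONE' } @columns;
}

# Sample data taken from the examples in the post (assumption: one word per entry).
my @corpus    = qw(ago bare gonna a strip Astrid);
my @syllables = qw(a stri);

for my $syllable (@syllables) {
    print "Syllable \"$syllable\"\n";
    print join("\t", @columns), "\n";
    print join("\t", first_examples($syllable, @corpus)), "\n";
}
```

To use it with real data, @corpus and @syllables would be filled from the Corpus and Syllables files in the same way as in the script above. Matching is case-sensitive, which is why "Astrid" counts as Medial rather than Initial for "stri".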

Last edited by radoulov; 07-30-2011 at 04:07 AM.. Reason: Code tags!



LEARN ABOUT DEBIAN

coy

Coy(3pm)						User Contributed Perl Documentation						  Coy(3pm)

NAME

	   Coy - like Carp only prettier

SYNOPSIS

	   # In your application:
	   # ====================

		   use Coy;

		   warn "There seems to be a problem";

		   die "Looks like it might be fatal";

	   # You can add vocab in the $HOME/.coyrc file:
	   # ===========================================

		   noun RESET; # REMOVE EXISTING noun VOCAB
			       # WORKS FOR OTHER SPECIFIERS TOO

		   noun {
			       wookie =>
			       {
				       category => [ Sentient ],
				       sound	=> [ "roars", "grunts", "bellows" ],
				       act	=>
				       {
					       sits   => { location => Arborial },

					       fights => { minimum => 2,
							   association => "argument",
							 },
				       },
			       },

			};

		   category {
			       Sentient =>
			       {
				       act =>
				       {
					       quarrels =>
					       {
						       associations => "argument",
						       location => Terrestrial,
						       minimum => 2,
						       synonyms => [qw(bickers argues)],
					       },
					       laughs =>
					       {
						       associations => "happy",
						       location => Terrestrial,
						       non_adjectival => 1,
					       },
				       },
			       }
			    };

		   personage "R2D2";
		   personage "Darth Vader";

		   place "Mos Eisley";
		   place "the Death Star";

		   tree "Alderaan mangrove";
		   fruit_tree "Wookie-oak";

	   # You can also select a different syllable counter via .coyrc
	   # ===========================================================

		   use Lingua::EN::Syllables::syllable;
		   syllable_counter  "Lingua::EN::Syllables::syllable";

	   # or

		   use Lingua::EN::Syllables::syllable;
		   syllable_counter  \&Lingua::EN::Syllables::syllable;

	   # or

		   syllable_counter  sub { return 1 };	# FAST BUT INACCURATE

DESCRIPTION

	       Error messages
	       strewn across my terminal.
	       A vein starts to throb.

	       Their reproof adds the
	       injury of insult to
	       the shame of failure.

	       When a program dies
	       what you need is a moment
	       of serenity.

	       The Coy.pm
	       module brings tranquillity
	       to your debugging.

	       The module alters
	       the behaviour of C<die> and
	       C<warn> (and C<croak> and C<carp>).

	       It also provides
	       C<transcend> and C<enlighten> -- two
	       Zen alternatives.

	       Like Carp.pm,
	       Coy reports errors from the
	       caller's point-of-view.

	       But it prefaces
	       the bad news of failure with
	       a soothing haiku.

	       The haiku are not
	       "canned", but are generated
	       freshly every time.

	       Once the haiku is
	       complete, it's prepended to
	       the error message.

	       Execution of
	       the original call to
	       C<die> or C<warn> resumes.

	       Haiku and error
	       message strew across my screen.
	       A smile starts to form.

EXTENDING THE VOCABULARY

	       Any code placed in
	       "$ENV{HOME}/.coyrc"
	       runs at compile-time.

	       You can use that file
	       to extend Coy.pm's
	       vocabulary.

	       The "SYNOPSIS" at
	       the start of this POD shows how
	       you might set it up.

	       (Eventually
		this section will detail the
		full mechanism.)

CHANGING THE SYLLABLE COUNTER

	       Real haiku often
	       have imperfect syllable
	       counts.

	       The deficiencies of
	       Coy's inbuilt counter are thus
	       artistic virtues.

	       But some connoisseurs
	       demand their syllable counts
	       be always exact.

	       So if you don't like
	       the syllable counter, Coy
	       lets you replace it.

	       Coy provides a sub
	       called C<syllable_counter> for
	       that very purpose.

	       It is passed a sub
	       reference. That sub is then used
	       to count syllables.

	       You can also pass
	       the sub's I<name> (that is, pass a
	       symbolic reference).

	       The new counter sub
	       should take a string and return
	       its syllable count.

	       C<syllable_counter>
	       can be called from your code, or
	       from .coyrc.
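
	       As a rough illustration of the interface described above, the sketch below defines a replacement counter that takes a string and returns a syllable count. The sub name count_vowel_groups is hypothetical and not part of Coy, and counting runs of vowels is only a crude approximation chosen for brevity. The registration call is left in a comment because it needs Coy installed; it assumes the sub-reference form described in this section.

```perl
use strict;
use warnings;

# Hypothetical replacement counter: treat each run of vowels as one syllable.
sub count_vowel_groups {
    my ($text) = @_;
    # The empty-list assignment forces list context, so $count is the
    # number of matches rather than a true/false value.
    my $count = () = $text =~ /[aeiouy]+/gi;
    return $count;
}

print count_vowel_groups("serenity"), "\n";

# Registration (assumption: Coy is installed and loaded):
#   use Coy;
#   syllable_counter \&count_vowel_groups;
```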

BUGS AND LIMITATIONS

	       In its current form,
	       the module has four problems
	       and limitations:

	       * Vocabulary:
		 The list of nouns and verbs is
		 too small at present.

		 This limits the range
		 of topics that the haiku
		 produced can cover.

		 That in turn leads to
		 tell-tale repetition (which
		 fails the Turing test).

		 Extending the range
		 of words Coy.pm can
		 use is no problem

		 (though finding the time
		 and the creativity
		 required may be :-).

		 Users of Coy are
		 encouraged to add their own
		 vocabulary.

		 (See the "SYNOPSIS",
		  and also "EXTENDING THE
		  VOCABULARY").

	       * Associations:
		 The vocabulary has
		 too few topic links.

		 Hence it's often not
		 able to find relevant
		 words for a message.

		 This leads to haiku
		 utterly unrelated
		 to the error text.

		 Again, there is no
		 technical difficulty
		 in adding more links:

		 Defining enough
		 associations isn't
		 hard, just tedious.

		 User-specified
		 vocabularies can solve
		 this problem as well.

	       * Limited grammar:
		 The number of syntactic
		 templates is too small.

		 This leads to haiku
		 that are (structurally, at
		 least) monotonous.

		 Yet again, this needs
		 no technical solution,
		 just time and effort.

		 Of course, such enhanced
		 templates might require richer
		 vocabulary.

		 For example, verb
		 predicates would need extra
		 database structure:

		 Each verb entry would
		 have to be extended with
		 links to object nouns.

	       * Syllable counting:
		 This is perhaps the major
		 problem at present.

		 The algorithmic
		 syllable counter is still
		 being developed.

		 It is currently
		 around 96%
		 accurate (per word).

		 This means that correct
		 syllable counts for haiku
		 can't be guaranteed.

		 Syllable counts for
		 single words are correct to
		 plus-or-minus 1.

		 In a multi-word
		 haiku these errors cancel
		 out in most cases.

		 Thus, the haiku tend
		 to be correct within one
		 or two syllables.

		 As the syllable
		 counter slowly improves, this
		 problem will abate.

		 Alternatively,
		 you can choose to use your own
		 syllable counter.

		 (See above in the
		  section titled "CHANGING THE
		  SYLLABLE COUNTER".)

AUTHOR

	       The Coy.pm
	       module was developed by
	       Damian Conway.

COPYRIGHT

	       Copyright (c) 1998-2000, Damian Conway. All Rights Reserved.
	     This module is free software. It may be used, redistributed
	     and/or modified under the terms of the Perl Artistic License
		  (see http://www.perl.com/perl/misc/Artistic.html)

perl v5.8.8							    2007-07-30								  Coy(3pm)