Post 302543169 by gimley on Friday 29th of July 2011 08:21:12 PM
CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS

Hello,
Some time ago I posted a request for a syllable concordance: given a file of syllables, the program would extract from a file named "Corpus" a word containing each syllable.
The following script was provided. It did the job, and I am really thankful for it:

Code:
#!/usr/bin/perl

use strict;    # These two lines save you endless trouble;
use warnings;  # without them typos and similar errors go unnoticed

# Both input files have DOS line endings, so treat "\r\n" as the record separator
$/ = "\r\n";

# Load the corpus into an array once, so each lookup scans memory instead of the file
open(my $corpus_file, '<', 'Corpus') or die "Cannot open Corpus: $!";
chomp(my @corpus = <$corpus_file>);
close($corpus_file);

open(my $syllables_file, '<', 'Syllables') or die "Cannot open Syllables: $!";
while (my $syllable = <$syllables_file>) {
    chomp $syllable;
    next unless length $syllable;   # an empty pattern would silently reuse the last successful match
    my $found = 0;
    for my $word (@corpus) {
        if ($word =~ /\Q$syllable\E/) {   # \Q...\E matches the syllable literally
            print "$syllable=$word\n";
            $found = 1;
            last;   # stop scanning the corpus: one example per syllable is enough
        }
    }
    print "$syllable wasn't matched in the supplied corpus\n" unless $found;
}
close($syllables_file);

However, I need one more refinement.
I need to modify the program so that it finds the syllable in four different environments: initial, medial, final and standalone (the whole word).
For example (this is only theoretical; I know somebody will say that "a" is not a syllable here, but I am working with Indian languages):
Syllable "a"
Initial   Medial   Final   Standalone
ago       bare     gonna   a
The syllable may not appear in every environment, as in the case of "stri":
Initial   Medial   Final   Standalone
strip     Astrid   NONE    NONE
I have tried to express these positional constraints with regexes, but the results are disastrous.
Please help. I have spent quite a few hours on this, and the results get more ludicrous each time.
Many thanks, and my gratitude to the generous people on the forum who give their time and energy to helping out tyros like me.
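A minimal sketch of one way to do this, assuming (as in the script above) one entry per line in files named Corpus and Syllables, and additionally assuming both files are UTF-8 text. The sub names (in_environment, find_examples, read_lines, print_concordance) are made up for illustration. Each environment becomes its own small regex: Initial means the syllable starts the word and something follows it, Final means something precedes it and it ends the word, Medial means there is at least one character on each side, and Standalone is a plain string comparison. Because the four tests are independent, one word can be the example for more than one environment.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(first);   # core module: first { ... } @list returns the first match or undef

# Column order of the report
my @ENVIRONMENTS = qw(Initial Medial Final Standalone);

# True if $syllable occurs in $word in environment $env.
# \Q...\E matches the syllable literally; /s lets "." match any character.
sub in_environment {
    my ($word, $syllable, $env) = @_;
    return $word eq $syllable           if $env eq 'Standalone';  # the whole word
    return $word =~ /\A\Q$syllable\E./s if $env eq 'Initial';     # at the start, something follows
    return $word =~ /.\Q$syllable\E\z/s if $env eq 'Final';       # at the end, something precedes
    return $word =~ /.\Q$syllable\E./s  if $env eq 'Medial';      # something on both sides
    die "Unknown environment '$env'\n";
}

# For one syllable, return a hash ref mapping each environment to the first
# corpus word found in it, or to "NONE" if no word fits.
sub find_examples {
    my ($syllable, $corpus) = @_;
    my %example;
    for my $env (@ENVIRONMENTS) {
        my $hit = first { in_environment($_, $syllable, $env) } @$corpus;
        $example{$env} = defined $hit ? $hit : 'NONE';
    }
    return \%example;
}

# Read one entry per line, accepting both DOS ("\r\n") and Unix line endings,
# and skipping empty lines.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path) or die "Cannot open $path: $!\n";
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        push @lines, $line if length $line;
    }
    close($fh);
    return @lines;
}

# Print a tab-separated table: one row per syllable, one column per environment.
sub print_concordance {
    my ($corpus_path, $syllables_path) = @_;
    my @corpus = read_lines($corpus_path);
    binmode(STDOUT, ':encoding(UTF-8)');
    print join("\t", 'Syllable', @ENVIRONMENTS), "\n";
    for my $syllable (read_lines($syllables_path)) {
        my $example = find_examples($syllable, \@corpus);
        print join("\t", $syllable, @{$example}{@ENVIRONMENTS}), "\n";
    }
}

# Only run when both input files exist in the current directory
print_concordance('Corpus', 'Syllables') if -e 'Corpus' && -e 'Syllables';
```

With a corpus containing ago, bare, gonna, a, strip and Astrid, find_examples should give the two tables above, including NONE where no word fits. One caveat for Indic scripts: the regexes work on Unicode code points, not on aksaras, so a syllable that is a bare consonant can also match the first code point of a longer cluster (consonant plus vowel sign). If that matters, an extra lookahead such as (?!\p{M}) after the syllable is one possible filter, but I have not tested it against real Hindi data.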

Last edited by radoulov; 07-30-2011 at 04:07 AM. Reason: Code tags!
 

Coy(3pm)						User Contributed Perl Documentation						  Coy(3pm)

NAME
Coy - like Carp only prettier

SYNOPSIS
    # In your application:
    # ====================

    use Coy;

    warn "There seems to be a problem";

    die "Looks like it might be fatal";


    # You can add vocab in the $HOME/.coyrc file:
    # ===========================================

    noun RESET;     # REMOVE EXISTING noun VOCAB
                    # WORKS FOR OTHER SPECIFIERS TOO

    noun {
        wookie => {
            category => [ Sentient ],
            sound    => [ "roars", "grunts", "bellows" ],
            act      => {
                sits   => { location => Arborial },
                fights => {
                    minimum     => 2,
                    association => "argument",
                },
            },
        },
    };

    category {
        Sentient => {
            act => {
                quarrels => {
                    associations => "argument",
                    location     => Terrestrial,
                    minimum      => 2,
                    synonyms     => [qw(bickers argues)],
                },
                laughs => {
                    associations   => "happy",
                    location       => Terrestrial,
                    non_adjectival => 1,
                },
            },
        }
    };

    personage "R2D2";
    personage "Darth Vader";

    place "Mos Eisley";
    place "the Death Star";

    tree "Alderaan mangrove";
    fruit_tree "Wookie-oak";


    # You can also select a different syllable counter via .coyrc
    # ===========================================================

    use Lingua::EN::Syllables::syllable;
    syllable_counter "Lingua::EN::Syllables::syllable";

    # or

    use Lingua::EN::Syllables::syllable;
    syllable_counter \&Lingua::EN::Syllables::syllable;

    # or

    syllable_counter sub { return 1 };   # FAST BUT INACCURATE

DESCRIPTION
Error messages strewn across my terminal. A vein starts to throb. Their reproof adds the injury of insult to the shame of failure. When a program dies what you need is a moment of serenity. The Coy.pm module brings tranquillity to your debugging. The module alters the behaviour of "die" and "warn" (and "croak" and "carp"). It also provides "transcend" and "enlighten" -- two Zen alternatives. Like Carp.pm, Coy reports errors from the caller's point-of-view. But it prefaces the bad news of failure with a soothing haiku. The haiku are not "canned", but are generated freshly every time. Once the haiku is complete, it's prepended to the error message. Execution of the original call to "die" or "warn" resumes. Haiku and error message strew across my screen. A smile starts to form.

EXTENDING THE VOCABULARY
Any code placed in "$ENV{HOME}/.coyrc" runs at compile-time. You can use that file to extend Coy.pm's vocabulary. The "SYNOPSIS" at the start of this POD shows how you might set it up. (Eventually this section will detail the full mechanism.)

CHANGING THE SYLLABLE COUNTER
Real haiku often have imperfect syllable counts. The deficiencies of Coy's inbuilt counter are thus artistic virtues. But some connoisseurs demand their syllable counts be always exact. So if you don't like the syllable counter, Coy lets you replace it. Coy provides a sub called "syllable_counter" for that very purpose. It is passed a sub reference. That sub is then used to count syllables. You can also pass the sub's name (that is, pass a symbolic reference). The new counter sub should take a string and return its syllable count. "syllable_counter" can be called from your code, or from .coyrc.

BUGS AND LIMITATIONS
In its current form, the module has four problems and limitations:

* Vocabulary: The list of nouns and verbs is too small at present. This limits the range of topics that the haiku produced can cover. That in turn leads to tell-tale repetition (which fails the Turing test). Extending the range of words Coy.pm can use is no problem (though finding the time and the creativity required may be :-). Users of Coy are encouraged to add their own vocabulary. (See the "SYNOPSIS", and also "EXTENDING THE VOCABULARY".)

* Associations: The vocabulary has too few topic links. Hence it's often not able to find relevant words for a message. This leads to haiku utterly unrelated to the error text. Again, there is no technical difficulty in adding more links: Defining enough associations isn't hard, just tedious. User-specified vocabularies can solve this problem as well.

* Limited grammar: The number of syntactic templates is too small. This leads to haiku that are (structurally, at least) monotonous. Yet again, this needs no technical solution, just time and effort. Of course, such enhanced templates might require richer vocabulary. For example, verb predicates would need extra database structure: Each verb entry would have to be extended with links to object nouns.

* Syllable counting: This is perhaps the major problem at present. The algorithmic syllable counter is still being developed. It is currently around 96% accurate (per word). This means that correct syllable counts for haiku can't be guaranteed. Syllable counts for single words are correct to plus-or-minus 1. In a multi-word haiku these errors cancel out in most cases. Thus, the haiku tend to be correct within one or two syllables. As the syllable counter slowly improves, this problem will abate. Alternatively, you can choose to use your own syllable counter. (See above in the section titled "CHANGING THE SYLLABLE COUNTER".)

AUTHOR
The Coy.pm module was developed by Damian Conway.

COPYRIGHT
Copyright (c) 1998-2000, Damian Conway. All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the terms of the Perl Artistic License (see http://www.perl.com/perl/misc/Artistic.html).

perl v5.8.8                          2007-07-30                          Coy(3pm)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.