Post 302543169 by gimley on Friday 29th of July 2011 08:21:12 PM
CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS

Hello,
Some time back I posted a request for a syllable concordance: given a file of syllables, the program would extract, for each syllable, a word from a file named "Corpus" that contains it.
The following script was provided. It did the job, and I am really thankful for it:

Code:
#!/usr/bin/perl

use strict;   # These two lines save you endless trouble
use warnings; # without them typos and such errors get missed

open(my $corpus_file, '<', 'Corpus') or die "Cannot open Corpus: $!"; # one word per line
$/ = "\r\n"; # The input files use DOS (CRLF) line endings
chomp(my @corpus = <$corpus_file>);  # Load the corpus file into an array for faster access
open(my $syllables_file, '<', 'Syllables') or die "Cannot open Syllables: $!";
while (<$syllables_file>) {
    chomp(my $syllable = $_);
    next unless length $syllable;  # skip blank lines: an empty pattern would silently reuse the last successful match
    my $found = 0;
    for my $word (@corpus) {
        if ($word =~ /\Q$syllable\E/) {  # \Q...\E matches the syllable as literal text, not as regex syntax
            print "$syllable=$word\n";
            $found = 1;
            last; # Stop processing the array of words as we have an example
        }
    }
    print "$syllable wasn't matched in the supplied corpus\n" if !$found;
}

However, I need one more refinement.
I need to modify the program so that it finds the syllable in four different environments: initial, medial, final and standalone (the whole word).
Example (theoretical: I know somebody will say "a" here is not a syllable, but I am working with Indian languages):
Syllable "a"
Initial   Medial   Final   Standalone
ago       bare     gonna   a
The syllable may not appear in every environment, as in the case of "stri":
Initial   Medial   Final   Standalone
strip     Astrid   NONE    NONE
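A minimal sketch (not code from the thread) of how the four environments can be told apart for a single word without any regex: Perl's built-in `index` walks every occurrence of the syllable, and the offset of each occurrence decides its environment. The sub name `environments` is hypothetical. Matching is case-sensitive, and a word identical to the syllable is counted only as Standalone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the environments in which $syllable occurs
# in $word. A word equal to the syllable is only 'Standalone'; otherwise
# every occurrence is checked, so one word can be both Medial and Final
# (e.g. "banana" for "a"). Matching is case-sensitive.
sub environments {
    my ($word, $syllable) = @_;
    my $slen = length $syllable;
    return () if $slen == 0;
    return ('Standalone') if $word eq $syllable;

    my %seen;
    my $pos = index($word, $syllable);              # -1 when not found
    while ($pos >= 0) {
        if ($pos == 0) {
            $seen{Initial} = 1;                     # word starts with the syllable
        } elsif ($pos + $slen == length($word)) {
            $seen{Final} = 1;                       # syllable ends the word
        } else {
            $seen{Medial} = 1;                      # text on both sides
        }
        $pos = index($word, $syllable, $pos + 1);   # look for the next occurrence
    }
    return grep { $seen{$_} } qw(Initial Medial Final);
}

for my $word (qw(ago bare gonna a banana)) {
    my @env = environments($word, 'a');
    printf "%-7s %s\n", $word, @env ? join(', ', @env) : 'NONE';
}
```

Because every occurrence is checked, a word such as "banana" reports both Medial and Final for "a", whereas "Astrid" reports only Medial for "stri".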
I have tried to build these positional constraints into regexes, but the results are disastrous.
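The same constraints can be written as anchored regexes: `\A` and `\z` pin the start and end of the word, `.` demands at least one more character on that side, and `\Q...\E` stops characters in the syllable from being read as regex syntax. The sketch below is a hypothetical extension of the posted script, not the thread's answer; the helper names `read_lines`, `first_examples` and `run` are made up for illustration. For each syllable it prints the first corpus word found in each environment, or NONE.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order for the output table
my @ENVS = qw(Initial Medial Final Standalone);

# Hypothetical helper: read a file into a list of non-empty lines.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    s/\r?\n\z// for @lines;   # accept both DOS (CRLF) and Unix (LF) line endings
    return grep { length } @lines;
}

# Hypothetical helper: for one syllable, return the first corpus word found
# in each environment (in @ENVS order), or 'NONE' when there is no example.
sub first_examples {
    my ($syllable, $corpus) = @_;
    my %re = (
        Initial    => qr/\A\Q$syllable\E./s,  # at the start, with more text after it
        Medial     => qr/.\Q$syllable\E./s,   # at least one character on each side
        Final      => qr/.\Q$syllable\E\z/s,  # more text before it, at the very end
        Standalone => qr/\A\Q$syllable\E\z/s, # the syllable is the whole word
    );
    my %example;
    for my $word (@$corpus) {
        for my $env (@ENVS) {
            $example{$env} //= $word if $word =~ $re{$env};
        }
        last if scalar(keys %example) == scalar(@ENVS);  # every column is filled
    }
    return map { $example{$_} // 'NONE' } @ENVS;
}

# Hypothetical driver: print one table row per syllable.
sub run {
    my ($corpus_path, $syllables_path) = @_;
    my @corpus = read_lines($corpus_path);
    my $fmt = "%-12s %-12s %-12s %-12s %s\n";
    printf $fmt, 'Syllable', @ENVS;
    for my $syllable (read_lines($syllables_path)) {
        printf $fmt, $syllable, first_examples($syllable, \@corpus);
    }
}

# Only runs when both input files are in the current directory.
run('Corpus', 'Syllables') if -e 'Corpus' && -e 'Syllables';
```

With the sample words from the post, "stri" against a corpus containing "strip" and "Astrid" gives the row strip, Astrid, NONE, NONE. The files are read as raw bytes, as in the original script; because UTF-8 is self-synchronizing, a UTF-8 syllable still only matches whole characters, but `printf` pads by bytes, so the columns may not line up for Devanagari or other non-Latin text.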
Please help. I have spent quite a few hours and the results get more ludicrous each time.
Many thanks, and my gratitude to the generous people on the forum who give their time and energy to helping out tyros like me.

Last edited by radoulov; 07-30-2011 at 04:07 AM. Reason: Code tags!
 
