CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS Post: 302543707

Shell Programming and Scripting: post 302543707 by gimley on Monday 1st of August 2011 10:46:36 PM

Hello,
With a little help from colleagues, I finally managed to get the concordance going. Here is the code in case someone else would like to use it:

Code:

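#!/usr/bin/perl

use strict;    # These two lines save you endless trouble:
use warnings;  # without them, typos and similar errors go unnoticed.

# Load the corpus (one word per line) into an array for faster access.
open(my $corpus_file, '<', 'Corpus') or die "Cannot open Corpus: $!";
my @corpus;
while (my $line = <$corpus_file>) {
    $line =~ s/\r?\n\z//;                # strip Unix (LF) or DOS (CRLF) line endings
    push @corpus, $line if length $line; # skip blank lines
}
close $corpus_file;

open(my $syllables_file, '<', 'Syllables') or die "Cannot open Syllables: $!";
while (my $syllable = <$syllables_file>) {
    $syllable =~ s/\r?\n\z//;
    next unless length $syllable;        # an empty syllable would match every word
    my $count = 0;
    my ($init, $med, $fin, $stdalone) = ('NONE') x 4;
    for my $word (@corpus) {
        # \Q...\E matches the syllable literally, even if it contains regex
        # metacharacters. Each position is tested independently (not with
        # elsif), so a word skipped for one position can still fill another.
        if ($init eq 'NONE' && $word =~ /^\Q$syllable\E.+/) {     # word-initial
            $init = $word;
            $count++;
        }
        if ($med eq 'NONE' && $word =~ /.+\Q$syllable\E.+/) {     # word-medial
            $med = $word;
            $count++;
        }
        if ($fin eq 'NONE' && $word =~ /.+\Q$syllable\E$/) {      # word-final
            $fin = $word;
            $count++;
        }
        if ($stdalone eq 'NONE' && $word eq $syllable) {          # the whole word
            $stdalone = $word;
            $count++;
        }
        last if $count == 4;             # one example for every position found
    }
    print "$syllable\nInitial $init\nMedial $med\nFinal $fin\nStandalone $stdalone\n";
    # print "$init\t$med\t$fin\t$stdalone\n";   # alternative: one tab-separated line per syllable
}
close $syllables_file;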
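If the Corpus and Syllables files are UTF-8 text (an assumption; the post does not say which encoding they use), the reading step can be factored into a small helper that decodes the input, so that each character of a non-Latin script counts as one Perl character rather than several bytes. The name `read_lines` is hypothetical; the helper also removes both Unix and DOS line endings and skips blank lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a word list (one entry per line) from $path.
# Assumes the file is UTF-8 encoded; the :encoding(UTF-8) layer turns each
# encoded character (e.g. a Devanagari letter) into one Perl character.
# Both LF and CRLF line endings are removed and blank lines are skipped.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open '$path': $!";
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip Unix (LF) or DOS (CRLF) endings
        push @lines, $line if length $line;
    }
    close $fh;
    return @lines;
}

# Decoded strings must be encoded again when they are printed.
binmode(STDOUT, ':encoding(UTF-8)');

# Usage with the file names from the post:
# my @corpus    = read_lines('Corpus');
# my @syllables = read_lines('Syllables');
```

Decoding on input and encoding on output keeps the two sides symmetric; without the `binmode` call, printing decoded text triggers "Wide character" warnings.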

Many thanks for the information regarding regexes.
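The script keeps only the first example word for each position. When you instead need every position a syllable occupies within a single word (a word can hold the same syllable both initially and finally), a helper built on Perl's built-in `index` can be sketched as follows. The name `positions_of` and the demo words are hypothetical; because `index` does a literal substring search, regex metacharacters in a syllable need no quoting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every position class in which $syllable occurs
# inside $word. 'Standalone' if the word is the syllable itself, otherwise
# any of 'Initial', 'Medial' and 'Final' (reported in that fixed order).
sub positions_of {
    my ($word, $syllable) = @_;
    return () unless length $syllable;           # an empty syllable is meaningless
    return ('Standalone') if $word eq $syllable;

    my $len = length $syllable;
    my %found;
    my $pos = index($word, $syllable);           # literal search, no regex
    while ($pos >= 0) {
        if    ($pos == 0)                   { $found{Initial} = 1 }
        elsif ($pos + $len == length $word) { $found{Final}   = 1 }
        else                                { $found{Medial}  = 1 }
        $pos = index($word, $syllable, $pos + 1);  # allows overlapping matches
    }
    return grep { $found{$_} } qw(Initial Medial Final);
}

# Small demonstration with made-up Latin-script words.
for my $word (qw(abab xaby ab xyz)) {
    my @where = positions_of($word, 'ab');
    print "$word: ", (@where ? join(', ', @where) : 'NONE'), "\n";
}
```

The demonstration prints `abab: Initial, Final`, `xaby: Medial`, `ab: Standalone` and `xyz: NONE`. Collecting all positions per word is a design choice; the script above deliberately stops at one example per position.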

Last edited by Scott; 08-02-2011 at 01:14 AM. Reason: Code tags, please...

LEARN ABOUT DEBIAN

marc::batch

MARC::Batch(3pm)					User Contributed Perl Documentation					  MARC::Batch(3pm)

NAME

       MARC::Batch - Perl module for handling files of MARC::Record objects

SYNOPSIS

       MARC::Batch hides all the file handling of files of "MARC::Record" objects.  "MARC::Record" still does the file I/O, but "MARC::Batch"
       handles the multiple-file aspects.

	   use MARC::Batch;

	   # If you have weird control fields...
	   use MARC::Field;
	   MARC::Field->allow_controlfield_tags('FMT', 'LDX');

	   my $batch = MARC::Batch->new( 'USMARC', @files );
	   while ( my $marc = $batch->next ) {
	       print $marc->subfield(245,"a"), "\n";
	   }

EXPORT

       None.  Everything is a class method.

METHODS

   new( $type, @files )
       Create a "MARC::Batch" object that will process @files.

       $type must be either "USMARC" or "MicroLIF".  If you want to specify "MARC::File::USMARC" or "MARC::File::MicroLIF", that's OK, too.
       "new()" returns a new MARC::Batch object.

       @files can be a list of filenames:

	   my $batch = MARC::Batch->new( 'USMARC', 'file1.marc', 'file2.marc' );

       Your @files may also contain filehandles, so if you've got a large gzipped file, you can open a pipe from gunzip and pass it in:

	   use IO::File;
	   my $fh = IO::File->new( 'gunzip -c marc.dat.gz |' );
	   my $batch = MARC::Batch->new( 'USMARC', $fh );

       And you can mix and match if you really want to:

	   my $batch = MARC::Batch->new( 'USMARC', $fh, 'file1.marc' );

   next()
       Read the next record from the batch and return it as a MARC::Record object.  If the current file is at EOF, close it and open the next
       one. "next()" will return "undef" when there is no more data to be read from any of the batch files.

       By default, "next()" will also return "undef" if an error is encountered while reading from the batch. If you do not check for this,
       your iteration can terminate prematurely. To alter this behavior, see "strict_off()". You can retrieve warning messages using the
       "warnings()" method.

       Optionally you can pass in a filter function as a subroutine reference if you are only interested in particular fields from the record.
       This can boost performance.

   strict_off()
       If you would like "MARC::Batch" to continue after it has encountered what it believes to be bad MARC data, use this method to turn
       strict OFF.  A call to "strict_off()" always returns true(1).

       "strict_off()" can be handy when you don't care about the quality of your MARC data, and just want to plow through it. For safety,
       "MARC::Batch" strict is ON by default.

   strict_on()
       The opposite of "strict_off()", and the default state. You shouldn't have to use this method unless you've previously used "strict_off()"
       and want strict mode back on.  When strict is ON, calls to next() will return undef when an error is encountered while reading MARC data.
       strict_on() always returns true(1).

   warnings()
       Returns a list of warnings that have accumulated while processing a particular batch file. As a side effect, the warning buffer is
       cleared.

	   my @warnings = $batch->warnings();

       This method is also used internally to set warnings, so you probably don't want to be passing in anything as this will set warnings on your
       batch object.

       "warnings()" will return the empty list when there are no warnings.

   warnings_off()
       Turns off the default behavior of printing warnings to STDERR. However, even with warnings off the messages can still be retrieved using
       the warnings() method if you wish to check for them.

       "warnings_off()" always returns true(1).

   warnings_on()
       Turns on warnings so that diagnostic information is printed to STDERR. This is on by default so you shouldn't have to use it unless you've
       previously turned off warnings using warnings_off().

       warnings_on() always returns true(1).

   filename()
       Returns the currently open filename or "undef" if there is not currently a file open on this batch object.

RELATED MODULES

       MARC::Record, MARC::Lint

TODO

       None yet.  Send me your ideas and needs.

LICENSE

       This code may be distributed under the same terms as Perl itself.

       Please note that these modules are not products of or supported by the employers of the various contributors to the code.

AUTHOR

       Andy Lester, "<andy@petdance.com>"

perl v5.10.1							    2010-03-29							  MARC::Batch(3pm)