Sponsored Content
Top Forums Shell Programming and Scripting CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS Post 302543169 by gimley on Friday 29th of July 2011 08:21:12 PM
Old 07-29-2011
CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS

Hello,
Some time back I had posted a request for a syllable concordance in which if a syllable was provided in a file, the program would extract a word from a file entitled "Corpus" matching that syllable. The program was
The following script was provided which did the job and for which I am really thankful:

Code:
#! /usr/bin/perl

use strict;   # These two lines save you endless trouble 
use warnings; # without them typos and such errors get missed

open (my $corpus_file, '<', 'Corpus'); # Created a test corpus with just the contained lines
$/="\r\n"; # Again with the DOS files
chomp(my @corpus = (<$corpus_file>));  # Load the corpus file into an array for faster access
open (my $syllables_file, '<', 'Syllables');
while(<$syllables_file>){
    chomp(my $syllable = $_);
    my $found = 0;
    for my $word (@corpus){
        if ( $word =~ /$syllable/){  # use a regular expression to find a match for the syllable
            print "$syllable=$word\n";
            $found = 1;
            last; #Stop processing the array of words as we have an example
        }
    }
    print "$syllable wasn't matched in the supplied corpus\n" if (! $found);
}

However I need one more refinement
I need to modify the program such that it finds the syllable in three different environents Initial medial Final Standalone(whole word)
example (theoretical: I know somebody will say "a" here is not a syllable. But I am working with Indian languages).
Syllable "a"
Intial Medial Final Standalone
ago bare gonna a
It could be that the syllable may not appear in all environments as in the case of stri
Intial Medial Final Standalone
strip Astrid NONE NONE
I have tried to factor in the environmental constraints using regexes but the results are disastrous
Please help. I have spent quite a few hours and the results get more ludicrous each time.
Many thanks and my gratitutde to the generous people on the forum who give their time and energy to helping out tyros like me.

Last edited by radoulov; 07-30-2011 at 04:07 AM.. Reason: Code tags!
 

5 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Creating a syllable concordance

Hello, I have two files. The first file contains specific syllables of a language (Hindi) and the second file contains a large database from which these syllables have been culled. The syllable file which has syllables in Hindi has one syllable per line and the corpus file has a data... (8 Replies)
Discussion started by: gimley
8 Replies

2. Shell Programming and Scripting

[All variants] remove first pair of parentheses

How to remove first pair of parentheses and content in them from the beginning of the line? Here's the list: (ok)-test (ok)-test-(ing) (some)-test-(ing)-test test-(ing) Desired result: test test-(ing) test-(ing)-test test-(ing) Here's what I already tried with GNU sed: sed -e... (6 Replies)
Discussion started by: useretail
6 Replies

3. Shell Programming and Scripting

Writing a clustering concordance for a Perso-Arabic script

I am working on a database of a language using Arabic Script. One of the major issues is that the shape of the characters changes according to their initial, medial or final positioning. Another major issue is that of the clustering of vowels within the word: the clustering changes totally the... (9 Replies)
Discussion started by: gimley
9 Replies

4. Shell Programming and Scripting

[All variants] Change settings

Hi, I have a big settings confg (file attached). There are a few separate tasks that I have to accomplish. All scripting/programming languages are appreciated. 1. I need to parse all values and output to stdout. Sample output (truncated): VALUEA 2017-01-01 Lores ipsum Lorem ipsum dolor sit... (11 Replies)
Discussion started by: useretail
11 Replies

5. UNIX for Beginners Questions & Answers

Merge 4 bim files by keeping only the overlapping variants (unique rs values )

Dear community, I am facing a problem and I kindly ask your help: I have 4 different data sets consisted from 3 different types of array. On each file, column 1 is chromosome position, column 2 is SNP id etc... Lets say I have the following (bim) datasets: x2014: 1 rs3094315... (4 Replies)
Discussion started by: fondan
4 Replies
CNTLIST(5WN)						      WordNettm File Formats						      CNTLIST(5WN)

NAME
cntlist - file listing number of times each tagged sense occurs in a semantic concordance, sorted most to least frequently tagged cntlist.rev - file listing number of times each tagged sense occurs in a semantic concordance, sorted by sense key DESCRIPTION
A cntlist file for a semantic concordance lists the number of times each semantically tagged sense occurs in the concordance and its sense number in the WordNet database. Each line in the file corresponds to a sense in the WordNet database to which at least one semantic tag points. Only senses that are tagged in a concordance are in the concordance's cntlist file. WordNet Database cntlist File In the WordNet database, words are assigned sense numbers based on frequency of use in semantically tagged corpora. The cntlist file used by grind(1WN) to build the WordNet database and assign the sense numbers is a union of the cntlist files from the various semantic concor- dances that were formerly released by Princeton University. This combined cntlist file is provided with the WordNet package and is found in the WNSEARCHDIR directory. The cntlist.rev file is used at run-time by the WordNet library code and browser interfaces to print in the output display the number of times each sense has been tagged. File Format Each line in a cntlist file contains information for one sense. The file is ordered from most to least frequently tagged sense. The fields are separated by one space, and each line is terminated with a newline character. Senses having the same tag_cnt value are listed in reverse alphabetical order of the lemma field of the sense_key. Each line in cntlist is of the form: tag_cnt sense_key sense_number where tag_cnt is the decimal number of times the sense is tagged in the corresponding semantic concordance. sense_key is a WordNet sense encoding and sense_number is a WordNet sense number as described in The cntlist.rev file contains the same fields described above, in the following order: sense_key sense_number tag_cnt NOTES
Princeton no longer maintains or releases the Semantic Concordance files. The cntlist file used to order the senses in WordNet 3.0 was generated from the Semantic Concordance files at the point that they were last updated in 2001. In general, the order of senses presented usually reflects what the user would expect, however sense ordering is now less reliable than in prior releases and should not be construed as an accurate indicator of frequency of use. ENVIRONMENT VARIABLES (UNIX) WNHOME Base directory for WordNet. Default is /usr/local/WordNet-3.0. WNSEARCHDIR Directory in which the WordNet database has been installed. Default is WNHOME/dict. REGISTRY (WINDOWS) HKEY_LOCAL_MACHINESOFTWAREWordNet3.0WNHome Base directory for WordNet. Default is C:Program FilesWordNet3.0. HKEY_CURRENT_USERSOFTWAREWordNet3.0wnres User's default browser options. FILES
cntlist, cntlist.rev file of combined semantic concordance cntlist files. Used to assign sense numbers in WordNet database SEE ALSO
grind(1WN), wnintro(5WN), senseidx(5WN). WordNet 3.0 Dec 2006 CNTLIST(5WN)
All times are GMT -4. The time now is 05:41 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy