Find Syllable count mismatch Post: 303029510

Post 303029510 by gimley on Monday 28th of January 2019 08:28:43 AM

Hello,
I have written a syllable splitter for Pseudo English [conforming to the rules of Indic] and Indic.
I have a large database with the following structure:

Code:

Syllables in Pseudo English delimited by |=Syllables in Devanagari delimited by |

The tool produces syllables in both scripts. An example is given below:

Code:

a|bba|l=अ|ब्ब|ल
a|bA|s=अ|बा|स
a|bbA|s=अ|ब्बा|स
A|ba|dA=आ|ब|दा
a|bde|sh=अ|ब्दे|श
a|b|dhe|sh=अ|ब|धे|श
a|bdu|l=अ|ब्दु|ल
a|bdu|lA=अ|ब्दु|ला
a|bdu|llA=अ|ब्दु|ल्ला
a|bdu|lla|h=अ|ब्दु|ल्ल|ह
a|bdu|llA|h=अ|ब्दु|ल्ला|ह
a|bdu|r=अ|ब्दु|र
A|bhA=आ|भा
a|bha|y=अ|भ|य
a|bhi=अ|भी
a|bhi|ji|t=अ|भि|जी|त
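
A record in this layout can be taken apart with two splits, first on = and then on |. What follows is a minimal Python sketch of that idea, not the splitter tool itself. It assumes each line holds exactly one = and that the text is Unicode; the function name parse_record is hypothetical.

```python
def parse_record(line):
    """Split a record such as 'a|bba|l=अ|ब्ब|ल' into two syllable lists."""
    # Assumption: exactly one '=' separates the Pseudo English side
    # from the Devanagari side.
    roman, sep, devanagari = line.strip().partition("=")
    if not sep:
        raise ValueError(f"no '=' delimiter in record: {line!r}")
    # Each side is a '|'-delimited list of syllables.
    return roman.split("|"), devanagari.split("|")


roman, devanagari = parse_record("a|bdu|llA|h=अ|ब्दु|ल्ला|ह")
print(len(roman), len(devanagari))  # prints "4 4"
```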

However, at times the software goofs up and the number of syllables on the two sides does not match, as in:

Code:

zu|ba|i|dA=ज़ु|बै|दा
zu|ba|i|r=ज़ु|बै|र

As can be seen, there is a mismatch: the English side has 4 syllables and the Devanagari side has only 3.
I work in a Windows environment, and what I need is a script in awk or Perl that will run through the file and identify mismatches like those in the example above.
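
The check amounts to counting the |-separated fields on each side of = and reporting the lines where the two counts differ. The sketch below shows that logic in Python rather than in the awk or Perl asked for, purely as an illustration. The function name find_mismatches and the file name syllables.txt are hypothetical, and it assumes a UTF-8 file with one record per line.

```python
import sys


def find_mismatches(lines):
    """Yield (line_number, record, left_count, right_count) for mismatches."""
    for number, raw in enumerate(lines, start=1):
        record = raw.strip()
        if not record:
            continue  # skip blank lines
        left, sep, right = record.partition("=")
        if not sep:
            # Assumption: a line without '=' is malformed, so it is
            # reported with a right-hand count of 0.
            yield number, record, len(left.split("|")), 0
            continue
        n_left = len(left.split("|"))
        n_right = len(right.split("|"))
        if n_left != n_right:
            yield number, record, n_left, n_right


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_mismatches.py syllables.txt
    # Force UTF-8 output so Devanagari survives a Windows console or redirect.
    sys.stdout.reconfigure(encoding="utf-8")
    # "utf-8-sig" also accepts files saved with a byte order mark.
    with open(sys.argv[1], encoding="utf-8-sig") as handle:
        for number, record, n_left, n_right in find_mismatches(handle):
            print(f"{number}: {record} ({n_left} vs {n_right})")
```

Run against the two sample lines above, this would flag both, each with 4 syllables on the left and 3 on the right.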
Many thanks for your help and, since this is my first post of the year, a belated Happy New Year.

gimley
