Hello,
I have written a syllable splitter for Pseudo English [conforming to the rules of Indic] and Indic.
I have a large database with the following structure:
The tool produces syllables in both scripts. An example is given below:
However, at times the software goofs up, and the numbers of syllables on the two sides do not match, as in:
As can be seen, there is a mismatch: the English side has 4 syllables and the Devanagari side has only 3.
I work in a Windows environment, and what I need is an awk or Perl script that will run through the file and identify mismatches like the one above.
Many thanks for your help, and since this is my first post of the year, a belated Happy New Year!
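The sample records did not survive on this page, so the following is only a minimal Perl sketch under an assumed layout: each line holds the English syllables and the Devanagari syllables separated by a tab, with spaces between syllables inside each field. The separators `$FIELD_SEP` and `$SYLLABLE_SEP`, and the helper names `syllable_counts` and `report_mismatches`, are hypothetical and should be adapted to the real file format. It also assumes the file is UTF-8 encoded.

```perl
#!/usr/bin/perl
# Sketch: report records whose English and Devanagari syllable counts differ.
# ASSUMED record layout (the original sample was not preserved):
#   english syllables <TAB> devanagari syllables
# with the syllables inside each field separated by one or more spaces.
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));   # assumption: input is UTF-8 text

my $FIELD_SEP    = qr/\t/;    # hypothetical: a tab separates the two scripts
my $SYLLABLE_SEP = qr/ +/;    # hypothetical: spaces separate the syllables

# Return (english_count, devanagari_count), or an empty list
# when the record does not contain two fields.
sub syllable_counts {
    my ($line) = @_;
    $line =~ s/\s+\z//;                       # drop trailing CR/LF and spaces
    my ($english, $devanagari) = split $FIELD_SEP, $line, 2;
    return unless defined $devanagari;
    my @eng = grep { length } split $SYLLABLE_SEP, $english;
    my @dev = grep { length } split $SYLLABLE_SEP, $devanagari;
    return (scalar @eng, scalar @dev);
}

# Print "file:line: counts<TAB>record" for every mismatching record.
sub report_mismatches {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    while (my $line = <$fh>) {
        my ($eng, $dev) = syllable_counts($line) or next;
        next if $eng == $dev;
        $line =~ s/\s+\z//;
        print "$path:$.: English=$eng Devanagari=$dev\t$line\n";
    }
    close $fh;
}

# Usage: perl check_syllables.pl corpus.txt [more files...]
report_mismatches($_) for @ARGV;
```

Perl runs unchanged on Windows (for example Strawberry Perl), which may be simpler than awk for the UTF-8 Devanagari text.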
Hi, I have two files, and I want to compare them and write any mismatches to a third file.
file1
00354|1|0|1|1|0|0|0|1|2
52424|1|0|1|1|0|0|0|1|2
43236|1|0|1|1|0|0|0|1|2
41404|1|0|1|1|0|0|0|1|2
79968|1|0|1|1|0|0|0|1|2
file2
00354|1|0|1|1|0|0|0|1|2
52424|1|0|1|1|0|0|0|0|2... (9 Replies)
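The comparison described above can be sketched in Perl, assuming (this is not stated in the post) that the first pipe-delimited field, such as 00354, uniquely identifies a record. The helpers `load_records` and `find_mismatches` and the tags `changed`, `only_in_file1` and `only_in_file2` are hypothetical names chosen for illustration.

```perl
#!/usr/bin/perl
# Sketch: compare two pipe-delimited files and write the differences to a third.
# Assumption: the first field (e.g. 00354) is a unique record key.
use strict;
use warnings;

# Read "key|field|field..." lines into a hash: key => whole line.
sub load_records {
    my ($path) = @_;
    my %records;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless length $line;
        my ($key) = split /\|/, $line, 2;
        $records{$key} = $line;
    }
    close $fh;
    return \%records;
}

# Return tagged lines: records that changed, or that exist in only one file.
sub find_mismatches {
    my ($first, $second) = @_;
    my @out;
    for my $key (sort keys %$first) {
        if (!exists $second->{$key}) {
            push @out, "only_in_file1\t$first->{$key}";
        }
        elsif ($first->{$key} ne $second->{$key}) {
            push @out, "changed\t$second->{$key}";
        }
    }
    for my $key (sort keys %$second) {
        push @out, "only_in_file2\t$second->{$key}" unless exists $first->{$key};
    }
    return @out;
}

# Usage: perl compare.pl file1 file2 mismatches.txt
if (@ARGV == 3) {
    my ($file1, $file2, $out) = @ARGV;
    my @diff = find_mismatches(load_records($file1), load_records($file2));
    open my $ofh, '>', $out or die "Cannot write $out: $!\n";
    print {$ofh} "$_\n" for @diff;
    close $ofh or die "Cannot close $out: $!\n";
}
```

With the sample rows shown, the record keyed 52424 would be reported as `changed`, since its ninth field is 1 in file1 and 0 in file2.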
Hi,
I have a question about searching files for the existence of a particular key.
I have a property file (an .ini file) containing key/value pairs like the ones below.
Here are the contents of popertyfile.ini:
location.property=2
agent.method=begin
newkey=23
... (2 Replies)
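The post is cut off before the exact requirement, but a basic existence check on a `key=value` file like the one above could look like the Perl sketch below. It assumes one pair per line with optional spaces around `=`. `key_exists` is a hypothetical helper; `\Q...\E` makes the dots in keys such as `agent.method` match literally, and the trailing `\s*=` stops `agent` from matching `agent.method`.

```perl
#!/usr/bin/perl
# Sketch: test whether a key is defined in a key=value property file.
# Assumption: one "key=value" pair per line, optional spaces around "=".
use strict;
use warnings;

# True if some line read from the open handle $fh defines exactly $key.
sub key_exists {
    my ($fh, $key) = @_;
    while (my $line = <$fh>) {
        # \Q...\E quotes regex metacharacters such as the dot in agent.method
        return 1 if $line =~ /^\s*\Q$key\E\s*=/;
    }
    return 0;
}

# Usage: perl haskey.pl popertyfile.ini agent.method
if (@ARGV == 2) {
    my ($path, $key) = @ARGV;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $found = key_exists($fh, $key);
    close $fh;
    print $found ? "$key found\n" : "$key not found\n";
    exit($found ? 0 : 1);   # shell-friendly exit status
}
```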
Hello,
I have two files. The first file contains specific syllables of a language (Hindi) and the second file contains a large database from which these syllables have been culled.
The syllable file, which contains Hindi syllables, has one syllable per line,
and the corpus file has a data... (8 Replies)
Hello,
I am a relative newbie and want to split names in English into syllables. Does anyone know of a Perl script that does that? Since my main area is linguistics, I would be happy to add rules to it and post the Perl script back for other users. I tried the CPAN Perl modules, but they don't... (6 Replies)
Hi
I used the steps below and found some discrepancies.
Step 1:
find ./ -type f -mtime +7 -name "*.00*" | wc -l
13519 (total files)
(the total size of these files is approx. 10 GB)
Step 2:
find ./ -type f -mtime +7 -name "*.00*" | xargs tar zcvf Archieve_7.tar.gz
step... (7 Replies)
Hi,
I have a requirement like the one below.
The client is sending .txt files. One file has 10 records, but when I execute the command below it shows only 9.
klena20> wc -l sample_file.txt|awk '{print $1}'
It shows the output as 9,
but the file has 10 records. I found... (7 Replies)
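The post is cut off before the author's own finding, so the following is only one common cause of this kind of off-by-one, not necessarily the one found here: `wc -l` counts newline characters, so a last record with no terminating newline is not counted. A minimal Perl sketch that counts records the way Perl reads them, and reports whether the final record lacks a newline; `count_records` is a hypothetical helper.

```perl
#!/usr/bin/perl
# Sketch: count records the way Perl reads them, so that a final line
# without a trailing newline is still counted (wc -l counts only "\n").
use strict;
use warnings;

# Return (records, newline_characters) for an open handle.
sub count_records {
    my ($fh) = @_;
    my ($records, $newlines) = (0, 0);
    while (defined(my $line = <$fh>)) {
        $records++;
        $newlines++ if $line =~ /\n\z/;   # only terminated lines end in "\n"
    }
    return ($records, $newlines);
}

# Usage: perl count.pl sample_file.txt
for my $path (@ARGV) {
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my ($records, $newlines) = count_records($fh);
    close $fh;
    print "$path: $records records, $newlines newline characters\n";
    print "$path: last record has no trailing newline\n" if $records > $newlines;
}
```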
In the awk below, I am trying to output to one file the lines whose $2, $3 and $4 match between file1 and file2, with the count in (). I am also trying to output the lines whose $2, $3 and $4 are missing between file1 and file2, each with its count in (). Both input files are tab-delimited, but the... (7 Replies)
Hi,
I have a file containing a bunch of IP addresses from different VLANs. I am trying to list the number of occurrences of each VLAN in the output.
Here is what my file looks like:
1.1.1.1
1.1.1.2
1.1.1.3
1.1.2.1
1.1.2.2
1.1.3.1
1.1.3.2
1.1.3.3
1.1.3.4
So what I am trying... (2 Replies)
Discussion started by: new2prog
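For the IP address list above, a per-VLAN count can be sketched with a Perl hash. The sketch assumes, since the post is cut off before saying so, that a VLAN is identified by the first three octets of the address (a /24 such as 1.1.3); `count_by_prefix` is a hypothetical helper.

```perl
#!/usr/bin/perl
# Sketch: count addresses per VLAN, ASSUMING each VLAN is identified by the
# first three octets of the address (a /24 such as 1.1.3).
use strict;
use warnings;

# Return a hash ref: prefix => number of addresses with that prefix.
sub count_by_prefix {
    my (@addresses) = @_;
    my %count;
    for my $ip (@addresses) {
        # skip blank or non-IPv4 lines
        my ($prefix) = $ip =~ /^\s*(\d+\.\d+\.\d+)\.\d+\s*$/ or next;
        $count{$prefix}++;
    }
    return \%count;
}

# Usage: perl vlan_count.pl ip_list.txt
if (@ARGV) {
    my @lines;
    for my $path (@ARGV) {
        open my $fh, '<', $path or die "Cannot open $path: $!\n";
        push @lines, <$fh>;
        close $fh;
    }
    my $counts = count_by_prefix(@lines);
    print "$_ $counts->{$_}\n" for sort keys %$counts;   # prefixes sorted as strings
}
```

For the nine sample addresses in the post, this would print `1.1.1 3`, `1.1.2 2` and `1.1.3 4`.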
LEARN ABOUT OSX
locale::script
Locale::Script(3pm)     Perl Programmers Reference Guide     Locale::Script(3pm)
NAME
Locale::Script - standard codes for script identification
SYNOPSIS
use Locale::Script;
$script = code2script('phnx'); # 'Phoenician'
$code = script2code('Phoenician'); # 'Phnx'
$code = script2code('Phoenician',
LOCALE_SCRIPT_NUMERIC); # 115
@codes = all_script_codes();
@scripts = all_script_names();
DESCRIPTION
The "Locale::Script" module provides access to standard codes used for identifying scripts, such as those defined in ISO 15924.
Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default ISO 15924
four-letter codes will be used.
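A short sketch of the optional code set argument, using only the Phoenician values already given in the SYNOPSIS (Phnx, 115). It assumes Locale::Script is installed; on newer Perls it may need to be installed from CPAN as part of the Locale-Codes distribution.

```perl
#!/usr/bin/perl
# Sketch of the optional code set argument, using only the Phoenician
# values shown in the SYNOPSIS. Requires Locale::Script (Locale-Codes).
use strict;
use warnings;
use Locale::Script;

# No code set given: the default ISO 15924 four-letter codes are used.
my $name  = code2script('phnx');             # 'Phoenician'
my $alpha = script2code('Phoenician');       # 'Phnx'

# Same lookups with the optional code set argument naming the numeric codes.
my $name_from_num = code2script('115', 'num');        # 'Phoenician'
my $num           = script2code('Phoenician', 'num'); # 115

print "$alpha ($num) = $name\n";
```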
SUPPORTED CODE SETS
There are several different code sets you can use for identifying scripts. A code set may be specified using either a name, or a constant
that is automatically exported by this module.
For example, the following two calls are equivalent:
$script = code2script('phnx','alpha');
$script = code2script('phnx',LOCALE_SCRIPT_ALPHA);
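The equivalence can be checked with a small sketch that looks up Phoenician once with each code set given by name and once by the exported constant. It assumes Locale::Script is installed; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Locale::Script;   # exports LOCALE_SCRIPT_ALPHA and LOCALE_SCRIPT_NUMERIC

# A code set can be named by string or by the exported constant.
my $by_name     = code2script('phnx', 'alpha');
my $by_constant = code2script('phnx', LOCALE_SCRIPT_ALPHA);

my $num_by_name     = script2code('Phoenician', 'num');
my $num_by_constant = script2code('Phoenician', LOCALE_SCRIPT_NUMERIC);

print "$by_name eq $by_constant; $num_by_name == $num_by_constant\n";
```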
The code sets currently supported are:
alpha, LOCALE_SCRIPT_ALPHA
This is a set of four-letter (capitalized) codes from ISO 15924 such as 'Phnx' for Phoenician. It also includes additions to this set
included in the IANA language registry.
The Zxxx, Zyyy, and Zzzz codes are not used.
This is the default code set.
num, LOCALE_SCRIPT_NUMERIC
This is a set of three-digit numeric codes from ISO 15924 such as 115 for Phoenician.
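To move between the two code sets, the `script_code2code ( CODE ,CODESET ,CODESET2 )` routine listed under ROUTINES can translate a code directly. A minimal sketch, again assuming Locale::Script is installed and using only the Phoenician codes from this page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Locale::Script;

# script_code2code(CODE, CODESET, CODESET2) translates a code between code sets.
my $numeric = script_code2code('phnx', LOCALE_SCRIPT_ALPHA, LOCALE_SCRIPT_NUMERIC);  # 115
my $alpha   = script_code2code('115', 'num', 'alpha');   # four-letter code for Phoenician

print "phnx -> $numeric, 115 -> $alpha\n";
```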
ROUTINES
code2script ( CODE [,CODESET] )
script2code ( NAME [,CODESET] )
script_code2code ( CODE ,CODESET ,CODESET2 )
all_script_codes ( [CODESET] )
all_script_names ( [CODESET] )
Locale::Script::rename_script ( CODE ,NEW_NAME [,CODESET] )
Locale::Script::add_script ( CODE ,NAME [,CODESET] )
Locale::Script::delete_script ( CODE [,CODESET] )
Locale::Script::add_script_alias ( NAME ,NEW_NAME )
Locale::Script::delete_script_alias ( NAME )
Locale::Script::rename_script_code ( CODE ,NEW_CODE [,CODESET] )
Locale::Script::add_script_code_alias ( CODE ,NEW_CODE [,CODESET] )
Locale::Script::delete_script_code_alias ( CODE [,CODESET] )
These routines are all documented in the Locale::Codes::API man page.
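A sketch of a few of the routines above, based on the signatures listed here: list all codes, add an alias name, and rename a script. The names 'Phoenician alphabet' and 'Phoenician script' are hypothetical. As far as we can tell, such changes modify only the tables loaded in the running program and do not persist. Locale::Codes::API remains the authoritative description.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Locale::Script;

# all_script_codes() lists every code in the default (alpha) code set.
my @codes    = all_script_codes();
my $has_phnx = grep { lc($_) eq 'phnx' } @codes;

# Add an extra (hypothetical) name that script2code() will also accept.
Locale::Script::add_script_alias('Phoenician', 'Phoenician alphabet');
my $via_alias = script2code('Phoenician alphabet');   # 'Phnx'

# Change the name that code2script() reports for the code phnx.
Locale::Script::rename_script('phnx', 'Phoenician script');
my $renamed = code2script('phnx');                    # 'Phoenician script'

printf "%d codes; alias -> %s; renamed -> %s\n", scalar(@codes), $via_alias, $renamed;
```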
SEE ALSO
Locale::Codes
The Locale-Codes distribution.
Locale::Codes::API
The list of functions supported by this module.
http://www.unicode.org/iso15924/
Home page for ISO 15924.
http://www.iana.org/assignments/language-subtag-registry
The IANA language subtag registry.
AUTHOR
See Locale::Codes for full author history.
Currently maintained by Sullivan Beck (sbeck@cpan.org).
COPYRIGHT
Copyright (c) 1997-2001 Canon Research Centre Europe (CRE).
Copyright (c) 2001-2010 Neil Bowers
Copyright (c) 2010-2012 Sullivan Beck
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.16.2 2012-10-11 Locale::Script(3pm)