Writing a clustering concordance for a Perso-Arabic script
Post 302951655 by gimley on Sunday 9th of August 2015 01:20:25 AM
Many thanks for responding.
I understand your queries, and I would like to clarify the details so that the script is easier to follow.
DETAILS
The script reads 2 files:
1. Syllables: A list of all the syllables.
2. Corpus: A list of words in Arabic script followed by their Indic equivalent, delimited by an equals sign (=).
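One possible way to read a corpus in this format is sketched below. It is written in Python purely for illustration (the script discussed here is in Perl, where `split /=/, $line, 2` plays the same role). The function names `parse_corpus_line` and `load_corpus` and the sample entries are hypothetical and are not taken from the attached files.

```python
def parse_corpus_line(line, delimiter="="):
    """Split one corpus line into (arabic, indic).

    Returns None for blank lines, lines without the delimiter,
    or lines whose Arabic side is empty.
    """
    line = line.strip()
    if not line or delimiter not in line:
        return None
    # maxsplit=1: only the first "=" separates the two sides.
    arabic, indic = line.split(delimiter, 1)
    arabic, indic = arabic.strip(), indic.strip()
    if not arabic:
        return None
    return arabic, indic


def load_corpus(lines, delimiter="="):
    """Parse an iterable of corpus lines, skipping malformed ones."""
    pairs = []
    for line in lines:
        pair = parse_corpus_line(line, delimiter)
        if pair is not None:
            pairs.append(pair)
    return pairs


# Hypothetical sample entries, for illustration only.
sample = ["کتاب=किताब", "", "بابا = बाबा", "no delimiter here"]
pairs = load_corpus(sample)  # keeps the two well-formed entries
```

When the lines come from a file rather than a list, opening it with an explicit encoding, for example `open(path, encoding="utf-8")`, avoids depending on the system default.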

EXPECTED FORMAT
For each syllable, the output is supposed to give:
a. The syllable in question, and whether it occurs in Initial, Medial or Final position.
b. At least 6 to 10 examples (at present only one is produced).
c. Additional bells and whistles: a frequency count of all the words [not present in my script: I don't know how to tailor two sets of counts].
In other words, the output should look like this:
Code:
SYLLABLE: FREQUENCY 
Initial 6 EXAMPLES 
Medial 6 EXAMPLES 
Final 6 EXAMPLES 
Standalone 6 EXAMPLES

Each example should show the string in Arabic script and also in Indic script.
If there are no examples, or fewer than 6, the output should say so. At present only one example is produced.
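The layout above could be produced along the following lines. This is a Python sketch under assumptions: the function name `format_entry`, the exact wording of the "none found" and "only N found" notes, and the separator between the Arabic and Indic forms are all hypothetical choices, and the examples are assumed to have been collected already.

```python
POSITIONS = ("Initial", "Medial", "Final", "Standalone")


def format_entry(syllable, frequency, examples, wanted=6):
    """Render one concordance entry as text.

    `examples` maps a position name to a list of (arabic, indic) pairs.
    Positions with no examples, or with fewer than `wanted`, say so.
    """
    lines = [f"{syllable}: {frequency}"]
    for position in POSITIONS:
        found = examples.get(position, [])
        if not found:
            lines.append(f"{position}: none found")
            continue
        shown = ", ".join(f"{arabic} = {indic}" for arabic, indic in found)
        note = f" (only {len(found)} found)" if len(found) < wanted else ""
        lines.append(f"{position}{note}: {shown}")
    return "\n".join(lines)
```

The function returns a string instead of printing it, so the caller decides where the text goes, for example a UTF-8 encoded output file.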
It does work to a certain extent, but the following major problems remain.
PROBLEMS
1. The script should examine only the Perso-Arabic side, i.e. the text before the = delimiter, and ignore the Indic side. It does not do that, and as a result not all final occurrences are shown: because the delimiter and the Indic text follow the Arabic word, valid final occurrences in Arabic are not detected. I don't know how to instruct the program to restrict its analysis to the Arabic side of the corpus and ignore the rest.
2. I need at least 6-10 instances of tokens from the corpus file. At present only one is given.
3. If possible, the frequency should be provided [I don't know how to tailor two sets of counts].
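The three problems above can be approached together, as in the sketch below. It is written in Python for illustration and is not the attached Perl script; the names `positions_of` and `build_concordance` are hypothetical. It also assumes that the "two sets of counts" means a total per syllable plus a count per position, which is only one reading of the requirement.

```python
from collections import defaultdict


def positions_of(syllable, word):
    """Return the set of positions at which `syllable` occurs in `word`.

    Strings are compared in logical (stored) order, so "Initial" means
    the first characters as typed, which appear rightmost on screen
    for a right-to-left script.
    """
    found = set()
    if not syllable or syllable not in word:
        return found
    if word == syllable:
        return {"Standalone"}
    if word.startswith(syllable):
        found.add("Initial")
    if word.endswith(syllable):
        found.add("Final")
    # Medial: an occurrence that starts after the first character
    # and ends before the last one.
    start = word.find(syllable, 1)
    while start != -1:
        if start + len(syllable) < len(word):
            found.add("Medial")
            break
        start = word.find(syllable, start + 1)
    return found


def build_concordance(syllables, corpus_lines, max_examples=10, delimiter="="):
    """Collect counts and up to `max_examples` examples per position."""
    total = defaultdict(int)  # words containing the syllable
    by_position = defaultdict(lambda: defaultdict(int))  # words per position
    examples = defaultdict(lambda: defaultdict(list))
    for line in corpus_lines:
        arabic, sep, indic = line.strip().partition(delimiter)
        arabic, indic = arabic.strip(), indic.strip()
        if not sep or not arabic:
            continue  # skip blank or malformed lines
        for syllable in syllables:
            hits = positions_of(syllable, arabic)  # Arabic side only
            if hits:
                total[syllable] += 1
            for position in hits:
                by_position[syllable][position] += 1
                if len(examples[syllable][position]) < max_examples:
                    examples[syllable][position].append((arabic, indic))
    return total, by_position, examples
```

Because the line is split on the delimiter before any matching is done, a syllable at the end of the Arabic word is tested against the end of that word, not the end of the whole line. The counts keep increasing after the example list is full, so the frequency reflects the whole corpus even though only `max_examples` tokens are kept.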
I have racked my brains over this, and all attempts to get this type of output have failed.
To make the scenario clearer, I am attaching the data files as well as the script file.
I have tried again and again to modify the script, but the desired formatted output is not produced.
One is never too old to learn, and I still feel that at my age I can master the intricacies of Perl and handle strings.
Many thanks for your help.
 
