Regex to identify illegal characters in a perso-arabic database Post: 303002550

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Illegal characters in Servername / Path

Hi there. I wonder if anybody can help me. I am very new to this and a bit out of my depth. I have a .cmd file which sets various environment variables for me. When I input a server name that does not contain dots (.) in the name it works fine. As soon as I place in a server name...

2. Shell Programming and Scripting

how do I identify files with characters beyond a certain range.

I have a directory with hundreds of files that cannot have data past column 80. I do not know of a way to combine the "grep" and "cut" commands. I tried: cat * | cut -c 81-120 |pg but it only shows me the line, not the file name. Any help would be appreciated. Been on this all...

3. UNIX for Dummies Questions & Answers

Arabic characters in QNX4

I want to display Arabic characters in QNX4. This work was done by a colleague several years ago, but he didn't document it. I installed fonts and got this display (attached). Please let me know how I can correct it to match the initial display, which was working in Arabic (attached). Thanks...

4. UNIX and Linux Applications

Identify server.database connection

Good afternoon, I need your help. I am new to Unix. In an ETL scenario such as DataStage, there are a bunch of processes (shell scripts) connecting to heterogeneous database source servers in order to extract information. I've got 2 questions: 1. Using Unix, how can I identify exactly the...

5. UNIX for Dummies Questions & Answers

Use Regex to identify / format a complex string

First of all, please have mercy on me. I am not a noob to programming, but I am about as noob as you can get with regex. That being said, I have a problem. I've got a string that looks something like this: Publication - Bob M. Jones, Tony X. Stark, and Fred D. Man, \"Really Awesome Article...

6. Shell Programming and Scripting

Regex to identify a full-stop as a sentence delimiter

Hello, Splitting a sentence using the full-stop/question-mark/exclamation is a common device. Whereas the question-mark / exclamation do not pose too much of a problem; the full-stop as a sentence delimiter raises certain issues because of its varied use: just to name a few. Standard parsers...

7. Shell Programming and Scripting

Regex to identify word in second position on a line

I am interested in finding a regex to find a word in second position on a line. The word in question is या I tried the following PERL EXPRESSION but it did not work: ] या or ^\W या But both gave Null results I am giving below a Sample file: देना या सौंपना=delegate तह जमना या...

8. Shell Programming and Scripting

Writing a clustering concordance for a Perso-Arabic script

I am working on a database of a language using Arabic Script. One of the major issues is that the shape of the characters changes according to their initial, medial or final positioning. Another major issue is that of the clustering of vowels within the word: the clustering changes totally the...

9. Shell Programming and Scripting

Regex to identify unique words in a dictionary database

Hello, I have a dictionary which I am building for the Open Source Community. The data structure is as under HEADWORD=PARTOFSPEECH=ENGLISH MEANING as shown in the example below अ=m=Prefix signifying negation. अँहँ=ind=Interjection expressing disapprobation. अं=int=An interjection...

10. UNIX for Beginners Questions & Answers

Regex to identify pattern

Hi In a file I have string in multiple lines. Like below: <?=test.getObjectName("L", "testTBL","D") ?> <?=test.getObjectName("L", "testTBL","testDB", "D") ?> I want to use regex to search for the pattern "<?=test.getObjectName...?>" If the parenthesis has 3 parameters then return 2nd...

LEARN ABOUT DEBIAN

encode::arabic::parkinson

Encode::Arabic::Parkinson(3pm)				User Contributed Perl Documentation			    Encode::Arabic::Parkinson(3pm)

NAME

       Encode::Arabic::Parkinson - Dil Parkinson's transliteration of Arabic

REVISION

	   $Revision: 179 $	   $Date: 2007-01-14 01:23:25 +0100 (Sun, 14 Jan 2007) $

SYNOPSIS

	   use Encode::Arabic::Parkinson;	   # imports just like 'use Encode' would, plus more

	   while ($line = <>) { 		   # Dil Parkinson's mapping into the Arabic script

	       print encode 'utf8', decode 'parkinson', $line;
	   }

	   # shell filter of data, e.g. in *n*x systems instead of viewing the Arabic script proper

	   % perl -MEncode::Arabic::Parkinson -pe '$_ = encode "parkinson", decode "utf8", $_'

	   # employing the modes of conversion for filtering and trimming

	   Encode::Arabic::enmode 'parkinson', 'nosukuun', 'LWE xml';
	   Encode::Arabic::Parkinson->demode(undef, undef, 'strip _');

	   $decode = "AiqoraLo hRvaA Ol_n~a_S~a bi___OnotibaAhI.";
	   $encode = encode 'parkinson', decode 'parkinson', $decode;

	   # $encode eq "AiqraL hRvaA Aln~aS~a biAntibaAhI."

DESCRIPTION

       Dil Parkinson's notation is a one-to-one transliteration of the Arabic script for Modern Standard Arabic, using lower ASCII characters to
       encode the graphemes of the original script.

   IMPLEMENTATION
       Similar to that in Encode::Arabic::Buckwalter.

   EXPORTS & MODES
       The module exports as if "use Encode" also appeared in the package. Any other "import" options are simply delegated to Encode, and the
       imports are performed accordingly.

       The conversion modes of this module allow overriding the setting of the ":xml" option, in addition to filtering out diacritical marks and
       stripping off kashida. The modes and aliases relate like this:

	   our %Encode::Arabic::Parkinson::modemap = (

		   'default'	   => 0,   'undef'	   => 0,

		   'fullvocalize'  => 0,   'full'	   => 0,

		   'nowasla'	   => 4,

		   'vocalize'	   => 3,   'nosukuun'	   => 3,

		   'novocalize'    => 2,   'novowels'	   => 2,   'none'	   => 2,

		   'noshadda'	   => 1,   'noneplus'	   => 1,
	       );

       enmode ($obj, $mode, $xml, $kshd)
       demode ($obj, $mode, $xml, $kshd)
	   These methods can be invoked directly or through the respective functions of Encode::Arabic. The meaning of the extra parameters
	   follows from the examples of usage.
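
       The alias table above is easier to read when grouped by numeric level. The following is only a minimal sketch, not part of the
       module: it works on a local copy of the table quoted in this page (the installed module keeps its own in
       %Encode::Arabic::Parkinson::modemap), and the helper name aliases_by_level is hypothetical.

```perl
use strict;
use warnings;

# Local copy of the mode table quoted in this page (an assumption for the
# sketch; the real table lives in %Encode::Arabic::Parkinson::modemap).
my %modemap = (
    'default'      => 0, 'undef'    => 0,
    'fullvocalize' => 0, 'full'     => 0,
    'nowasla'      => 4,
    'vocalize'     => 3, 'nosukuun' => 3,
    'novocalize'   => 2, 'novowels' => 2, 'none' => 2,
    'noshadda'     => 1, 'noneplus' => 1,
);

# Hypothetical helper: collect the mode names that share each numeric level.
# Names are sorted so the output order is stable.
sub aliases_by_level {
    my (%map) = @_;
    my %by_level;
    push @{ $by_level{ $map{$_} } }, $_ for sort keys %map;
    return %by_level;
}

my %groups = aliases_by_level(%modemap);
for my $level (sort { $a <=> $b } keys %groups) {
    printf "%d: %s\n", $level, join(', ', @{ $groups{$level} });
}
```

       Read this way, the 'nosukuun' passed to enmode in the SYNOPSIS is the same level (3) as 'vocalize', and 'default', 'undef',
       'fullvocalize' and 'full' all name level 0.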

SEE ALSO

       Encode::Arabic, Encode, Encode::Encoding

       Xerox Arabic Home Page  <http://www.arabic-morphology.com/>

AUTHOR

       Otakar Smrz, <http://ufal.mff.cuni.cz/~smrz/>

	   eval { 'E<lt>' . ( join '.', qw 'otakar smrz' ) . "\x40" . ( join '.', qw 'mff cuni cz' ) . 'E<gt>' }

       Perl is also designed to make the easy jobs not that easy ;)

COPYRIGHT AND LICENSE

       Copyright 2006-2007 by Otakar Smrz

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.10.1							    2010-01-18					    Encode::Arabic::Parkinson(3pm)
