Post 303002501 by gimley on Friday 25th of August 2017 11:01:53 PM
Old 08-26-2017
Regex to identify illegal characters in a perso-arabic database

I am working on Sindhi, which is written in a Perso-Arabic script. Because it shares its Unicode block with over 400 other languages, the database quite often contains unwanted (illegal) characters.
I have identified the character set of Sindhi, which is given below.
For clarity's sake, each character is enclosed in apostrophes and separated from the next by a comma:
Code:
['ا','ب','ٻ','پ','ڀ','ت','ٺ','ٽ','ث','ٿ','ف','ڦ','گ','ڳ','ڱ','ک','ي','د','ذ','ڌ','ڏ','ڊ','ڍ','ح','ج','ڄ','ڃ','چ','ڇ','خ','ع','غ','ر','ڙ','م','ن','ل','س','ش','و','ق','ص','ض','ڻ','ط','ظ','ھ','جھ','گھ','ڪ','ء','ه','آ']

I wrote a regex in Unix to identify all words that contain characters outside the Sindhi character set.
Code:
^[^ابٻپڀتٺٽثٿفڦگڳڱکيدذڌڏڊڍحجڄڃچڇخعغرڙمنلسشوقصضڻطظھجھگھڪءهآ]+$

The intended meaning is: find all strings that contain characters other than the given ones.
However, the regex does not work. What went wrong? Do I need to put a comma after every character?
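
One likely explanation, sketched here in Python rather than a Unix tool (the regex semantics are the same): the anchored pattern ^[^...]+$ matches a line only when every character on it lies outside the set, so a word that mixes legal and illegal characters never matches. Dropping the anchors and the + finds any line with at least one outside character. No commas are needed; inside a bracket expression a comma would simply be one more member of the set. The digraphs جھ and گھ also add nothing there, because a character class holds single characters and ج, گ and ھ are already listed. The names SINDHI, all_illegal, any_illegal and has_illegal are made up for this sketch.

```python
import re

# Character set copied verbatim from the regex in the post. The letters
# repeated by the two digraphs are harmless duplicates inside a class.
SINDHI = "ابٻپڀتٺٽثٿفڦگڳڱکيدذڌڏڊڍحجڄڃچڇخعغرڙمنلسشوقصضڻطظھجھگھڪءهآ"

# Pattern as posted: the WHOLE line must consist of non-Sindhi characters.
all_illegal = re.compile("^[^" + SINDHI + "]+$")

# Unanchored variant: at least ONE non-Sindhi character anywhere in the line.
any_illegal = re.compile("[^" + SINDHI + "]")


def has_illegal(word: str) -> bool:
    """Return True if the word contains any character outside SINDHI."""
    return any_illegal.search(word) is not None


if __name__ == "__main__":
    legal = SINDHI[:3]    # a string made only of listed letters
    mixed = legal + "*"   # the same string with one illegal character
    print(all_illegal.search(mixed) is not None)  # anchored pattern misses it
    print(has_illegal(mixed))                     # unanchored pattern finds it
```

Whether this matches the behaviour of a particular grep or sed build also depends on the locale being UTF-8, which is an assumption worth checking separately.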
Below is a small sample database containing words with both legal and illegal characters:
Code:
آبادلاهورپشاورڪوئٽه
اپریل
ای
ایڈریس
بيورو
جائے
جنوری
جوابدارشاهد
خانوں
دتجايا
دیں
روحناي
سنڌسماءَچار
سے
ضروری
فروری
مئی
منقتل
میل
نيشنلاسلام
ویب*
پاڪستانسنڌمقامي
پوڻان
ڊ
کریں
کیا
ڪراچي-حيدرآباد
گرميت
گھنٹے
گیا
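
To see which characters in a sample like the one above fall outside the set, a small report can help more than a yes/no match. The following is a minimal sketch, assuming the data is UTF-8 text with one word per line; illegal_chars and scan are hypothetical helper names, and the character set is the one given in the post.

```python
import unicodedata

# Sindhi letters as listed in the post; set() removes the repeated letters.
SINDHI = set("ابٻپڀتٺٽثٿفڦگڳڱکيدذڌڏڊڍحجڄڃچڇخعغرڙمنلسشوقصضڻطظھجھگھڪءهآ")


def illegal_chars(word):
    """Return (char, code point, Unicode name) for each character not in SINDHI."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in word
        if ch not in SINDHI
    ]


def scan(lines):
    """Yield (word, offenders) for every non-empty line with an illegal character."""
    for line in lines:
        word = line.strip()
        offenders = illegal_chars(word)
        if word and offenders:
            yield word, offenders
```

A possible use, with sample.txt as a placeholder file name: `for word, bad in scan(open("sample.txt", encoding="utf-8")): print(word, bad)`. Printing the code point and name makes look-alike letters visible, since characters that render almost identically can still be different code points.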

Any help would be greatly appreciated. Many thanks in advance.
 
