Creating Ligatures in Urdu using delimiters


 
# 1  
09-26-2013

Hello,
I want to create a test bed for Urdu ligatural forms. One of the main components is a delimiter list: the letters after which no connector can be formed, so a new ligature starts with the next letter.
What I need is a tool that takes running text or a list of words from a file and splits each word immediately after every delimiter it encounters. A sample will explain the process:
For ease of illustration, I am using the Latin script.
DELIMITERS: Let us assume that the delimiters are:
Code:
a,e,i,o,u

Each delimiter is separated by a comma.
INPUT:
Code:
baker
convoluted
perspicacity

EXPECTED OUTPUT:
Code:
ba ke r
co nvo lu te d
pe rspi ca ci ty

i.e. the string is split after each delimiter and a space is inserted there.
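The rule above can be sketched in Perl as a single substitution that appends a space after every character in a delimiter class. This is a minimal sketch that assumes every delimiter is a single character; `split_after` is a hypothetical name, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append a space after every single-character delimiter.
sub split_after {
    my ($word, @delims) = @_;
    my $class = join '', map { quotemeta } @delims;   # e.g. "aeiou"
    (my $out = $word) =~ s/([$class])/$1 /g;          # capture the delimiter, re-emit it plus a space
    return $out;
}

print split_after($_, qw(a e i o u)), "\n" for qw(baker convoluted perspicacity);
```

With this version a word that ends in a delimiter (e.g. `aroma`) comes out with a trailing space.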
Please note that if I had also added
Code:
aeo

as a delimiter, then a string such as:
Code:
archaeological

would be split as
Code:
a rchaeo lo gi ca l
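Multi-character delimiters such as `aeo` can be handled by compiling the list into one regex alternation, longest delimiter first. Perl tries alternatives left to right at each position, so `aeo` is matched before the single `a` can claim its first letter. The longest-first rule is our reading of the `archaeological` example, and `build_delim_re` is a hypothetical name; treat this as a sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one regex for all delimiters, longest first,
# so a multi-character delimiter wins over its own first letter.
sub build_delim_re {
    my @delims = sort { length($b) <=> length($a) } @_;
    my $alt = join '|', map { quotemeta } @delims;
    return qr/($alt)/;
}

my $re = build_delim_re(qw(a e i o u aeo));
(my $out = 'archaeological') =~ s/$re/$1 /g;   # space after each matched delimiter
print "$out\n";
```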

At present I use a macro to do the job, but the process is extremely slow.
An AWK or Perl script would be of great help, since my OS is Windows.
Many thanks
P.S. In case anyone is interested in experimenting with Urdu, a sample delimiter list is provided below:
Code:
ا,د,ڈ,ذ,ر,ڑ,ز,ژ,و,ے,إ,ۓ,ؤ

and here is a sample text:
Code:
بیوشن
شلوارقمیص
پھوٹتے
دھاتیں
پھنسیوں
جسمیر
نندلال
ارستوفانیس
علَم
شعلوں
اپلائی
اٹکی
دونی
اٹکے
ترکیبی
ہارس
ناقدری
ڈاکیے
سٹارز
پھنسنے
وہائٹ
نزولِ
المنکر
ڈیز
شوربے
نجار
غضنفرعلی
بیضہ
شاہجہان
کنٹرولڈ
تیرہویں
پروٹسٹنٹ
ٹنڈ
کھردری
الردی
بخشتے
وجاں
اورے
ٹیشن
جماتا
انائس
عاشقی
نظارا
ڈیڈ
آکے
چنگاریاں
بھابی
دیارِ
نریندرمودی
مسکراہٹیں
جلاتا
ٹھاٹھ
شاداں
گھناؤنے
پتاپرتھی
نارمن
پگھلتی
کہرام
صلائے
جید
اکھاڑہ
مستنصر
پرکھ
گریسی
بگڑائو

# 2  
09-26-2013
Here's one way I could think of:

Code:
[user@host ~]$ cat file
baker
convoluted
perspicacity
[user@host ~]$ cat test.pl
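#!/usr/bin/perl
use strict;
use warnings;

my @delims = qw / a e i o u /;    # characters after which a space is inserted

open my $in, '<', 'file' or die "Can't open file: $!";
while (my $str = <$in>) {
    chomp ($str);
    # print each character, adding a space if it is a delimiter
    for my $x (split('', $str)) {
        (grep {$_ eq $x} @delims) ? print "$x " : print "$x";
    }
    print "\n";
}
close $in;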
[user@host ~]$
[user@host ~]$ ./test.pl
ba ke r
co nvo lu te d
pe rspi ca ci ty
[user@host ~]$
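The script above checks each character with `grep`, which scans the whole delimiter list every time. Since the question mentions that the current approach is extremely slow, a hash lookup is a common alternative. Below is a minimal sketch with the same five vowels; `split_word` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @delims   = qw(a e i o u);
my %is_delim = map { $_ => 1 } @delims;   # constant-time membership test

# Hypothetical helper: rebuild the word, adding a space after each delimiter.
sub split_word {
    my ($str) = @_;
    return join '', map { $is_delim{$_} ? "$_ " : $_ } split //, $str;
}

print split_word($_), "\n" for qw(baker convoluted perspicacity);
```

Like the `grep` version, a word ending in a delimiter keeps a trailing space.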

# 3  
09-26-2013
Hello,
It worked beautifully for the English samples. However, the moment I plugged in the Urdu delimiters, it stopped working.
I suppose this is because Perl does not support UTF-8. I even tried saving the script as UTF-8 without a byte order mark, but it still did not work.
The only change I made in the script was to replace the delimiter list with mine:
Code:
my @delims = qw /  ا ڈ ذ ر ڑ ز ژ و ے إ ۓ ؤ /;

(each separated by a space, as in your version).
For testing, here is the small sample I tried it on:
Code:
شلوارقمیص
پھوٹتے
دھاتیں
پھنسیوں
جسمیر
نندلال
ارستوفانیس
علَم
شعلوں
اپلائی
اٹکی
دونی
اٹکے
تر

Even if the script is unfamiliar to you, you should see a space between the ligatural forms, but the script prints the sample file unchanged.
How do I get around this issue?
Any help or suggestions would be appreciated.
Many thanks
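A likely cause, offered as a sketch rather than a diagnosis from the thread: without `use utf8`, the Urdu letters typed into `qw//` are byte strings (two bytes each), and without an input encoding layer the file is also read as bytes, so `split('', ...)` yields single bytes that can never equal a two-byte delimiter. The demo below shows this with the core `Encode` module; ALEF (U+0627) stands in for any delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $alef  = "\x{0627}";                 # ARABIC LETTER ALEF as one character
my $bytes = encode('UTF-8', $alef);     # the same letter as two raw UTF-8 bytes

# Splitting undecoded text yields bytes; splitting decoded text yields characters.
my @byte_pieces = split //, $bytes;
my @char_pieces = split //, decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", scalar(@byte_pieces), scalar(@char_pieces);

# Byte-string delimiter vs byte pieces (the failing case) never matches;
# character delimiter vs character pieces does.
print "byte match: ", (grep { $_ eq $bytes } @byte_pieces) ? "yes" : "no", "\n";
print "char match: ", (grep { $_ eq $alef } @char_pieces) ? "yes" : "no", "\n";
```

If that is the cause, adding `use utf8;` (so the letters in `qw//` become characters) and `use open qw(:std :encoding(UTF-8));` (so the input file and STDOUT are decoded and encoded) near the top of the script from the second post should let the comparison succeed.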
# 4  
09-27-2013
I don't know Perl, but if bash handles UTF-8 (which, as you say, your Urdu text is), something along these lines could work:

thefile:
Code:
archelogogy
testing
abcdefg
aoeio

The script:
Code:
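replacers=(a e i o u)
while IFS= read -r line; do
    for repl in "${replacers[@]}"; do
        # append a space after every occurrence of this delimiter
        line=$(printf '%s\n' "$line" | sed "s,$repl,& ,g")
    done
    printf '%s\n' "$line"
done < thefile

Calling: thescript
Results:
Code:
a rche lo go gy
te sti ng
a bcde fg
a o e i o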

hth
# 5  
09-27-2013
Code:
$ 
$ cat delimiters.txt
ا,د,ڈ,ذ,ر,ڑ,ز,ژ,و,ے,إ,ۓ,ؤ
$ 
$ 
$ cat sample.txt
بیوشن
شلوارقمیص
پھوٹتے
دھاتیں
پھنسیوں
جسمیر
نندلال
ارستوفانیس
علَم
شعلوں
$ 
$ 
$ cat create_ligatures.pl
#!/usr/bin/perl
use strict;
use warnings;
use open ':encoding(utf8)'; # handles opened with open() default to UTF-8
my $delim_file = "delimiters.txt";
my @delims;
open (DL, "<", $delim_file) or die "Can't open $delim_file: $!";
while (<DL>) {
  chomp;
  @delims = split(/,/, $_);
}
close (DL) or die "Can't close $delim_file: $!";

my $data_file = "sample.txt";
binmode(STDOUT, ":encoding(UTF-8)"); # render utf8 output
open (FH, "<", $data_file) or die "Can't open $data_file: $!";
while (<FH>) {
  chomp;
  for my $x (split(//, $_)) {
    (grep {$_ eq $x} @delims) ? print "$x " : print "$x";
  }
  print "\n";
}
close (FH) or die "Can't close $data_file: $!";
$ 
$ 
$ perl create_ligatures.pl
بیو شن
شلو ا ر قمیص
پھو ٹتے 
د ھا تیں
پھنسیو ں
جسمیر 
نند لا ل
ا ر ستو فا نیس
علَم
شعلو ں
$ 
$ 
$


Last edited by durden_tyler; 09-27-2013 at 02:13 AM..
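In the output above, words that end in a delimiter (for example پھوٹتے and جسمیر) are printed with a trailing space, and every letter is compared against the whole delimiter list. One possible variant compiles the list once into a regex and adds a space only when another character follows. This sketch assumes each delimiter is a single code point; `make_splitter` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # the Urdu literals below are characters
use open qw(:std :encoding(UTF-8));     # STDIN/STDOUT and open() use UTF-8

# Hypothetical helper: compile single-character delimiters into a closure
# that adds a space after a delimiter only when another character follows.
sub make_splitter {
    my $class = join '', map { quotemeta } @_;
    my $re = qr/([$class])(?=.)/;
    return sub {
        (my $word = shift) =~ s/$re/$1 /g;
        return $word;
    };
}

# Delimiter list copied from the first post.
my $split = make_splitter(split /,/, 'ا,د,ڈ,ذ,ر,ڑ,ز,ژ,و,ے,إ,ۓ,ؤ');

print $split->($_), "\n" for qw(شلوارقمیص پھوٹتے جسمیر);
```

Reading delimiters.txt and sample.txt as in the script above would only change where the list and the words come from.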
# 6  
09-27-2013
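As stated above, but with one change. Since delimiters.txt is comma-separated, the array has to be split on commas rather than on whitespace:
Code:
IFS=, read -r -a replacers < delimiters.txt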

hth