Shell Programming and Scripting: Post 302998634 by gimley on Monday 5th of June 2017 09:48:24 AM
Matching the number of syllables on the left-hand and right-hand sides

I am developing a database for translating names. Using a rule engine, I have mapped syllables in English to syllables in Indic. The two sides are separated by an equals sign, and the syllables on each side are separated by spaces.
An example will illustrate this:
Code:
ra m=रा म
ku ma r=कु मा र
mo=मो
la l=ला ल
gi ta=गी ता
ka la va ti=कa ला वa ती

However, because of errors or inconsistencies in the syllable division, the number of syllables on the right-hand side sometimes does not match the number on the left-hand side:
Code:
bo da=बो डa ॡ 
dho dha=ढो ढa ॡ
me d r=मे ड र्* ऌ
me da=मे डa ॡ
ra ma b da=रा मa ब डा ॡ

In the first, second and fourth entries there are two syllables on the left and three on the right. In the third entry there are three on the left and four on the right, and in the last entry four on the left and five on the right.
I need a script in Perl or Awk that can identify such discrepancies and split the database into two files: one with the clean entries and one with the inconsistent entries.
I work in a Windows environment and have also installed Sed, but I am more comfortable with Awk or Perl. The database has around 200,000 entries.
Many thanks for your help
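
The check described above comes down to counting the space-separated tokens on each side of the equals sign and comparing the two counts. The post asks for Perl or Awk; the following is only a minimal Python sketch of the same logic, not a tested solution for the real database. It assumes the file is UTF-8 (with or without a BOM), that each entry contains one `=`, and that syllables are separated by whitespace. The function names and the file names in the usage note are hypothetical.

```python
def syllable_counts(line):
    """Return (left_count, right_count) for an 'english=indic' entry.

    Returns None when the line has no '=' separator.
    """
    left, sep, right = line.partition("=")
    if not sep:
        return None
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading/trailing spaces, so a stray trailing space is harmless.
    return len(left.split()), len(right.split())


def split_database(lines):
    """Sort entries into (clean, inconsistent) lists, skipping blank lines."""
    clean, inconsistent = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        counts = syllable_counts(line)
        if counts is not None and counts[0] == counts[1]:
            clean.append(line)
        else:
            # Mismatched counts, or no '=' at all (assumption: treat as inconsistent).
            inconsistent.append(line)
    return clean, inconsistent


def write_split(src_path, clean_path, bad_path):
    """Read src_path and write the two result files (all paths are hypothetical)."""
    # 'utf-8-sig' reads plain UTF-8 and also strips a BOM if an editor added one.
    with open(src_path, encoding="utf-8-sig") as src:
        clean, inconsistent = split_database(src)
    with open(clean_path, "w", encoding="utf-8") as out:
        out.writelines(entry + "\n" for entry in clean)
    with open(bad_path, "w", encoding="utf-8") as out:
        out.writelines(entry + "\n" for entry in inconsistent)
```

A call such as `write_split("names.txt", "clean.txt", "inconsistent.txt")` would process the file in one pass, which should be adequate for about 200,000 entries. Tokens such as `र्*` count as one syllable here because only whitespace separates tokens.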
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.