Matching number of syllables on right-hand and left side


 
# 1  
Old 06-05-2017

I am developing a database for translating names. Using a rule engine, I have mapped syllables in English to syllables in Indic, with the two sides separated by an equals sign.
An example will illustrate this:
Code:
ra m=रा म
ku ma r=कु मा र
mo=मो
la l=ला ल
gi ta=गी ता
ka la va ti=कa ला वa ती

However, because of errors or inconsistencies in syllable division, the number of syllables on the right-hand side sometimes does not match the number of syllables on the left-hand side:
Code:
bo da=बो डa ॡ 
dho dha=ढो ढa ॡ
me d r=मे ड र्* ऌ
me da=मे डa ॡ
ra ma b da=रा मa ब डा ॡ

In the first, second and fourth lines there are two syllables on the left and three on the right; in the third line, three on the left and four on the right; and in the last line, four on the left and five on the right.
I need a script in Perl or Awk that can identify such discrepancies and split the database into two files: clean and inconsistent.
I work in a Windows environment and have Sed installed as well; however, I am more comfortable with Awk or Perl. The database has around 200,000 entries.
Many thanks for your help.
# 2  
Old 06-05-2017
Save the script as separate.pl and run it as: perl separate.pl gimley.example
Code:
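use strict;
use warnings;

# Output files: one for consistent lines, one for mismatched lines.
my $clean = 'clean.gmly';
my $inconsistent = 'inconsistent.gmly';

open my $clean_fh, '>', $clean or die "Cannot open $clean: $!";
open my $inconsistent_fh, '>', $inconsistent or die "Cannot open $inconsistent: $!";

while(<>) {
  # Left and right sides of the "=" sign.
  my ($lh, $rh) = split /=/;
  $rh = '' unless defined $rh;   # line without "=": treat right side as empty
  # split ' ' skips leading and trailing whitespace, including the newline.
  my @lh = split ' ', $lh;
  my @rh = split ' ', $rh;
  # In a numeric comparison an array yields its element count.
  if(@lh != @rh) {
    print $inconsistent_fh $_;
  }
  else {
    print $clean_fh $_;
  }
}

close $clean_fh;
close $inconsistent_fh;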

This User Gave Thanks to Aia For This Post:
# 3  
Old 06-05-2017
Try:
Code:
awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' f=file.bad file > file.good
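
If the single-quoted one-liner gives trouble at a Windows command prompt, one option is to keep the awk program in a file and run it with awk -f. Below is a minimal sketch of the same logic spelled out. The names check.awk, sample.txt, sample.good, sample.bad and the variable bad are only placeholders for illustration, and the printf and here-document lines assume a POSIX-style shell (for example Cygwin); only check.awk and the final awk command are the actual solution.

```shell
# Sample data: two consistent and two inconsistent lines taken from post #1.
printf '%s\n' 'ra m=रा म' 'gi ta=गी ता' 'bo da=बो डa ॡ' 'me d r=मे ड र्* ऌ' > sample.txt

# check.awk (hypothetical file name): the one-liner written out step by step.
cat > check.awk <<'EOF'
BEGIN { FS = "=" }
{
    left  = split($1, L, " ")    # number of syllables before the "="
    right = split($2, R, " ")    # number of syllables after the "="
    if (left != right)
        print > bad              # mismatch: write to the file named by "bad"
    else
        print                    # match: pass through to standard output
}
EOF

# bad=... sets the awk variable before sample.txt is read.
awk -f check.awk bad=sample.bad sample.txt > sample.good
```

Splitting on the single-space string " " makes awk ignore leading and trailing blanks, so a stray space at the end of a line does not count as an extra syllable.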


Last edited by Scrutinizer; 06-05-2017 at 01:27 PM..
This User Gave Thanks to Scrutinizer For This Post:
# 4  
Old 06-05-2017
Many thanks for both solutions. They worked great.
Sorry for the delay in replying.