Modifying an awk script for syllable splitting
Post 302969594 by gimley on Thursday 24th of March 2016 09:42:25 AM

I have found this syllable splitter written in awk; the code is given below. The script cuts words and names into syllables. However, it fails when a word contains two consonants that belong together as a single unit, such as sh or ph, and so must not be split. Two example inputs:
Code:
ashford
raphael

The script currently produces:
Code:
ashford	as-hford	2	 VC-CCVrC
raphael	rap-ha-el	3	 rVC-CV-VC

instead of
Code:
ashford	ash-ford	2	 VCC-CVrC
raphael	ra-pha-el	3	 rV-CCV-VC
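
The mis-split follows from where the cluster rules place the hyphen: for a vowel, three consonants and a vowel, the script always keeps exactly one consonant with the first vowel. The trace below is only a minimal sketch of that one rule. The plain-letter classes are an assumption (the script itself uses phonetic codes), and `trace_rule` is a hypothetical name.

```shell
# Hypothetical one-off trace of the V X X X V rule applied to "ashford".
trace_rule() {
  awk 'BEGIN {
    V = "[aeiouy]"                 # assumption: simplified vowel class
    X = "[bcdfghjklmnpqrstvwxz]"   # assumption: simplified consonant class
    a = "ashford"
    i = match(a, V X X X V)        # leftmost vowel + 3 consonants + vowel
    # The rule cuts after position i+1, so only "s" stays with the first vowel.
    print i, RLENGTH, substr(a, 1, i + 1) "-" substr(a, i + 2)
  }'
}
```

Calling `trace_rule` prints the match position, the match length and the split word, which shows the cut falling between s and h.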

How do I modify the code so that sh or ph is treated as a single unit and never split?
I contacted the authors, who have not responded; the code is old, and perhaps they see no merit in changing it.
A single example of the modification, for either ph or sh, would help. I can then extend it to all other such combinations.
Out of respect for the authors, I have removed their names from the script.
Many thanks.
The awk script follows:
Code:
# This script reads a tab-separated file and syllabifies the column pointed to by the variable 'phons' (or the first column, by default).
# Usage: gawk -f syll.gk fn > fn.out

BEGIN {
  FS="\t"; 
  OFS="\t";
  
  if (code=="brulex") {
    V="[aiouyîâêôû^eEéAO_]"; # vowels
    C="[ptkbdgfs/vzjmn/shN£]"; # consonants except liquids & semivowels
    C1="[pkbgfs/vzj]";
    L="[lR]"; # liquids 
    Y="[ïü\377]"; # semi-vowels \377 stands for y-umlaut
    X="[ptkbdgfs/vzjmnN£xlRïü\377]"; # all consonants 
  } else { # any other value of code (LAIPTTS)
    V="[iYeE2591a@oO§uy*]";   # Vowels
    C="[pbmfvtdnNkgszxSZGh/sh]";  # Consonants except liquids & semivowels
    C1="[pkbgfsSvzZ]";
    L="[lR]"; # liquids
    Y="[j8w]"; # semi-vowels
    X="[pbmfvtdnNkgszSZGlRrhxGj8w]";   # all consonants, including semivowels
  }
  if (phons==0) phons=1;
}

{
 a=$phons;
 n=1
}

{
  # Each rule inserts a hyphen at the position shown in its trailing comment.
  # Earlier rules take priority, because an inserted hyphen blocks later matches.
  while (i = match(a, V V)) {              # V-V
    a = substr(a, 1, i) "-" substr(a, i+1, length(a)); n++ }

  while (i = match(a, V X V)) {            # V-XV
    a = substr(a, 1, i) "-" substr(a, i+1, length(a)); n++ }

  while (i = match(a, V Y Y V)) {          # VY-YV
    a = substr(a, 1, i+1) "-" substr(a, i+2, length(a)); n++ }

  while (i = match(a, V C Y V)) {          # V-CYV
    a = substr(a, 1, i) "-" substr(a, i+1, length(a)); n++ }

  while (i = match(a, V L Y V)) {          # V-LYV
    a = substr(a, 1, i) "-" substr(a, i+1, length(a)); n++ }

  while (i = match(a, V "[td]R" V)) {      # V-[td]RV
    a = substr(a, 1, i) "-" substr(a, i+1, length(a)); n++ }

  while (i = match(a, V "[td]R" Y V)) {    # V-[td]RYV
    a = substr(a, 1, i) "-" substr(a, i+1, length(a)); n++ }

  while (i = match(a, V C1 L V)) {         # V-C1LV
    a = substr(a, 1, i) "-" substr(a, i+1, length(a)); n++ }

  while (i = match(a, V X X V)) {          # VX-XV
    a = substr(a, 1, i+1) "-" substr(a, i+2, length(a)); n++ }

  while (i = match(a, V X X X V)) {        # VX-XXV
    a = substr(a, 1, i+1) "-" substr(a, i+2, length(a)); n++ }

  while (i = match(a, V X X X X V)) {      # VX-XXXV
    a = substr(a, 1, i+1) "-" substr(a, i+2, length(a)); n++ }

  while (i = match(a, V X X X X X V)) {    # VX-XXXXV
    a = substr(a, 1, i+1) "-" substr(a, i+2, length(a)); n++ }

# suppress the final schwa (^) in some multisyllabic words 
# notr^ -> notR
# ar-bR^   =>  aRbR
  b=gensub(/-([^-]+)\^$/,"\\1",1,a) ;  
  if (b!=a) { # there is a schwa to delete
    a=b; 
    $phons=substr($phons,1,length($phons)-1);
    n--;
      }
# same thing when the schwa is '*'
  b=gensub(/-([^-]+)\*$/,"\\1",1,a) ;  
  if (b!=a) { # there is a schwa to delete
    a=b; 
    $phons=substr($phons,1,length($phons)-1);
    n--;
      }


# compute the CVY skeleton
  sk= " ";
  for (i=1;i<=length(a);i++) {
    ph=substr(a,i,1);
    if (ph~V) sk=sk"V";
    else if ((ph~C)||(ph~L)) sk=sk"C";
    else if (ph~Y) sk=sk"Y";
    else sk=sk ph;
  }
}

{ print $0,a,n,sk }
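
One possible modification, sketched under assumptions and not taken from the original authors: writing sh inside a bracket expression cannot work, because a bracket expression always matches exactly one character. An alternative is to replace each digraph with a single placeholder character before the splitting rules run, include the placeholders in the consonant class, and restore the digraphs afterwards. The sketch below reproduces only three of the rules, uses plain-letter classes instead of the phonetic codes, and assumes lowercase input so that the placeholders S and F cannot collide with real data. The name `syll_digraph` is hypothetical.

```shell
# Hypothetical wrapper: reads one lowercase word per line on stdin.
syll_digraph() {
  awk '
  BEGIN {
    OFS = "\t"
    V = "[aeiouy]"                   # assumption: plain letters, not the phonetic codes
    X = "[bcdfghjklmnpqrstvwxzSF]"   # consonants plus the placeholders S and F
  }
  {
    a = $1; n = 1

    # Protect the digraphs: each one becomes a single placeholder consonant.
    gsub(/sh/, "S", a)
    gsub(/ph/, "F", a)

    # Three of the splitting rules, same shape as in the script.
    while (i = match(a, V V))     { a = substr(a, 1, i) "-" substr(a, i + 1); n++ }
    while (i = match(a, V X V))   { a = substr(a, 1, i) "-" substr(a, i + 1); n++ }
    while (i = match(a, V X X V)) { a = substr(a, 1, i + 1) "-" substr(a, i + 2); n++ }

    # Restore the digraphs before printing.
    gsub(/S/, "sh", a)
    gsub(/F/, "ph", a)

    print $1, a, n
  }'
}
```

For example, `printf 'ashford\nraphael\n' | syll_digraph` gives ash-ford and ra-pha-el. In the full script the same two pairs of `gsub` calls would go right after `a=$phons` and just before the skeleton loop, so that the skeleton still counts s and h as separate consonants. Note that in the LAIPTTS classes S is already a phonetic symbol, so a different placeholder would be needed there.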

 
