A beginner needing some help programming documents
Post 302765641 by Don Cragun on Sunday 3rd of February 2013 04:46:14 PM
Quote:
Originally Posted by RudiC
This is the first time I have had to struggle with UTF-8 chars, so I'm feeling a bit overstrained, and you should take my proposal as a mere direction indicator. On top of that, both your input files have trailing blanks, which I removed. If they are needed, you will have to add special handling to the code. Here's my meek approach:
Code:
awk    'NR==FNR {sub(/[^0-9]$/, "&0");gsub (/[0-9]/,",&,");  Ar[NR]=$2$4; next}
     {gsub (/.../,"&,"); $1=$1","substr (Ar[FNR],1,1); if ($2) $2=$2","substr (Ar[FNR],2,1)}
     1
    ' FS="," OFS="," file2 file3
你,3,们,0,
好,3,
家,1,明,2,
你,3,
好,3,

The trailing commas come from my crude attempt to separate the Chinese syllables; I didn't bother to remove them. I'm sure you have better means in your locale!
Nomadblue,
RudiC's code looks reasonable, but I haven't been able to test it. I have found that awk on OS X Version 10.7.5 (Lion) counts bytes instead of characters in substr() and length(), and that using a regular expression to search for a space fails if the space follows a multibyte character (not just in awk, but also in at least bash, ed, ex, grep, ksh, sed, and vi). My testing was done with LANG set to en_US.UTF-8 and no LC_* environment variables set.

I would love to hear if this has been fixed in Mountain Lion.
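
For anyone who wants to compare what their own tools report, here is a minimal sketch of counting bytes versus characters without depending on the locale. The function names byte_count and char_count are made up for this illustration, and the approach assumes the input is valid UTF-8: every byte that is not a continuation byte (decimal 128 through 191) starts a new character.

```shell
# Count the bytes in a string (the number a byte-oriented length() returns).
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Count UTF-8 characters by counting every byte that is NOT a
# continuation byte (decimal 128-191).  Assumes valid UTF-8 input.
char_count() {
    printf '%s' "$1" | od -An -v -tu1 | tr -s ' ' '\n' |
        awk '$1 != "" && ($1 < 128 || $1 >= 192) { n++ } END { print n + 0 }'
}

# Two Chinese characters, three bytes each, given as octal escapes.
s=$(printf '\344\275\240\345\245\275')
echo "bytes: $(byte_count "$s")  characters: $(char_count "$s")"
```

If awk's length() on the same string agrees with byte_count rather than char_count, that awk is working on bytes.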

************************
Update: I take back what I said about REs not matching spaces after multibyte characters. The characters that I originally thought were spaces were multibyte characters consisting of the octal byte sequences: 0343 0200 0200 and 0342 0200 0206. Those two characters aren't spaces, but they are in the locale's space character class.
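
To see which characters those byte sequences are, the three bytes can be decoded by hand. This is only a sketch: the function name utf8_3byte_codepoint is hypothetical, and it assumes its arguments are the bytes of one well-formed three-byte UTF-8 sequence written as octal constants with a leading 0.

```shell
# Decode one three-byte UTF-8 sequence into its Unicode code point.
# $1 $2 $3: the bytes as octal constants with a leading 0 (e.g. 0343).
# Layout: 1110xxxx 10xxxxxx 10xxxxxx -> 4 + 6 + 6 payload bits.
utf8_3byte_codepoint() {
    printf 'U+%04X\n' \
        "$(( (($1 & 0x0F) << 12) | (($2 & 0x3F) << 6) | ($3 & 0x3F) ))"
}

utf8_3byte_codepoint 0343 0200 0200   # prints U+3000
utf8_3byte_codepoint 0342 0200 0206   # prints U+2006
```

U+3000 is IDEOGRAPHIC SPACE and U+2006 is SIX-PER-EM SPACE, which fits the observation that both belong to the locale's space character class without being the ASCII space.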

---------- Post updated Feb 3rd, 2013 at 13:46 ---------- Previous update was Feb 2nd, 2013 at 23:13 ----------

The following script seems to do what you want except that it does not print any trailing space character class characters at the ends of the output lines. (Note that Word list 1 had a trailing character in the space character class on lines 3 and 5, Word list 2 on lines 2 and 3, and your desired outcome on lines 2 and 3. The output produced by this script does not include any characters in the space character class.)
Code:
#!/bin/ksh
# The awk on Mac OS X Version 10.7.5 does not meet POSIX/UNIX requirements for
# handling multibyte characters (it processes bytes instead of characters) at
# least in the length() and substr() functions.  This problem should be easy to
# handle in awk, but this script is written entirely as a ksh script which does
# handle multibyte characters correctly.  (The bash on OS X Version 10.7.5 also
# handles multibyte characters correctly and, although this script uses many
# features that are not defined by the standards, this script works both with
# ksh and bash on OS X.  If using this script on another system, you will need
# to use a 1993 or later version of ksh.)

# Read Chinese string.
while IFS="" read -r c
do      # Read corresponding Chinese pinyin string with tone marks.
        IFS="" read -r cp <&3
        # Strip a trailing space character class character from each string, if
        # there is one.
        c=${c%[[:space:]]}
        cp=${cp%[[:space:]]}
        # Is there a tone mark at the end of the Chinese pinyin string?
        if [[ ${cp:$((${#cp} - 1))} != [[:digit:]] ]]
        then    # No.  Add "0" as a tone mark.
                cp="${cp}0"
        fi
        # Strip everything but tone marks from the Chinese pinyin string.
        cp=${cp//[![:digit:]]/}
        # Print the Chinese characters with their corresponding tone marks.
        sep=""  # No separator for first character pair.
        for ((i = 0; i < ${#cp}; i++))
        do      printf "%s%s,%s" "$sep" "${c:$i:1}" "${cp:$i:1}"
                sep="," # Separator for all following character pairs.
        done
        # Add the trailing newline.
        echo
done < Word_list_1 3< Word_list_2
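
The tone-mark steps in the script above use substitutions that need ksh93 or bash. As a rough sketch for shells without them, the same step (append "0" when the pinyin string does not end in a digit, then keep only the digits) can be approximated with case and tr. This is a swapped-in technique, not what the script itself does; the function name tone_marks and the sample strings are hypothetical, since the real Word_list_2 is not shown here.

```shell
# Reduce a pinyin string to its tone digits, one digit per syllable.
# A syllable with no trailing tone digit at the end of the string
# gets "0", as in the script above.
tone_marks() {
    cp=$1
    case $cp in
        *[0-9]) ;;              # already ends in a tone digit
        *) cp="${cp}0" ;;       # no: add "0" as the tone mark
    esac
    # Delete everything except digits (and the newline printf adds).
    printf '%s\n' "$cp" | tr -cd '0-9\n'
}

tone_marks "ni3men"      # prints 30
tone_marks "jia1ming2"   # prints 12
```

Like the original, this only adds a "0" for a missing tone digit at the end of the string; a toneless syllable in the middle would still throw off the pairing.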


Last edited by Don Cragun; 02-03-2013 at 05:50 PM. Reason: Update with new info re: Mac OS X
 
