A beginner needing some help programming documents
Hi all,
I'm fairly new to shell and Python programming. I have a Mac (Mountain Lion, OS X 10.8.2) and use the Terminal for programming. I'm trying to use Unix tools to organize some language data that I am working with. Basically, I have two word lists that I need to combine into one.
Word list 1 (Chinese):
Word List 2 (Chinese pinyin with numerical tone mark):
My desired outcome would combine the numbers from the second word list with the characters in the first word list to look like this:
It is important that the format is "character", comma, "number".
So far I have done the following with wordlist two:
My current output document looks like this:
It is 'almost' there, but the first and third need to be further 'integrated' so the format is 'character' comma 'number' comma 'character' comma 'number'. So every single Chinese character should be followed by a number. One additional problem is that some words (such as the second character in the first example, 你们,3,) do not have a corresponding number. In this case I would like a zero '0' to be inserted automatically, so the first word would appear as "你,3,们,0". So specifically, I need help with:
1) Formatting the document to appear as "character" comma "number" comma "character" comma "number" instead of "character" "character" comma "number" comma "number".
2) Having a zero '0' inserted after the comma when there is not already a number.
Any help or suggestions would be greatly appreciated
Last edited by Scrutinizer; 02-01-2013 at 06:33 AM..
Reason: Please use code tags for data and code samples
This is the first time I have had to struggle with UTF-8 chars, so I'm feeling a bit overstrained, and you should take my proposal as a mere direction indicator. On top of that, both your input files have trailing blanks that I removed. If they are needed, you will have to add special handling to the code. Here's my meek approach:
The trailing commas are due to my insufficient attempt to separate the Chinese syllables, which I didn't bother to remove - I'm sure you have better means in your locale!
Nomadblue,
RudiC's code looks reasonable, but I haven't been able to test it. I have found that awk on OS X Version 10.7.5 (Lion) counts bytes instead of counting characters when using substr() and length() and that using a regular expression to search for a space fails if the space follows a multibyte character (not just in awk; but also at least in bash, ed, ex, grep, ksh, sed, and vi). My testing was done with LANG set to en_US.UTF-8 and no LC_* environment variables set.
I would love to hear if this has been fixed in Mountain Lion.
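To make the byte-versus-character distinction concrete, here is a small Python 3 sketch (not the awk from this thread). The sample word is taken from the first post; everything else is only an illustration of why a byte-counting length() or substr() gives surprising results on UTF-8 text.

```python
# Illustration only: characters vs. UTF-8 bytes for a Chinese word.
word = "你们"

n_chars = len(word)                   # counts characters (code points)
n_bytes = len(word.encode("utf-8"))   # counts bytes, as a byte-oriented tool would

print(n_chars, n_bytes)

# Taking the "first character" by character index works as expected...
first_char = word[:1]

# ...but taking the first byte only yields a fragment of a character,
# which is not valid UTF-8 on its own.
first_byte = word.encode("utf-8")[:1]
try:
    first_byte.decode("utf-8")
    fragment_is_valid = True
except UnicodeDecodeError:
    fragment_is_valid = False

print(first_char, fragment_is_valid)
```

Each of these characters occupies three bytes in UTF-8, so a tool that counts bytes reports three times the length a reader would expect.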
************************
Update: I take back what I said about REs not matching spaces after multibyte characters. The characters that I originally thought were spaces were multibyte characters consisting of the octal byte sequences: 0343 0200 0200 and 0342 0200 0206. Those two characters aren't spaces, but they are in the locale's space character class.
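For anyone who wants to check which characters those octal byte sequences are, a short Python 3 sketch can decode them and look up their Unicode names. This is only a verification aid; it assumes the bytes are UTF-8, as in the locale mentioned above.

```python
import unicodedata

# The two octal byte sequences quoted above, written as Python octal literals.
octal_sequences = [(0o343, 0o200, 0o200), (0o342, 0o200, 0o206)]

# Decode each byte sequence as UTF-8 to get the actual character.
decoded = [bytes(seq).decode("utf-8") for seq in octal_sequences]

for ch in decoded:
    # Print the code point, its Unicode name, and whether Python
    # treats it as whitespace.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch.isspace())
```

Neither character is the ASCII space (U+0020), so a regular expression that looks for a literal space will not match them, while a whitespace character class will.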
---------- Post updated Feb 3rd, 2013 at 13:46 ---------- Previous update was Feb 2nd, 2013 at 23:13 ----------
The following script seems to do what you want except that it does not print any trailing space character class characters at the ends of the output lines. (Note that Word list 1 had a trailing character in the space character class on lines 3 and 5, Word list 2 on lines 2 and 3, and your desired outcome on lines 2 and 3. The output produced by this script does not include any characters in the space character class.)
Last edited by Don Cragun; 02-03-2013 at 05:50 PM..
Reason: Update with new info re: Mac OS X
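Since the original poster also mentioned Python, here is a minimal Python 3 sketch of the merge requested in the first post. It is not either of the scripts posted above. It assumes (the sample data is not shown here) that each pinyin entry has one syllable per character, that syllables are separated by whitespace, and that the tone number, when present, is the last character of the syllable. The function name merge_word and the sample pinyin "ni3 men" are made up for illustration.

```python
import re


def merge_word(chars: str, pinyin: str) -> str:
    """Pair each character with its tone digit, using 0 when none is given.

    Assumption: syllables in `pinyin` are whitespace-separated and there is
    exactly one syllable per character in `chars`.
    """
    # Drop all whitespace, including the non-ASCII space characters
    # (such as U+3000) found at the ends of some input lines.
    chars = "".join(chars.split())
    syllables = pinyin.split()

    if len(chars) != len(syllables):
        raise ValueError(f"cannot align {chars!r} with {pinyin!r}")

    fields = []
    for ch, syllable in zip(chars, syllables):
        match = re.search(r"\d$", syllable)      # trailing tone digit, if any
        tone = match.group() if match else "0"   # no digit: insert a zero
        fields.extend([ch, tone])
    return ",".join(fields)


print(merge_word("你们", "ni3 men"))
```

To process whole files, the two word lists could be read line by line in parallel (for example with zip over two open files) and each pair of lines passed to merge_word. Raising an error on a length mismatch is a deliberate choice here, so that misaligned lines are noticed rather than silently mangled.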