Regexes for three-column data to create a dictionary
I am working on a multilingual dictionary and I have data in three columns. The data structure can be
Code:
word=word=gloss
or
Code:
word word=word word=gloss gloss
The = character acts as the delimiter between columns.
A single column can contain as many as 8 or 10 words. The structure is well defined in the sense that the first column and the second column always contain the same number of words.
An example will make this clear. For ease of comprehension I am using Latin script:
Code:
book=boook=bUk
Code:
hand book=hannd boook=hEnD bUk
and so on.
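The structure described above can be checked mechanically. Below is a minimal sketch in Python (not the Perl or Awk asked for, though the pattern itself uses only syntax that Perl regexes also accept). It assumes words are separated by whitespace and never contain "=". The names WORDS, LINE_RE and classify are made up for this illustration.

```python
import re

# One or more whitespace-separated words; a word may not contain "=" or whitespace.
WORDS = r"[^=\s]+(?:\s+[^=\s]+)*"

# Exactly three columns separated by "=".
LINE_RE = re.compile(rf"^({WORDS})=({WORDS})=({WORDS})$")


def classify(line):
    """Return 'single', 'multi' or 'invalid' for one dictionary line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return "invalid"  # not three "="-separated columns
    col1, col2, _gloss = m.groups()
    n1, n2 = len(col1.split()), len(col2.split())
    if n1 != n2:
        return "invalid"  # columns 1 and 2 must hold the same number of words
    return "single" if n1 == 1 else "multi"
```

The regex only establishes that a line has three columns. The word-count comparison is done in ordinary code, because a regex alone cannot easily assert that two groups contain the same number of words.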
I need to map the gloss in column 3 to the string in column 1 and also to the string in column 2, so that the two examples above become:
Code:
book=bUk
boook=bUk
Code:
hand book=hEnD bUk
hannd boook=hEnD bUk
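The mapping shown above does not strictly need a regex, since splitting each line on "=" is enough. Here is a rough sketch in Python, under the assumption that every line has exactly three non-empty columns. The function names map_gloss and map_lines are invented for this example, and the same split-and-print logic should carry over to Perl's split or to Awk with "=" as the field separator.

```python
def map_gloss(line):
    """Turn 'col1=col2=gloss' into ['col1=gloss', 'col2=gloss']."""
    parts = [p.strip() for p in line.strip().split("=")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty columns: {line!r}")
    col1, col2, gloss = parts
    # Pair the gloss (column 3) with column 1 and with column 2.
    return [f"{col1}={gloss}", f"{col2}={gloss}"]


def map_lines(lines):
    """Apply map_gloss to every non-blank line and collect the results in order."""
    out = []
    for line in lines:
        if line.strip():
            out.extend(map_gloss(line))
    return out
```

To run this over a real file on Windows, it is probably safest to open the file with an explicit encoding, for example open(path, encoding="utf-8"), so that non-Latin scripts are read correctly.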
My question is how to write a regex that identifies each of these types. Once I have the regex, I can write a script that separates them out. I would appreciate a regex in Perl or Unix.
A script in either Perl or Awk would be the cherry on the cake. I work in a Windows environment.
I hope to complete the mapper and put it up as a useful tool for transliteration between two languages. Many thanks.
wnnatod
wnnatod(1) User Commands wnnatod(1)
NAME
wnnatod - Convert an EUC text dictionary to a binary dictionary
SYNOPSIS
/usr/bin/wnnatod [-s num] [-R] [-S] [-U] [-r] [-N] [-n] [-P filename] [-p filename] [-I] [-e] [-h filename] binary_dictionary_filename
DESCRIPTION
wnnatod reads a Japanese EUC text dictionary from the standard input, converts it to a binary dictionary and writes it to the specified
binary_dictionary_filename.
OPTIONS
The following options are available.
-s num Specifies the amount of memory to allocate (in words). num should be a little over the number of words in the dictionary.
Normally you do not need to specify this option. The default is 70,000. If wnnatod fails and reports a memory shortage, retry
the command with the -s option.
-R Converts the EUC text dictionary to a reverse-searchable binary dictionary (default).
-S Converts the EUC text dictionary to a fixed-format dictionary.
-U Converts the EUC text dictionary to an editable dictionary.
-r Reverses the order of Kana and Kanji when converting the EUC text dictionary.
-N Sets the dictionary password to "*".
-n Sets the frequency password to "*".
-P filename Specifies the file name of the dictionary password.
-p filename Specifies the file name of the frequency password.
-I Creates a system dictionary.
-e Registers an entry's reading (Hiragana) as a word in the binary dictionary if the reading and the word are the same (that is,
the word consists of only Hiragana). With this option, you cannot convert a text dictionary to a reverse-searchable
binary dictionary.
-h filename Specifies the file name that contains part of speech information.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
|       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
+-----------------------------+-----------------------------+
|Availability                 |SUNWjwncu                    |
+-----------------------------+-----------------------------+
SEE ALSO
wnndictutil(1), wnndtoa(1), wnnotow(1), wnntouch(1)
SunOS 5.10 2 Mar 1998 wnnatod(1)