Regexes for three column data to create a dictionary Post: 302971638


Posted by gimley on Friday 22nd of April 2016 11:04:06 PM

04-23-2016


I am working on a multilingual dictionary, and my data is in three columns. Each line has one of two structures: a single word per column,

Code:

word=word=gloss

or several words per column,

Code:

word word=word word=gloss gloss

The equals sign

Code:

=

acts as the delimiter between columns.
A column can contain up to 8 or 10 words. The structure is well defined in the sense that the first and second columns always contain the same number of words.
An example will make this clear. For ease of comprehension I am using Latin script:

Code:

book=boook=bUk

Code:

hand book=hannd boook=hEnD bUk

and so on.
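The two line types described above can be told apart with a pair of regexes. The following is a minimal sketch in Python rather than Perl or Awk (the patterns themselves use Perl-compatible syntax); the names `ENTRY_RE`, `SINGLE_RE` and `classify` are hypothetical, and it assumes that `=` never occurs inside a word and that words within a column are separated by whitespace.

```python
import re

# Any well-formed line: exactly three non-empty columns separated by '='.
ENTRY_RE = re.compile(r'^([^=]+)=([^=]+)=([^=]+)$')
# Single-word line: no whitespace inside any of the three columns.
SINGLE_RE = re.compile(r'^[^=\s]+=[^=\s]+=[^=\s]+$')


def classify(line):
    """Return 'single', 'multi', or 'invalid' for one dictionary line."""
    line = line.strip()
    if SINGLE_RE.match(line):
        return 'single'
    m = ENTRY_RE.match(line)
    if not m:
        return 'invalid'
    col1, col2, _gloss = m.groups()
    # Columns 1 and 2 are expected to hold the same number of words.
    if len(col1.split()) != len(col2.split()):
        return 'invalid'
    return 'multi'
```

Lines reported as `invalid` (wrong number of columns, or mismatched word counts in columns 1 and 2) can be written to a separate file for manual checking.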
I need to map the gloss in column 3 to the string in column 1 and to the string in column 2, so that each input line yields two entries:

Code:

book=bUk
boook=bUk

Code:

hand book=hEnD bUk
hannd boook=hEnD bUk
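The mapping shown above does not strictly need a regex, since splitting on the delimiter is enough. Here is a minimal sketch in Python (not the Perl or Awk asked for); the function names `map_entry` and `convert` are hypothetical, and it assumes every valid line contains exactly two `=` characters.

```python
def map_entry(line, sep='='):
    """Split 'col1=col2=gloss' into ['col1=gloss', 'col2=gloss']."""
    parts = line.strip().split(sep)
    if len(parts) != 3:
        raise ValueError(f'expected 3 columns, got {len(parts)}: {line!r}')
    col1, col2, gloss = (p.strip() for p in parts)
    return [f'{col1}{sep}{gloss}', f'{col2}{sep}{gloss}']


def convert(lines):
    """Yield mapped entries for every non-blank input line."""
    for line in lines:
        if line.strip():
            yield from map_entry(line)
```

On Windows, the input file would typically be opened with `open(path, encoding='utf-8')` so that non-Latin scripts are read correctly; the resulting file object can be passed straight to `convert`.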

My question is how to write a regex that will identify each of these types. Once I have the regex, I can write a script that will easily separate them out. I would appreciate a regex in Perl or Unix.
A script in either Perl or Awk would be the cherry on the cake. I work in a Windows environment.
I hope to complete the mapper and put it up as a useful tool for multilingual transliteration across two languages. Many thanks.

gimley



