Post 302796071 by gimley on Friday 19th of April 2013 12:38:02 AM
Script to create unique look-up for headers for a Dictionary

I have a text file in UTF-8 format with the following data structure:
Code:
HEADWORD=gloss1,gloss2,gloss3 etc

I want to convert it so that each gloss appears on its own line, prefixed by its headword:
Code:
HEADWORD=gloss1
HEADWORD=gloss2
HEADWORD=gloss3
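A minimal Python sketch of this per-line transformation (Python rather than Perl or AWK, since it runs natively on Windows). It assumes the headword is everything before the first `=`, and that glosses are separated by a comma or a semicolon, which is what the example data suggests. The function name `explode_entry` is hypothetical.

```python
import re

# Assumption: glosses are separated by a comma or a semicolon (as in the
# example data); any spaces around the delimiter are dropped.
GLOSS_SEP = re.compile(r"\s*[,;]\s*")

def explode_entry(line):
    """Turn 'HEADWORD=g1,g2,g3' into ['HEADWORD=g1', 'HEADWORD=g2', 'HEADWORD=g3']."""
    line = line.rstrip("\r\n")
    if "=" not in line:
        return []  # blank or malformed line: nothing to emit
    # Split on the first '=' only, so an '=' inside a gloss is left alone.
    headword, glosses = line.split("=", 1)
    # Empty pieces (e.g. from a trailing comma) are skipped.
    return [f"{headword}={g}" for g in GLOSS_SEP.split(glosses.strip()) if g]

for out in explode_entry("HEADWORD=gloss1,gloss2,gloss3"):
    print(out)
# HEADWORD=gloss1
# HEADWORD=gloss2
# HEADWORD=gloss3
```

Because only `,` and `;` are delimiters, a multi-word gloss such as "public talk or discussion" or "(try to) be clever." stays on one line.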

An example will illustrate the requirement.
Input:
Code:
હોશમાં આવવું=regain consciousness.
હોશિયાર=clever, intelligent; skilful; alert, vigilant; cautious; understanding, sensible.
હોશિયારી કરવી=boast,(try to) be clever.
હોશિયારી દાખવવી=boast,(try to) be clever.
હોશિયારી બતાવવી=boast,(try to) be clever.
હોશિયારી મારવી=boast,(try to) be clever.
હોશિયારી રાખવી=be cautious,be vigilant,be alert.
હોશિયારી=cleverness, vigilance
હોહા=noise, uproar, tumult, public talk or discussion, excitement, agitation, alarm, consternation.
હોહાકાર=uproar, tumult, excitement, alarm.
હોહો=noise, uproar, tumult, public talk or discussion, excitement, agitation, alarm, consternation.

The output should be:
Code:
હોશમાં આવવું=regain consciousness.
હોશિયાર=clever
હોશિયાર=intelligent
હોશિયાર=skilful
હોશિયાર=alert
હોશિયાર=vigilant
હોશિયાર=cautious
હોશિયાર=understanding
હોશિયાર=sensible.
હોશિયારી કરવી=boast
હોશિયારી કરવી=(try to) be clever.
હોશિયારી દાખવવી=boast
હોશિયારી દાખવવી=(try to) be clever.
હોશિયારી બતાવવી=boast
હોશિયારી બતાવવી=(try to) be clever.
હોશિયારી મારવી=boast
હોશિયારી મારવી=(try to) be clever.
હોશિયારી રાખવી=be cautious
હોશિયારી રાખવી=be vigilant
હોશિયારી રાખવી=be alert.
હોશિયારી=cleverness
હોશિયારી=vigilance
હોહા=noise
હોહા=uproar
હોહા=tumult
હોહા=public talk or discussion
હોહા=excitement
હોહા=agitation
હોહા=alarm
હોહા=consternation.
હોહાકાર=uproar
હોહાકાર=tumult
હોહાકાર=excitement
હોહાકાર=alarm.
હોહો=noise
હોહો=uproar
હોહો=tumult
હોહો=public talk or discussion
હોહો=excitement
હોહો=agitation
હોહો=alarm
હોહો=consternation.

At present I use macros that find each delimiter, copy the text between two delimiters, paste it on the next line, prefix it with the headword, and continue to the end of the line, then repeat the process for the next line. Since the file is huge, a Perl or AWK script would help.
I work under Windows, so unfortunately UNIX-type solutions do not work for me.
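A minimal sketch of the whole-file conversion in Python, which runs on Windows without any UNIX tools. Assumptions: the script name `split_glosses.py` and the function `convert` are hypothetical, the delimiter set (comma or semicolon) is inferred from the example, and lines without an `=` are skipped. The file is read one line at a time, so memory use does not grow with the size of the dictionary.

```python
import re
import sys

# Assumption: glosses are separated by a comma or a semicolon (from the example).
GLOSS_SEP = re.compile(r"\s*[,;]\s*")

def convert(src_path, dst_path):
    """Stream src_path line by line and write one HEADWORD=gloss pair per line."""
    # 'utf-8-sig' also strips a byte-order mark, which some Windows editors
    # (e.g. Notepad) may write at the start of a UTF-8 file.
    with open(src_path, encoding="utf-8-sig") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if "=" not in line:
                continue  # skip blank or malformed lines
            # Headword is everything before the first '='.
            headword, glosses = line.split("=", 1)
            for gloss in GLOSS_SEP.split(glosses.strip()):
                if gloss:
                    dst.write(f"{headword}={gloss}\n")

if __name__ == "__main__":
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
    else:
        print("usage: python split_glosses.py INPUT.txt OUTPUT.txt", file=sys.stderr)
```

Usage from a Windows command prompt: `python split_glosses.py input.txt output.txt`. The output is written as UTF-8 without a byte-order mark; if the target program expects one, `encoding="utf-8-sig"` can be used for the output file as well.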
Many thanks in advance.
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.