Post 302698263 by gimley on Sunday 9th of September 2012 11:39:39 AM
Merging dupes on different lines in a dictionary

I am working on a homonym dictionary of names, i.e. names that are clustered together according to their “sound-alike” pronunciation.
An example will make this clear:
Quote:
ameer=aamir=aameer=amir
Since the dictionary is manually constructed, it often happens that two sets of “homonyms” which should be grouped together are inadvertently grouped separately. Thus:
Quote:
bisnnu=vishnu
vishno=wishnu=vishnoo=vaishnu=vishnu=visnu=veeshnu=vishanu=vishnau
“vishnu” appears in both the first set and the second, so the two sets should really be reduced to one:
Quote:
bisnnu=vishnu=vishno=wishnu=vishnoo=vaishnu=vishnu=visnu=veeshnu=vishanu=vishnau
I have written a program which points out such “dupes” and also the line on which they occur in the database. But since I am a newbie in Perl, try as I might, I cannot write a Perl program which will safely merge both sets where there are dupes. I have a script in UltraEdit format which does the job, but it is dreadfully slow.
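
For readers following along, here is a minimal sketch of what "pointing out dupes and the line on which they occur" could look like. It is written in Python rather than the Perl or AWK asked for in this post, and it is not the program mentioned above: the function name `find_shared_names` is hypothetical, and the sketch assumes one set per line with names separated by `=`.

```python
from collections import defaultdict


def find_shared_names(lines):
    """Map each name found on more than one line to its 1-based line numbers.

    Hypothetical helper: assumes one homonym set per line, names joined by "=".
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        names = {n.strip() for n in line.split("=") if n.strip()}
        # A set is used so a name repeated within one line counts once for that line.
        for name in names:
            seen[name].append(lineno)
    return {name: nums for name, nums in seen.items() if len(nums) > 1}


sample = [
    "bisnnu=vishnu",
    "vishno=wishnu=vishnoo=vaishnu=vishnu=visnu=veeshnu=vishanu=vishnau",
]
print(find_shared_names(sample))  # {'vishnu': [1, 2]}
```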

I am giving below a sample of such dupes:
Quote:
yasmine=yashmeen=yasamin=yasameen=yaasmin=yashameen=yasmeen=yasmin
yashmeen=yazmeen=yasmeen=yasmin=yashmin
watson=vatson=wattson
watson
tekchand=teckchand=tekchanda=tekachnda=tekachnd=teckchanda
tekchand=tekachand
sailesh
shailaesh=shailesh=sailesh
The expected output should be:
Quote:
yasmine=yashmeen=yasamin=yasameen=yaasmin=yashameen=yasmeen=yasmin=yashmeen=yazmeen=yasmeen=yasmin=yashmin
watson=vatson=wattson=watson
tekchand=teckchand=tekchanda=tekachnda=tekachnd=teckchanda=tekchand=tekachand
sailesh=shailaesh=shailesh=sailesh
Ideally the program should also weed out duplicates in a given row, but I have an awk program that does the job efficiently.

Any help would be really great. Many thanks in advance for a Perl or AWK script. I work under Windows and hence sed will not help.
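
One possible way to do the merge described above is sketched here. It is in Python, not the Perl or AWK requested, so treat it as an outline of the logic rather than a drop-in answer. The function name `merge_homonym_lines` and the `dedupe` flag are hypothetical, and the sketch assumes one set per line with names separated by `=`. It uses a union-find over line indices, so sets that are linked only through a third line (A shares a name with B, B with C) also end up on one row.

```python
def merge_homonym_lines(lines, dedupe=False):
    """Merge lines that share at least one name; keep input order.

    Hypothetical helper. With dedupe=False, repeated names are kept,
    matching the expected output in the post.
    """
    parent = list(range(len(lines)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller line index stays the root, so each merged
            # group is anchored at its first line in the input.
            parent[max(ra, rb)] = min(ra, rb)

    rows = [[n.strip() for n in line.split("=") if n.strip()] for line in lines]

    first_line_of = {}  # name -> index of the first line it was seen on
    for i, names in enumerate(rows):
        for name in names:
            if name in first_line_of:
                union(i, first_line_of[name])
            else:
                first_line_of[name] = i

    groups = {}  # root line index -> names of every line in that group, in order
    for i, names in enumerate(rows):
        if names:  # blank lines are skipped
            groups.setdefault(find(i), []).extend(names)

    merged = []
    for names in groups.values():
        if dedupe:
            names = list(dict.fromkeys(names))  # drop repeats, keep first occurrence
        merged.append("=".join(names))
    return merged


sample = [
    "watson=vatson=wattson",
    "watson",
    "sailesh",
    "shailaesh=shailesh=sailesh",
]
for row in merge_homonym_lines(sample):
    print(row)
# watson=vatson=wattson=watson
# sailesh=shailaesh=shailesh=sailesh
```

Passing `dedupe=True` would additionally remove repeated names within each merged row, which covers the "weed out duplicates in a given row" step if a separate awk pass is not wanted.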
 
