Post 302698263 by gimley on Sunday 9th of September 2012 11:39:39 AM
Merging dupes on different lines in a dictionary

I am working on a homonym dictionary of names, i.e. names that are clustered together because they sound alike when pronounced.
An example will make this clear:
Quote:
ameer=aamir=aameer=amir
Since the dictionary is constructed manually, it often happens that two sets of “homonyms” which should be grouped together are inadvertently listed separately. Thus:
Quote:
bisnnu=vishnu
vishno=wishnu=vishnoo=vaishnu=vishnu=visnu=veeshnu=vishanu=vishnau
“vishnu” appears in both the first set and the second, so the two sets should be reduced to one:
Quote:
bisnnu=vishnu=vishno=wishnu=vishnoo=vaishnu=vishnu=visnu=veeshnu=vishanu=vishnau
I have written a program which points out such “dupes” and the lines on which they occur in the database. But since I am a newbie in Perl, try as I might, I cannot write a Perl program which will safely merge the sets that share a dupe. I have a script in UltraEdit format which does the job, but it is dreadfully slow.
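For illustration only, here is a minimal sketch of that kind of check. It is not the program mentioned above, which is not shown, and it uses Python as a stand-in for Perl or AWK. The function name `find_shared_names` is hypothetical, and it assumes one set per line with names separated by `=`.

```python
from collections import defaultdict


def find_shared_names(lines):
    """Return {name: [line numbers]} for names that occur on more than one line."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        # set() so a name repeated within one row is counted once for that row
        for name in set(line.strip().split("=")):
            if name:
                seen[name].append(lineno)
    return {name: nums for name, nums in seen.items() if len(nums) > 1}


sample = [
    "bisnnu=vishnu",
    "vishno=wishnu=vishnoo=vaishnu=vishnu=visnu=veeshnu=vishanu=vishnau",
]
print(find_shared_names(sample))  # {'vishnu': [1, 2]}
```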

I am giving below a sample of such dupes:
Quote:
yasmine=yashmeen=yasamin=yasameen=yaasmin=yashameen=yasmeen=yasmin
yashmeen=yazmeen=yasmeen=yasmin=yashmin
watson=vatson=wattson
watson
tekchand=teckchand=tekchanda=tekachnda=tekachnd=teckchanda
tekchand=tekachand
sailesh
shailaesh=shailesh=sailesh
The expected output is:
Quote:
yasmine=yashmeen=yasamin=yasameen=yaasmin=yashameen=yasmeen=yasmin=yashmeen=yazmeen=yasmeen=yasmin=yashmin
watson=vatson=wattson=watson
tekchand=teckchand=tekchanda=tekachnda=tekachnd=teckchanda=tekchand=tekachand
sailesh=shailaesh=shailesh=sailesh
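One possible way to get from the sample to the expected output is to treat each line as a node and join any two lines that share a name, using a union-find (disjoint-set) structure. The sketch below is an assumption about how this could be done, not a tested production script. It is written in Python rather than the requested Perl or AWK, purely to show the logic, and the name `merge_shared_sets` is hypothetical. Lines in the same group are concatenated in file order, so repeated names are kept, as in the expected output.

```python
def merge_shared_sets(lines):
    """Merge '='-separated sets that share at least one name."""
    rows = [line.strip().split("=") for line in lines if line.strip()]
    parent = list(range(len(rows)))  # union-find over row indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # keep the smaller index as root so groups stay in file order
            parent[max(ra, rb)] = min(ra, rb)

    first_row = {}  # name -> index of the first row that contains it
    for i, names in enumerate(rows):
        for name in names:
            if name in first_row:
                union(first_row[name], i)
            else:
                first_row[name] = i

    groups = {}  # root index -> names of every row in that group, in file order
    for i, names in enumerate(rows):
        groups.setdefault(find(i), []).extend(names)
    return ["=".join(names) for names in groups.values()]


sample = ["watson=vatson=wattson", "watson", "sailesh", "shailaesh=shailesh=sailesh"]
for merged in merge_shared_sets(sample):
    print(merged)
# watson=vatson=wattson=watson
# sailesh=shailaesh=shailesh=sailesh
```

Because lines are joined through any shared name, chains are merged as well: if line 1 shares a name with line 5, and line 5 shares a different name with line 9, all three end up in one set.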
Ideally the program should also weed out duplicates within a given row, but I have an awk program that does that job efficiently.
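The awk program itself is not shown here. As a rough idea of what that clean-up step amounts to, a hypothetical Python equivalent (the name `dedupe_row` is an assumption) could keep the first occurrence of each name in a row and drop the rest:

```python
def dedupe_row(row):
    """Drop repeated names from one '='-separated row, keeping first occurrences."""
    names = row.strip().split("=")
    # dict.fromkeys keeps insertion order (Python 3.7+), so the row order is preserved
    return "=".join(dict.fromkeys(names))


print(dedupe_row("watson=vatson=wattson=watson"))  # watson=vatson=wattson
```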

Any help would be really great. Many thanks in advance for a Perl or AWK script. I work under Windows, and hence sed will not help.
 
