Remove inverted duplicates from a mapping database


 
Post #1 (12-19-2013)

My apologies for a title that does not really describe what I need.
My OS is Windows Vista/Windows 7.
I have a large database of name variants with the following structure:
Code:
name=name variant

i.e. each line gives a name and one of its variants, separated by an equals sign (=).
An example will make this clear:
Code:
John=Johann
John=Jean

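The poster asked for Perl or AWK. As a hedged illustration in Python instead, here is a minimal sketch of reading one line of this format. It assumes that only the first "=" separates the name from the variant and that surrounding whitespace and Windows line endings (CRLF) should be dropped; parse_entry is a hypothetical helper name.

```python
def parse_entry(line):
    """Split one 'name=variant' line into a (name, variant) tuple.

    Assumptions: only the first '=' separates the two fields, and
    surrounding whitespace plus the line ending (including a Windows
    CRLF) are stripped. Returns None for lines without '='.
    """
    line = line.strip()
    if "=" not in line:
        return None
    name, variant = line.split("=", 1)
    return name.strip(), variant.strip()


print(parse_entry("John=Johann\r\n"))  # ('John', 'Johann')
print(parse_entry(""))                 # None
```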
Since the database was prepared manually, it often contains duplicates in which the left-hand and right-hand sides are swapped, as in the example below:
Code:
Johann=John
Jean=John

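One way to treat John=Johann and Johann=John as the same entry is to build an order-independent key from the two sides, for example by sorting them. A minimal Python sketch; pair_key is a hypothetical name, and the comparison is exact and case-sensitive, which is an assumption since the post does not say how case should be handled.

```python
def pair_key(name, variant):
    """Return a key that is identical for 'A=B' and 'B=A'.

    Sorting the two sides means the orientation of the entry no longer
    matters. Comparison is exact and case-sensitive (an assumption).
    """
    return tuple(sorted((name, variant)))


print(pair_key("John", "Johann"))                                # ('Johann', 'John')
print(pair_key("John", "Johann") == pair_key("Johann", "John"))  # True
print(pair_key("John", "Jean") == pair_key("Jean", "John"))      # True
```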
As a result, the database is bloated with these "inverted dupes".
What I need is a Perl or AWK script that removes these dupes and keeps only one entry of each pair.
Example of input and output:
Input:
Code:
John=Johann
John=Jean
Johann=John
Jean=John

Expected output after removal of dupes:
Code:
John=Johann
John=Jean

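A possible single-pass approach, sketched in Python rather than the Perl or AWK the poster asked for: keep the first line of each pair and drop any later line that repeats it in either orientation, so the surviving lines stay in their original order. remove_inverted_duplicates and dedupe_file are hypothetical names, and the assumptions (first "=" splits the fields, case-sensitive matching, UTF-8 input) are repeated in the code comments.

```python
def remove_inverted_duplicates(lines):
    """Keep the first 'name=variant' line of each pair, dropping later
    lines that repeat it either as-is or with the sides swapped.

    Assumptions (not stated in the original post): only the first '='
    separates the fields, comparison is case-sensitive, and lines
    without '=' are passed through unchanged.
    """
    seen = set()
    kept = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # also handles Windows CRLF endings
        if "=" not in line:
            kept.append(line)  # keep blank or malformed lines untouched
            continue
        name, variant = line.split("=", 1)
        # John=Johann and Johann=John both produce ('Johann', 'John').
        key = tuple(sorted((name.strip(), variant.strip())))
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


def dedupe_file(in_path, out_path, encoding="utf-8"):
    """Hypothetical file wrapper. UTF-8 is an assumption; a file saved
    by a Windows editor may need encoding="cp1252" instead."""
    with open(in_path, encoding=encoding) as src:
        kept = remove_inverted_duplicates(src)
    with open(out_path, "w", encoding=encoding) as dst:
        for line in kept:
            dst.write(line + "\n")


sample = ["John=Johann", "John=Jean", "Johann=John", "Jean=John"]
print("\n".join(remove_inverted_duplicates(sample)))
# John=Johann
# John=Jean
```

Because the key ignores orientation, exact repeats such as a second John=Johann are dropped as well. The file is read once, and the set holds one key per distinct pair.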
Many thanks in advance for your help.
Post #2 (12-19-2013)
With 180 posts to the 59 threads you have started in these forums, we are starting to feel that rather than trying to learn how to use the UNIX-like tools available on Windows systems, you just want us to do your work for you.

What have you tried so far to solve this problem?

What isn't working with what you have tried?