Remove all instances of duplicate records from the file


 
# 1  
Old 12-11-2007

Hi experts,
I am new to scripting and have the following requirement.

File1:

A|123|NAME1
A|123|NAME2
B|123|NAME3

File2:

C|123|NAME4
C|123|NAME5
D|123|NAME6

1) Merge both files.
2) Sort the merged file (the key is the first and second fields).
3) Remove all instances of duplicate records from the merged file and write all these duplicate instances into one file.
4) The remaining records, which are unique in the original source files, need to be written into another file.

Output files:

file3:
A|123|NAME1
A|123|NAME2
C|123|NAME4
C|123|NAME5

File4:

B|123|NAME3
D|123|NAME6

Please help me with the solution, as this is really urgent. I appreciate your help.

Thank you
# 2  
Old 12-12-2007
If I am not wrong, each record in your file1 and file2 is already unique (I am pointing at the last character of your records).

So if all the records are unique, shouldn't they all go to File4?

Explain more clearly, so that you get a quick reply from this forum.
Believe me, there are really brilliant experts in this forum to help you out at any time..

excluding me ..

Cheers
user_prady
# 3  
Old 12-12-2007
Try something like:
Code:
sort -t'|' -k1,2 File1 File2 | awk -F'|' '{
        occurrences[$1 "|" $2]++
        line[NR] = $0
}
END {
        for (j = 1; j <= NR; j++) {
                key = substr(line[j], 1, 5)
                if (occurrences[key] > 1)
                        print line[j] >> "File3"
                else
                        print line[j] >> "File4"
        }
}'
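As a quick sanity check, a sort/awk pipeline of this shape can be exercised end-to-end with the sample data from post #1 (a sketch; the File1..File4 names are the ones used in this thread):

```shell
#!/bin/sh
# Recreate the sample inputs from post #1.
printf 'A|123|NAME1\nA|123|NAME2\nB|123|NAME3\n' > File1
printf 'C|123|NAME4\nC|123|NAME5\nD|123|NAME6\n' > File2
rm -f File3 File4   # the awk script appends, so start clean

sort -t'|' -k1,2 File1 File2 | awk -F'|' '{
        occurrences[$1 "|" $2]++      # count records per key (fields 1 and 2)
        line[NR] = $0                 # remember every line in input order
}
END {
        for (j = 1; j <= NR; j++) {
                key = substr(line[j], 1, 5)   # fixed-width key, e.g. "A|123"
                if (occurrences[key] > 1)
                        print line[j] >> "File3"   # duplicated keys
                else
                        print line[j] >> "File4"   # unique keys
        }
}'
```

With this input, File3 should end up with the A and C records and File4 with the B and D records.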

# 4  
Old 12-12-2007
Another sort/awk solution
(if your files are not already sorted like the samples you posted):

Code:
sort -t\| -k1,2 file1 file2|awk '{
	x[$1,$2]++
	y[NR] = $0
} END {
	for (i = 1; i <= NR; i++)
		print y[i] > ((x[substr(y[i],1,5)] > 1) ? "file3" : "file4")
}' SUBSEP="|" FS="|"

Use nawk or /usr/xpg4/bin/awk on Solaris.
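As an aside, when the key really is fixed width as in these samples, sort and uniq alone can do the split (a sketch assuming GNU coreutils, since -w and --all-repeated are GNU extensions; no awk needed):

```shell
#!/bin/sh
# Sample data from the thread.
printf 'A|123|NAME1\nA|123|NAME2\nB|123|NAME3\n' > file1
printf 'C|123|NAME4\nC|123|NAME5\nD|123|NAME6\n' > file2

sort -t'|' -k1,2 file1 file2 > merged
uniq -w5 --all-repeated merged file3   # lines whose first 5 chars (the key) repeat
uniq -w5 -u merged file4               # lines whose first 5 chars are unique
rm merged
```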

P.S. For variable column widths you should not use substr but split, for example:

Code:
sort -t\| -k1,2 file1 file2|awk '{
	x[$1,$2]++
	y[NR] = $0
} END {
	for (i = 1; i <= NR; i++)
		{
			tmp = y[i]
			split(tmp,z)
			print tmp > ((x[z[1],z[2]] > 1) ? "file3" : "file4")
	}
}' SUBSEP="|" FS="|"
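For large inputs, a two-pass awk over the sorted output avoids holding every line in memory; only one counter per distinct key is kept (a sketch, reusing the file names above):

```shell
#!/bin/sh
# Sample data from the thread.
printf 'A|123|NAME1\nA|123|NAME2\nB|123|NAME3\n' > file1
printf 'C|123|NAME4\nC|123|NAME5\nD|123|NAME6\n' > file2
rm -f file3 file4

sort -t'|' -k1,2 file1 file2 > merged

# Read the merged file twice: pass 1 counts keys, pass 2 routes lines.
awk -F'|' '
NR == FNR { count[$1 FS $2]++; next }                  # pass 1: count each key
{ print > (count[$1 FS $2] > 1 ? "file3" : "file4") }  # pass 2: route each line
' merged merged

rm merged
```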


Last edited by radoulov; 12-12-2007 at 08:16 AM..