Find duplicate based on 'n' fields and mark the duplicate as 'D'
# 1  
Old 01-28-2012

Hi,

In a file, I have to mark duplicate records as 'D' and the latest record alone as 'C'.

In the file below, I have to identify whether duplicate records exist based on Man_ID, Man_Dt, and Ship_Id; among the duplicates, the record with the latest Ship_Dt must be marked "C" and the others "D" (I have to append a new field, "C" or "D", at the end of each record).

Code:
File 1
====
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2009-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes

Expected Output

File 1
=====
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2009-11-31|111|2|Jackets|D
001|2010-12-31|11|2011-12-31|111|2|Jackets|C
003|2010-11-01|13|2011-12-31|111|1|Shoes


Last edited by machomaddy; 01-28-2012 at 07:37 AM.. Reason: Edited wrong Input "2010-12-31" to "2011-12-31" in the 4th record
# 2  
Old 01-28-2012
Hi,
You have provided the wrong input; the output does not correspond to the input.
# 3  
Old 01-28-2012
Thanks parth!!
# 4  
Old 01-28-2012
Code:
awk '{s=$1 FS $2 FS $3}                         # build the key from the first three fields
     NR==FNR{a[s]++;b[s]=FNR;next}              # pass 1: count each key, remember its last line number
     FNR==1{print;next}                         # pass 2: print the header unchanged
     {if (a[s]<2)
           {print}                              # unique key: print as is
      else
           {print (b[s]==FNR)?$0 "|C":$0 "|D"}  # duplicate: last occurrence gets |C, earlier ones |D
     }' FS=\| OFS=\| infile infile

# 5  
Old 01-28-2012
Thanks, rdcwayx! Could you please explain the code? It would be very helpful.
# 6  
Old 01-28-2012
Hi rdcwayx,

Your code does not work in this case:

Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes


According to this data, the output should be:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|C
001|2010-12-31|11|2011-12-31|111|2|Jackets|D
003|2010-11-01|13|2011-12-31|111|1|Shoes


and your code produces:

Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|D
001|2010-12-31|11|2011-12-31|111|2|Jackets|C
003|2010-11-01|13|2011-12-31|111|1|Shoes
# 7  
Old 01-28-2012
Oh, I am sorry, I missed it. My bad. Yes, the code behaves as parthmittal says!
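For input like parthmittal2007's example, where the record with the latest Ship_Dt is not the last occurrence in the file, a variant that compares the Ship_Dt values themselves can be sketched as follows. This is a hypothetical rework, not code from the thread; it assumes Ship_Dt is always in YYYY-MM-DD form (so plain string comparison orders dates correctly) and that ties on the latest Ship_Dt do not occur. The file path /tmp/ships.txt is illustrative.

```shell
# Recreate the sample input from the thread (illustrative path).
cat > /tmp/ships.txt <<'EOF'
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes
EOF

# Pass 1: count each Man_ID|Man_Dt|Ship_Id key and keep its largest Ship_Dt.
# Pass 2: among duplicates, the record carrying that largest Ship_Dt gets |C,
#         every other duplicate gets |D; unique records are printed unchanged.
awk -F'|' '
  NR==FNR { if (FNR>1) { k=$1 FS $2 FS $3; cnt[k]++; if ($4>max[k]) max[k]=$4 }; next }
  FNR==1  { print; next }                       # header unchanged
  { k=$1 FS $2 FS $3
    if (cnt[k]<2)        print                  # unique record: no flag
    else if ($4==max[k]) print $0 "|C"          # latest Ship_Dt among duplicates
    else                 print $0 "|D"          # older duplicate
  }' /tmp/ships.txt /tmp/ships.txt
```

Reading the file twice (the NR==FNR idiom) keeps the original record order in the output, which a sort-based approach would not.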
