Find duplicate based on 'n' fields and mark the duplicate as 'D'

01-28-2012

Hi,

In a file, I have to mark duplicate records as 'D' and only the latest record as 'C'.

In the file below, records are duplicates if they share the same Man_ID, Man_Dt and Ship_Id. Within each group of duplicates, I have to mark the record with the latest Ship_Dt as "C" and the others as "D", by adding a new field ("C" or "D") at the end of the record.

Code:

File 1
====
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2009-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes

Expected Output

File 1
=====
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2009-11-31|111|2|Jackets|D
001|2010-12-31|11|2011-12-31|111|2|Jackets|C
003|2010-11-01|13|2011-12-31|111|1|Shoes

Last edited by machomaddy; 01-28-2012 at 06:37 AM.. Reason: Edited wrong Input "2010-12-31" to "2011-12-31" in the 4th record

machomaddy

01-28-2012

Hi,
the input you have provided is wrong.
The expected output does not match the input.

parthmittal2007

01-28-2012

Thanks parth!!

machomaddy

01-28-2012

Code:

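awk '{s=$1 FS $2 FS $3}              # key: Man_ID|Man_Dt|Ship_Id
     NR==FNR{a[s]++;b[s]=FNR;next}   # pass 1: count each key, remember its last line number
     FNR==1{print;next}              # pass 2: print the header unchanged
     {if (a[s]<2)
           {print}                   # key occurs only once: no flag
      else                           # last occurrence in file order gets C, the rest D
           {print (b[s]==FNR)?$0 "|C":$0 "|D"}}' FS=\| OFS=\| infile infile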

rdcwayx

01-28-2012

Thanks, rdcwayx!!

Could you please explain the code? It would be very helpful.

machomaddy
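
On the explanation requested above: rdcwayx's one-liner names the same file twice, so awk reads it in two passes. `NR==FNR` is true only while the first copy is being read. A minimal sketch of that idiom in isolation, using a made-up three-line sample and illustrative names (`demo`, `demo_out`, `seen`, `last`) that are not from the thread:

```shell
demo=$(mktemp)                       # hypothetical scratch file
printf 'a|1\nb|2\na|3\n' > "$demo"

demo_out=$(awk -F'|' '
    # Pass 1: count each key and remember the line number of its last occurrence.
    NR == FNR { seen[$1]++; last[$1] = FNR; next }
    # Pass 2: every line can now be printed with what pass 1 learned.
    { print $0 " count=" seen[$1] (last[$1] == FNR ? " last" : "") }
' "$demo" "$demo")

printf '%s\n' "$demo_out"
rm -f "$demo"
```

The posted one-liner follows the same pattern with the first three fields as the key: a key counted fewer than two times is printed unchanged, and otherwise the line whose number matches the remembered last occurrence gets "|C" and the rest get "|D".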

01-28-2012

hi rdcwayx:

your code does not work in this case:

Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes

According to this data, the output should be:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|C
001|2010-12-31|11|2011-12-31|111|2|Jackets|D
003|2010-11-01|13|2011-12-31|111|1|Shoes

but the output from your code is:

Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|D
001|2010-12-31|11|2011-12-31|111|2|Jackets|C
003|2010-11-01|13|2011-12-31|111|1|Shoes

parthmittal2007
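
One possible way to get the output described in the post above is to compare the Ship_Dt values themselves instead of relying on line order. This is only a sketch, not something posted in the thread. It assumes Ship_Dt is always written as YYYY-MM-DD, so that plain string comparison orders the dates correctly. The names `infile`, `result`, `count`, `latest` and `line` are illustrative.

```shell
infile=$(mktemp)   # hypothetical temp file holding the sample from the post above
cat > "$infile" <<'EOF'
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes
EOF

result=$(awk -F'|' '
    { key = $1 FS $2 FS $3 }                 # Man_ID|Man_Dt|Ship_Id
    NR == FNR {                              # pass 1: collect per-key facts
        if (FNR > 1) {                       # skip the header line
            count[key]++
            # Remember the line holding the greatest Ship_Dt seen so far
            # (the "" forces a string comparison).
            if (count[key] == 1 || ($4 "") > (latest[key] "")) {
                latest[key] = $4
                line[key] = FNR
            }
        }
        next
    }
    FNR == 1 || count[key] < 2 { print; next }       # header and unique records
    { print $0 FS (line[key] == FNR ? "C" : "D") }   # flag the duplicates
' "$infile" "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

On this sample the 2012-11-31 record is flagged C and the other two 001 records are flagged D. If two duplicates share the same latest Ship_Dt, this sketch keeps the first one it saw as C.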

01-28-2012

Oh, I'm sorry, I missed that. My bad. Yes, the code behaves as parthmittal says!

machomaddy
