awk script to find duplicate values

# 1  
07-27-2014

The data below consists of items with Class, SubClass and Property values. I would like to find the same value being captured for different properties within the same Class/SubClass combination, both within an item and across items. For example, for ABC-DEF, 123 is captured for PAD1, PAD2 and PAD4; for XYZ-MNF, 456 is captured for PXM1 and PXM4; and for ABC-DEF, 234 is captured for PAD2 and PAD1. (Note: a cell can sometimes hold several values separated by commas.)


Column Separator = Pipe (|)

Input data
Code:
ID|Class|SubClass|Prop|Value
1|ABC|DEF|PAD1|123|
1|ABC|DEF|PAD2|234|
1|ABC|DEF|PAD3|476|
1|ABC|DEF|PAD4|123|
2|XYZ|MNF|PXM1|456|
2|XYZ|MNF|PXM2|289|
2|XYZ|MNF|PXM3|279|
2|XYZ|MNF|PXM4|488,456|
2|XYZ|MNF|PXM5|284|
3|ABC|DEF|PAD1|234|
3|ABC|DEF|PAD2|777,123|
3|ABC|DEF|PAD3|567|
3|ABC|DEF|PAD4|556|

Output data
Code:
ID|Class|SubClass|Prop|Value|
1|ABC|DEF|PAD1|123|
1|ABC|DEF|PAD4|123|
3|ABC|DEF|PAD2|123|
3|ABC|DEF|PAD1|234|
1|ABC|DEF|PAD2|234|
2|XYZ|MNF|PXM1|456|
2|XYZ|MNF|PXM4|456|
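A minimal sketch of the requirement above, assuming a POSIX sh and awk and the pipe-delimited layout shown (header on the first line, Value in the 5th column, optional comma-separated values in a cell). The helper name dup_values is hypothetical. It keys each value on Class|SubClass|value, so the same number under a different Class/SubClass is not counted, and it handles any number of comma-separated values per cell.

```shell
# dup_values (hypothetical helper): print the header, then every row whose
# value occurs more than once within the same Class/SubClass pair.
# Reads the files given as arguments, or standard input if there are none.
dup_values() {
  awk -F'|' -v OFS='|' '
    NR == 1 { print; next }                  # pass the header line through
    {
      n = split($5, v, ",")                  # split the Value cell on commas
      for (i = 1; i <= n; i++) {
        key = $2 OFS $3 OFS v[i]             # Class|SubClass|value
        if (!(key in cnt)) order[++k] = key  # remember first-seen order
        cnt[key]++
        rows[key] = rows[key] $1 OFS $2 OFS $3 OFS $4 OFS v[i] OFS "\n"
      }
    }
    END {
      # Emit only groups seen more than once, in first-seen order, so the
      # result does not depend on the unspecified order of for-in loops.
      for (j = 1; j <= k; j++)
        if (cnt[order[j]] > 1) printf "%s", rows[order[j]]
    }
  ' "$@"
}
```

Usage: dup_values file. With the sample input this prints the header plus the same seven rows as the requested output. Groups come out in first-seen order (123, 234, 456) and rows within a group in input order, so the two 234 rows appear as item 1 then item 3.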

Thanks

Last edited by Scott; 07-27-2014 at 03:28 PM.. Reason: Please use code tags for code and data
# 2  
07-27-2014
Code:
awk -F'|' -v OFS='|' 'NR==FNR {n=split($5,a,","); for(i=1;i<=n;i++) A[$2 FS $3 FS a[i]]++; next} FNR==1 {print; next} {n=split($5,a,","); for(i=1;i<=n;i++) if (A[$2 FS $3 FS a[i]]>1) {$5=a[i]; print}}' file file


Last edited by jethrow; 07-27-2014 at 04:10 PM..
# 3  
07-27-2014
Try also
Code:
awk     'NR==1  {print; next}
         function Z (P) {X=$2","$3","P; T[X]=T[X] D[X] sprintf ("%s|%s|%s|%s|%s|", $1, $2, $3, $4, P); C[X]++; D[X]="\n"}
                {Z ($5)}
         NF>6   {Z ($6)}
         END    {for (i in T) if (C[i]>1) print T[i]}
        ' FS="[,|]" file
ID|Class|SubClass|Prop|Value
2|XYZ|MNF|PXM1|456|
2|XYZ|MNF|PXM4|456|
1|ABC|DEF|PAD1|123|
1|ABC|DEF|PAD4|123|
3|ABC|DEF|PAD2|123|
1|ABC|DEF|PAD2|234|
3|ABC|DEF|PAD1|234|

# 4  
08-06-2014
Thanks RudiC, it's working as expected. But I am struggling a bit to convert it into a script file. Can you please help me?

Thanks
# 5  
08-06-2014
... if you tell me what you are struggling with ... my crystal ball needs some polishing, you know.
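One common way to turn RudiC's one-liner into a script file is a small shell wrapper. This is a sketch assuming a POSIX sh; the file name dupvals.sh and the function name dupvals are hypothetical. The awk program is RudiC's, except that the field separator is passed with -F instead of a trailing FS= operand. Like the original, it picks up at most two comma-separated values per cell ($5 and $6), and the order of the groups depends on awk's for-in order.

```shell
#!/bin/sh
# dupvals.sh (hypothetical file name): RudiC's awk program from post #3,
# saved as a reusable script.
# Usage: sh dupvals.sh file   (or: chmod +x dupvals.sh && ./dupvals.sh file)

dupvals() {
  awk -F'[,|]' '
    NR==1  {print; next}
    # Z appends one output row for value P to the group Class,SubClass,P
    function Z (P) {X=$2","$3","P; T[X]=T[X] D[X] sprintf ("%s|%s|%s|%s|%s|", $1, $2, $3, $4, P); C[X]++; D[X]="\n"}
           {Z ($5)}
    NF>6   {Z ($6)}     # second comma-separated value, if present
    END    {for (i in T) if (C[i]>1) print T[i]}
  ' "$@"
}

# Run only when file names are given, so the function can also be sourced.
if [ "$#" -gt 0 ]; then
  dupvals "$@"
fi
```

Alternatively, the awk program alone can go in its own file and be run with awk -f; in that case the separator has to be set inside the program, for example in a BEGIN { FS = "[,|]" } block.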