Delete Duplicates on the basis of two column values.


 
# 1  
Old 01-06-2011

Hi All,
I need to delete duplicate processes that are running on the same device type and port ID (fields 10 and 11 in the ps -ef output below). Here is the sample data:
Code:
p1sc1m1 15517 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
p1sc1m1 15544 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2949 innv mvmp02 0 8000 N S 977 750@751@752@
p1sc1m1 15546 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2956 innv mvmp03 0 8000 N S 978 750@751@752@
p1sc1m1 17445 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2950 zin5 mvmp02 0 8000 N S 1384 750@751@752
p1sc1m1 17451 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2957 zin5 mvmp03 0 8000 N S 1385 750@751@752
p1sc1m1 17475 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2952 zt4v mvmp02 0 8000 N S 1391 750@751@752
p1sc1m1 17478 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2959 zt4v mvmp03 0 8000 N S 1392 750@751@752
p1sc1m1 17481 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2970 zt5v mvmp01 0 8000 N S 1393 750@751@752
p1sc1m1 17487 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2960 zt6v mvmp01 0 8000 N S 1395 750@751@752
p1sc1m1 17489 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2962 zt6v mvmp03 0 8000 N S 1396 750@751@752


The first and the third row should be deleted, as the values in these two columns are the same (in3v and mvmp01).

Thanks for the help in advance.
Neeraj Vashishty

Last edited by Franklin52; 01-07-2011 at 06:18 AM.. Reason: please use code tags
# 2  
Old 01-06-2011
Assuming that ONLY the 10th and 11th columns are considered when deciding duplicates (no other columns are checked):
Code:
awk 'NR==FNR{a[$10"_"$11]++;next;}{if(a[$10"_"$11] < 2) print $0}' inputFile inputFile
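This reads the input file twice: the first pass (NR==FNR) counts how many times each $10/$11 key occurs, and the second pass prints only the lines whose key occurred once. A minimal self-contained sketch, using a hypothetical sample file with the same field layout:

```shell
# Hypothetical three-line sample; fields 10 and 11 form the key
cat > /tmp/dup_count.txt <<'EOF'
p1 101 1 0 t ? 0:00 agent 2967 in3v mvmp01 0
p1 102 1 0 t ? 0:00 agent 2968 in3v mvmp02 0
p1 103 1 0 t ? 0:00 agent 2969 in3v mvmp01 0
EOF
# First pass counts keys; second pass keeps unique-key lines only,
# so both in3v/mvmp01 lines are dropped
awk 'NR==FNR{a[$10"_"$11]++;next}{if(a[$10"_"$11]<2)print $0}' /tmp/dup_count.txt /tmp/dup_count.txt
```

Note that this removes every member of a duplicate group, which matches the request in post #1 (delete both of the clashing processes, not keep one).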

# 3  
Old 01-06-2011
Thanks Anurag, it is working. One more thing: in case I want to display these duplicate values rather than delete them, can you give me the command for that?
# 4  
Old 01-06-2011
Code:
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]!=$0)' inputfile inputfile # prints only the duplicates (every occurrence of a repeated key except the last)

Code:
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]==$0)' inputfile inputfile # prints distinct lines, i.e. removes duplicates (keeps the last occurrence of each key)

Code:
awk 'NR==FNR{a[$10$11]=$0;next}{print a[$10$11]==$0?$0:$0"--dup"}' inputfile inputfile # prints all lines, appending "--dup" to every duplicate except the last occurrence
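A small demo of the first two variants, on a hypothetical sample file with the same field layout. One caveat worth noting: `$10$11` concatenates the two fields with no separator, so distinct pairs such as `ab`/`c` and `a`/`bc` would collide on the key `abc`; the `"_"` separator used in post #2 avoids that.

```shell
# Hypothetical sample; the first and third lines share the in3v/mvmp01 key
cat > /tmp/dup_demo.txt <<'EOF'
p1 101 1 0 t ? 0:00 agent 2967 in3v mvmp01 0
p1 102 1 0 t ? 0:00 agent 2968 in3v mvmp02 0
p1 103 1 0 t ? 0:00 agent 2969 in3v mvmp01 0
EOF
# Variant 1: prints line 1 only (a duplicate of the last in3v/mvmp01 line)
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]!=$0)' /tmp/dup_demo.txt /tmp/dup_demo.txt
# Variant 2: prints lines 2 and 3 (one line per key, the last occurrence)
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]==$0)' /tmp/dup_demo.txt /tmp/dup_demo.txt
```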

# 5  
Old 01-06-2011
If you want ONLY duplicates.
Code:
awk '{if(a[$10"_"$11]) print $0;a[$10"_"$11]=1}' inputFile
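Unlike the two-pass versions, this single pass prints only the second and later occurrences of each key; the first line of a duplicate pair is not printed, since at that point it is not yet known to be a duplicate. A self-contained demo (hypothetical file name, same field layout as the sample data):

```shell
cat > /tmp/dup_once.txt <<'EOF'
p1 101 1 0 t ? 0:00 agent 2967 in3v mvmp01 0
p1 102 1 0 t ? 0:00 agent 2968 in3v mvmp02 0
p1 103 1 0 t ? 0:00 agent 2969 in3v mvmp01 0
EOF
# Only the third line is printed: its in3v/mvmp01 key was already seen
awk '{if(a[$10"_"$11]) print $0; a[$10"_"$11]=1}' /tmp/dup_once.txt
```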


Last edited by anurag.singh; 01-06-2011 at 09:09 AM..
# 6  
Old 01-07-2011
Thanks Anurag and Michael, it worked fine and I had a successful deployment today. Thanks for your help.
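For completeness, the deployment step the thread is about (acting on the later duplicate of each device/port pair) could be sketched as below. This is an assumption-laden sketch, not from the thread: the pipeline keys on fields 10 and 11 of `ps -ef` output, where field 2 is the PID, and uses `echo` as a dry-run stand-in for `kill`.

```shell
# Sketch only: list PIDs of second-and-later duplicates, then act on them.
# The [s] in the grep pattern keeps grep from matching its own process line.
ps -ef | grep '[s]cagntclsx25octtcp' \
  | awk '{if(a[$10"_"$11]) print $2; a[$10"_"$11]=1}' \
  | while read -r pid; do
      echo "would kill $pid"   # replace echo with: kill "$pid"
    done
```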