Delete Duplicates on the basis of two column values.


 
# 1  
Old 01-06-2011

Hi All,
I need to delete duplicate processes that are running on the same device type and port ID (fields 10 and 11 in the ps -ef output below). Here is the sample data:
Code:
p1sc1m1 15517 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
p1sc1m1 15544 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2949 innv mvmp02 0 8000 N S 977 750@751@752@
p1sc1m1 15546 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2956 innv mvmp03 0 8000 N S 978 750@751@752@
p1sc1m1 17445 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2950 zin5 mvmp02 0 8000 N S 1384 750@751@752
p1sc1m1 17451 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2957 zin5 mvmp03 0 8000 N S 1385 750@751@752
p1sc1m1 17475 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2952 zt4v mvmp02 0 8000 N S 1391 750@751@752
p1sc1m1 17478 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2959 zt4v mvmp03 0 8000 N S 1392 750@751@752
p1sc1m1 17481 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2970 zt5v mvmp01 0 8000 N S 1393 750@751@752
p1sc1m1 17487 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2960 zt6v mvmp01 0 8000 N S 1395 750@751@752
p1sc1m1 17489 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2962 zt6v mvmp03 0 8000 N S 1396 750@751@752


The first and the third row should be deleted, as the values in these two columns are the same (in3v and mvmp01).

Thanks for the help in advance.
Neeraj Vashishty

Last edited by Franklin52; 01-07-2011 at 06:18 AM.. Reason: please use code tags
# 2  
Old 01-06-2011
Assuming that ONLY the 10th and 11th columns are considered when deciding duplicates (no other columns are checked):
Code:
awk 'NR==FNR{a[$10"_"$11]++;next;}{if(a[$10"_"$11] < 2) print $0}' inputFile inputFile
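This reads the input file twice: the first pass (NR==FNR) counts how many times each $10/$11 key occurs, and the second pass prints only the lines whose key occurred once. A minimal self-contained sketch, using a hypothetical sample file with the same field layout:

```shell
# Hypothetical three-line sample; fields 10 and 11 form the key
cat > /tmp/dup_count.txt <<'EOF'
p1 101 1 0 t ? 0:00 agent 2967 in3v mvmp01 0
p1 102 1 0 t ? 0:00 agent 2968 in3v mvmp02 0
p1 103 1 0 t ? 0:00 agent 2969 in3v mvmp01 0
EOF
# First pass counts keys; second pass keeps unique-key lines only,
# so both in3v/mvmp01 lines are dropped
awk 'NR==FNR{a[$10"_"$11]++;next}{if(a[$10"_"$11]<2)print $0}' /tmp/dup_count.txt /tmp/dup_count.txt
```

Note that this removes every member of a duplicate group, which matches the request in post #1 (delete both of the clashing processes, not keep one).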

# 3  
Old 01-06-2011
Thanks Anurag, it is working. One more thing: in case I want to display these duplicate values rather than delete them, can you give me the command for that?
# 4  
Old 01-06-2011
Code:
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]!=$0)' inputfile inputfile # prints only the duplicates (every occurrence of a repeated key except the last)

Code:
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]==$0)' inputfile inputfile # prints distinct lines, i.e. removes duplicates (keeps the last occurrence of each key)

Code:
awk 'NR==FNR{a[$10$11]=$0;next}{print a[$10$11]==$0?$0:$0"--dup"}' inputfile inputfile # prints all lines, appending "--dup" to every duplicate except the last occurrence
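A small demo of the first two variants, on a hypothetical sample file with the same field layout. One caveat worth noting: `$10$11` concatenates the two fields with no separator, so distinct pairs such as `ab`/`c` and `a`/`bc` would collide on the key `abc`; the `"_"` separator used in post #2 avoids that.

```shell
# Hypothetical sample; the first and third lines share the in3v/mvmp01 key
cat > /tmp/dup_demo.txt <<'EOF'
p1 101 1 0 t ? 0:00 agent 2967 in3v mvmp01 0
p1 102 1 0 t ? 0:00 agent 2968 in3v mvmp02 0
p1 103 1 0 t ? 0:00 agent 2969 in3v mvmp01 0
EOF
# Variant 1: prints line 1 only (a duplicate of the last in3v/mvmp01 line)
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]!=$0)' /tmp/dup_demo.txt /tmp/dup_demo.txt
# Variant 2: prints lines 2 and 3 (one line per key, the last occurrence)
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]==$0)' /tmp/dup_demo.txt /tmp/dup_demo.txt
```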

# 5  
Old 01-06-2011
If you want ONLY duplicates.
Code:
awk '{if(a[$10"_"$11]) print $0;a[$10"_"$11]=1}' inputFile
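Unlike the two-pass versions, this single pass prints only the second and later occurrences of each key; the first line of a duplicate pair is not printed, since at that point it is not yet known to be a duplicate. A self-contained demo (hypothetical file name, same field layout as the sample data):

```shell
cat > /tmp/dup_once.txt <<'EOF'
p1 101 1 0 t ? 0:00 agent 2967 in3v mvmp01 0
p1 102 1 0 t ? 0:00 agent 2968 in3v mvmp02 0
p1 103 1 0 t ? 0:00 agent 2969 in3v mvmp01 0
EOF
# Only the third line is printed: its in3v/mvmp01 key was already seen
awk '{if(a[$10"_"$11]) print $0; a[$10"_"$11]=1}' /tmp/dup_once.txt
```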


Last edited by anurag.singh; 01-06-2011 at 09:09 AM..
# 6  
Old 01-07-2011
Thanks Anurag and Michael, it worked fine and I had a successful deployment today. Thanks for your help.
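For completeness, the deployment step the thread is about (acting on the later duplicate of each device/port pair) could be sketched as below. This is an assumption-laden sketch, not from the thread: the pipeline keys on fields 10 and 11 of `ps -ef` output, where field 2 is the PID, and uses `echo` as a dry-run stand-in for `kill`.

```shell
# Sketch only: list PIDs of second-and-later duplicates, then act on them.
# The [s] in the grep pattern keeps grep from matching its own process line.
ps -ef | grep '[s]cagntclsx25octtcp' \
  | awk '{if(a[$10"_"$11]) print $2; a[$10"_"$11]=1}' \
  | while read -r pid; do
      echo "would kill $pid"   # replace echo with: kill "$pid"
    done
```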