How can I delete duplicates based on one column of a line?


 
# 1  
08-04-2009

My data looks something like this:
Code:
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs
(08/03/2009 22:57:42.426)(:) John bgbhhhhhhhhhhhhhhhhh dddddddddddddd
(08/03/2009 22:57:42.427)(:) king hhhhhhhhhhhhhssssss rr

Here I need to use the 3rd column as the key for finding duplicate rows. The output should contain only one row per name (one king, one John, and so on), keeping the first occurrence of each.

Expected output:
Code:
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs

Can someone help me with this? It would be very helpful for my script.
# 2  
08-04-2009
May not be efficient
Code:
awk '!arr[$3]++ {print}'  file
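A minimal sketch of why this one-liner works, wrapped in a shell function whose name `dedup_col3` is hypothetical (the array is called `seen` instead of `arr`, and `{print}` is dropped because a pattern with no action prints the line anyway). `seen[$3]++` evaluates to the count *before* the increment, so `!seen[$3]++` is true only the first time a given field-3 value appears. The file is read once, with one array entry per distinct name.

```shell
#!/bin/sh
# Hypothetical helper wrapping the one-liner from post #2.
# seen[$3]++ returns the old count: 0 (false) the first time a field-3
# value appears, non-zero afterwards. Negating it makes the pattern true
# only for the first occurrence; a pattern with no action prints the line.
dedup_col3() {
    awk '!seen[$3]++' "$@"
}

# Sample data from post #1; prints the king, John, Michael and Ravi
# lines in their original order, matching the expected output.
dedup_col3 <<'EOF'
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs
(08/03/2009 22:57:42.426)(:) John bgbhhhhhhhhhhhhhhhhh dddddddddddddd
(08/03/2009 22:57:42.427)(:) king hhhhhhhhhhhhhssssss rr
EOF
```

With the default field splitting, the date and the time stamp are fields 1 and 2, so field 3 is the name in the posted sample.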

# 3  
08-04-2009
I am getting a syntax error with that command. Could you verify the syntax, please?
# 4  
08-04-2009
Quote:
Originally Posted by rdhanek
I am getting a syntax error with that command. Could you verify the syntax, please?
Use nawk or /usr/xpg4/bin/awk on Solaris.
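On Solaris, /usr/bin/awk is the old awk, which rejects a pattern like `!arr[$3]++`; that is the likely source of the syntax error. If neither nawk nor /usr/xpg4/bin/awk is available, the same first-occurrence test can be spelled out with a plain `if` statement. This is a sketch that has not been checked against Solaris /usr/bin/awk itself, and `first_per_col3` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: same effect as awk '!arr[$3]++', but written with
# an explicit if statement instead of negating a post-increment.
# seen[$3] is uninitialized (compares equal to 0) until the key has been
# printed once, after which it is set to 1.
first_per_col3() {
    awk '{ if (seen[$3] == 0) { print; seen[$3] = 1 } }' "$@"
}

# Usage with two lines from post #1: only the first "king" line is printed.
first_per_col3 <<'EOF'
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.427)(:) king hhhhhhhhhhhhhssssss rr
EOF
```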

Regards
# 5  
08-04-2009
Not working

I tried using

nawk '!arr[$3]++ {print}' file

It's not removing the duplicates; it just prints all the rows.
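When this one-liner prints every row, a likely cause is that field 3 of the real file is not the name column, for example if the real lines are laid out differently from the posted sample. A quick check, sketched below with a hypothetical function `show_field3`, is to print what awk actually sees as field 3 on each line, together with the field count.

```shell
#!/bin/sh
# Hypothetical diagnostic: for each input line, print the line number,
# the number of fields (NF) and field 3 wrapped in brackets, to confirm
# that $3 really is the name column.
show_field3() {
    awk '{ printf "%d: NF=%d [%s]\n", NR, NF, $3 }' "$@"
}

# Usage on the first sample line; prints: 1: NF=5 [king]
show_field3 <<'EOF'
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
EOF
```

If the real data mixes cases such as "John" and "john", keying on `tolower($3)`, as in `nawk '!seen[tolower($3)]++' file`, treats them as one name; `tolower` is part of POSIX awk and nawk.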
# 6  
08-04-2009
Very inefficient:
Code:
awk '{x = $3
if (x != y) print   # compares with the previous line only, so duplicates must be adjacent
y = $3
}' file


Last edited by ilikecows; 08-04-2009 at 08:02 AM.. Reason: added code tags
# 7  
08-04-2009
This prints all the lines without removing the ones with a duplicate column 3.
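Post #6's loop compares each line only with the line directly before it, so it drops a duplicate only when the two lines are adjacent. In the sample, no two equal names are adjacent, which is why every line is printed. A sketch of one way to keep that approach, assuming it is acceptable for the output to be reordered by name, is to sort on field 3 first so that equal names end up next to each other. The function name `dedup_col3_sorted` is hypothetical, and which duplicate survives is decided by sort's whole-line tie-break rather than by position in the file.

```shell
#!/bin/sh
# Hypothetical helper: the sorted variant of post #6's idea.
# sort -k3,3 makes lines with equal field 3 adjacent; awk then prints a
# line only when its field 3 differs from the previous line's field 3.
# Output is ordered by field 3, not in the original line order.
dedup_col3_sorted() {
    sort -k3,3 "$@" | awk 'NR == 1 || $3 != prev { print } { prev = $3 }'
}

# Sample data from post #1: prints one line each for John, king, Michael
# and Ravi (their relative order depends on the locale's collation).
dedup_col3_sorted <<'EOF'
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs
(08/03/2009 22:57:42.426)(:) John bgbhhhhhhhhhhhhhhhhh dddddddddddddd
(08/03/2009 22:57:42.427)(:) king hhhhhhhhhhhhhssssss rr
EOF
```

When the original order matters, the single-pass `!arr[$3]++` approach from post #2 avoids both the sort and the reordering.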

