Remove duplicate rows of a file based on a value of a column | Unix Linux Forums | UNIX for Dummies Questions & Answers

#1  09-26-2008
risk_sly (Registered User)

Hi,

I am processing a file and would like to delete duplicate records as identified by one of its columns, e.g.:

COL1 COL2 COL3
A 1234 1234
B 3k32 2322
C Xk32 TTT
A NEW XX22
B 3k32 2322


I want the file to contain no duplicate COL1 values, i.e. the file should only contain the following:

COL1 COL2 COL3
A 1234 1234
B 3k32 2322
C Xk32 TTT


The records with duplicate COL1 were deleted.

Does anybody have suggestions on how to do this?

Thank you.
#2  09-26-2008
jim mcnamara (Forum Staff)

Code:
awk -F, '!arr[$1]++' oldfile > newfile
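As a sketch of why this one-liner works (sample data and file names here are illustrative): arr[$1] is an associative array keyed on the first field, so !arr[$1]++ is true only the first time a given COL1 value appears, and awk's default action prints the line.

```shell
# Illustrative comma-delimited sample, as clarified later in the thread.
printf 'A,1234,1234\nB,3k32,2322\nA,NEW,XX22\n' > oldfile
# arr[$1] is 0 (false) the first time a COL1 value is seen, so the
# line prints; the ++ then marks that value as already seen.
awk -F, '!arr[$1]++' oldfile > newfile
cat newfile   # A,1234,1234 and B,3k32,2322 remain; the repeated A row is gone
```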


Last edited by jim mcnamara; 09-26-2008 at 05:51 AM. Reason: new FS setting
#3  09-26-2008
risk_sly (Registered User)
Thanks for the reply, Jim. But when I tried the script, it returned an "event not found" error. Any idea what's causing this error? Also, I forgot to include in my sample that the file I want to process is comma-delimited. Thank you.

COL1, COL2, COL3
A, 1234, 1234
B, 3k32, 2322
C, Xk32, TTT
A, NEW, XX22
B, 3k32, 2322
#4  09-26-2008
jim mcnamara (Forum Staff)
Look at the change above. Also try gawk or nawk, especially if you are on a Solaris box.
The statement is fine for a modern awk.
#5  09-26-2008
radoulov (Moderator)
Quote:
Originally Posted by risk_sly View Post
Thanks for the reply Jim. But when I tried the script, it returned "event not found error". any idea what's causing this error?
[...]
It's your shell ((t)csh, I suppose): csh-family shells apply history expansion to ! even inside single quotes, which is what triggers the "event not found" error.
Try using a script instead:


Code:
$ cat uniq.awk 
!arr[$1]++
$ awk -f uniq.awk file
COL1, COL2, COL3
A, 1234, 1234
B, 3k32, 2322
C, Xk32, TTT

#6  09-26-2008
risk_sly (Registered User)
Thanks again Jim, but I still get the "arr[: event not found" error. I also noticed that when I recall the command (by pressing the up-arrow key), the part "!arr[" is removed from the script, i.e. the script becomes

awk -F, '$1]++' oldfile > newfile

I'm running on Solaris, and have also tried gawk and nawk, but the same error is returned.

thank you.
#7  09-26-2008
risk_sly (Registered User)
Thanks radoulov. But how do I use this?
$ cat uniq.awk
!arr[$1]++
$ awk -f uniq.awk file


What is the uniq.awk file?
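uniq.awk is just an ordinary text file that holds the awk program body; awk -f reads the program from that file instead of the command line, so the ! never reaches the shell and csh's history expansion never fires. A minimal sketch (file names are illustrative; under csh, create uniq.awk with an editor rather than on the command line, for the same reason):

```shell
# The program file contains exactly the pattern shown above.
printf '%s\n' '!arr[$1]++' > uniq.awk
# Illustrative comma-delimited sample data.
printf 'A,1,x\nB,2,y\nA,3,z\n' > file
# -f reads the program from uniq.awk; -F, sets the field separator.
awk -F, -f uniq.awk file   # prints A,1,x and B,2,y
```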
Closed Thread