The UNIX and Linux Forums  

  #1  
Old 04-24-2008
Registered User
 

Join Date: Apr 2008
Posts: 5
finding duplicates in columns and removing lines

I am trying to figure out how to scan a file like so:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

and end up with this:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

Specifically, I need to look for duplicates in column 3 of a CSV file. If a duplicate is found, the whole line should be removed, keeping only the first line with that column 3 value. In the example above, line two is removed (filtered out).

Does anyone know if the unix uniq command can be utilized, or perl? uniq doesn't seem to have a delimiter flag; it only seems to work on a character count.

Thanks!
Totus

Last edited by totus; 04-24-2008 at 01:31 PM.
  #2  
Old 04-24-2008
aigles
Registered User
 

Join Date: Apr 2004
Location: Bordeaux, France
Posts: 1,212
Code:
awk -F, '! mail[$3]++' inputfile
Jean-Pierre.
  #3  
Old 04-24-2008
Registered User
 

Join Date: Apr 2008
Posts: 5
you're kidding me...

how does that work? I'm vaguely familiar with awk.
  #4  
Old 04-24-2008
...@...
 

Join Date: Feb 2004
Location: NM
Posts: 4,264
awk has associative arrays; the key for the mail array is field #3 ($3).
The first time a given $3 value shows up, mail[$3] is zero, so !mail[$3] is true and the line is printed (a pattern with no action prints the line). mail[$3]++ then increments that array element to one. The next time the same $3 is found, mail[$3] already has a value of 1, so the line is not printed.

!mail[$3] only evaluates true when mail[$3] == 0, so when it is 1, 2, 3 ... it evaluates as false.
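
The same logic can be spelled out long-hand. This is only a sketch: the function name dedupe_by_field3, the array name seen (instead of mail) and the sample lines are made up for illustration.

```shell
# Hypothetical helper: keep only the first line for each distinct value of field 3.
dedupe_by_field3() {
    awk -F, '
    {
        # seen[$3] is unset (treated as 0) the first time a value appears
        if (seen[$3] == 0)
            print        # first occurrence: keep the line
        seen[$3]++       # count this occurrence so later ones are skipped
    }'
}

# Made-up comma-separated sample, e-mail address in field 3
printf '%s\n' \
    'ralph,555,ralph@mail.com,www' \
    'margie,555,ralph@mail.com,www' \
    'kim,555,kims@mail.com,www' | dedupe_by_field3
```

The one-liner folds the test and the increment into a single expression: the post-increment returns the old count, and ! turns "old count is zero" into "print this line".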
  #5  
Old 04-24-2008
in2nix4life
Registered User
 

Join Date: Oct 2007
Location: East Coast
Posts: 46
With the 'uniq' command:

uniq -1 [inputfile]

Hope this helps.
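
For reference, a small sketch of how uniq's field skipping behaves, assuming a POSIX uniq, where -f 1 is the current spelling of the older -1. Fields are separated by blanks rather than commas, and only adjacent lines are compared, so this may not fit the comma-separated file from the first post. The sample lines are invented.

```shell
# -f 1 skips the first blank-separated field before comparing lines.
# Adjacent lines whose remainder matches collapse to the first one.
adjacent=$(printf '%s\n' '1 ralph@mail.com' '2 ralph@mail.com' '3 kims@mail.com' | uniq -f 1)
printf '%s\n' "$adjacent"

# The same duplicates separated by another line are NOT collapsed,
# because uniq only compares neighbouring lines.
apart=$(printf '%s\n' '1 ralph@mail.com' '2 kims@mail.com' '3 ralph@mail.com' | uniq -f 1)
printf '%s\n' "$apart"
```

In the second case the input would have to be sorted on the compared part first, which also changes the line order.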
  #6  
Old 04-24-2008
Registered User
 

Join Date: Apr 2008
Posts: 5
Quote:
Originally Posted by aigles
Code:
awk -F, '! mail[$3]++' inputfile
Jean-Pierre.
Jean-Pierre,

This seemed to work, but I noticed that a few duplicates seem to be left behind. How does the array know what the delimiter is? $3 is the field, but I'm not clear on the delimiter. Would the same work with tabs as the delimiter?

Cheers!
  #7  
Old 04-24-2008
Registered User
 

Join Date: Jul 2007
Posts: 76
Hi Totus,

In aigles' solution, the delimiter is , (set with the -F option).
So if you have tabs/spaces, I think you can use it as
awk -F " " '!mail[$4]++' inputfile

(The logic is that you have to specify the column correctly; I hope you noticed that I am using $4.)

-ilan
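
A short sketch of how -F picks the delimiter, using invented tab-separated data. It assumes an awk that accepts '\t' as a field separator, as POSIX awk does. Splitting on a single space is awk's default and breaks fields on any run of blanks or tabs, so a name such as "ralphs office" counts as two columns and the e-mail moves to $4. Splitting on '\t' keeps the name as one field and the e-mail stays in $3.

```shell
# Tab-separated sample (made up); field 3 is the e-mail address.
tsv=$(printf '%s\t%s\t%s\n' \
    'ralphs office' '555-555-5555' 'ralph@mail.com' \
    'margies office' '555-555-5555' 'ralph@mail.com' \
    'kims office' '555-555-5555' 'kims@mail.com')

# Split on tabs only, so "ralphs office" stays one field and $3 is the e-mail.
by_tab=$(printf '%s\n' "$tsv" | awk -F '\t' '!seen[$3]++')
printf '%s\n' "$by_tab"

# With the default whitespace splitting, the name takes two fields
# and the e-mail ends up in $4 instead.
by_blank=$(printf '%s\n' "$tsv" | awk '!seen[$4]++')
printf '%s\n' "$by_blank"
```

Both commands keep the ralphs and kims lines here, but the whitespace version only works while every name has exactly two words, so the explicit tab separator is the safer choice for tab-delimited files.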