finding duplicates in columns and removing lines

    #1  
04-24-2008
totus

I am trying to figure out how to scan a file like so:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

and end up with this:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

Specifically, I need to look for duplicates in column 3 of a CSV file; if a duplicate is found, remove every later line that repeats that column 3 value. In the example above, line 2 is removed (filtered out).

Does anyone know whether the Unix uniq command can be used for this, or Perl? uniq doesn't seem to have a delimiter flag; it only seems to offer options based on character counts.

Thanks!
Totus

Last edited by totus; 04-24-2008 at 04:31 PM..
    #2  
04-24-2008
aigles

Code:
awk -F, '! mail[$3]++' inputfile

Jean-Pierre.
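
As a quick check, the one-liner can be run against the sample data from post #1. This is only a minimal sketch: the temporary file and the variable names inputfile and result are there for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample input from post #1 in a temporary file.
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF

# -F, splits each line on commas, so $3 is the e-mail column.
# A line is printed only the first time its $3 value is seen.
result=$(awk -F, '! mail[$3]++' "$inputfile")
printf '%s\n' "$result"

rm -f "$inputfile"
```

On this input, line 2 is dropped because its e-mail column repeats the one on line 1; lines 1, 3 and 4 are kept in their original order.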
    #3  
04-24-2008
totus
You're kidding me...

How does that work? I'm only vaguely familiar with awk.
    #4  
04-24-2008
jim mcnamara
awk has associative arrays; the key for the mail array is field #3 ($3).
The first time a given $3 value shows up, mail[$3] is zero, so !mail[$3] is true and awk prints the line (a pattern with no action prints the line by default). mail[$3]++ then increments that array element to one. The next time the same $3 value is found, mail[$3] already has a value of 1, so the line is not printed.

!mail[$3] only evaluates true when mail[$3] == 0, so when it is 1, 2, 3 ... it evaluates as false.
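
The explanation above can be spelled out long-hand. This is a sketch rather than anything posted in the thread: the array name seen and the three input lines are made up for illustration, and the if/print form is just one way to write out what the one-liner does implicitly.

```shell
#!/bin/sh
# Long-hand equivalent of: awk -F, '! mail[$3]++'
# "seen" is a hypothetical array name; the thread uses "mail".
result=$(printf '%s\n' 'a,1,x@mail.com' 'b,2,x@mail.com' 'c,3,y@mail.com' |
awk -F, '
{
    if (seen[$3] == 0) {   # an unset element compares equal to 0: first sighting
        print              # print the whole line
    }
    seen[$3]++             # count this value so later repeats are skipped
}')
printf '%s\n' "$result"
```

The second input line shares its third field with the first, so only the first and third lines are printed.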
    #5  
04-24-2008
in2nix4life
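With the 'uniq' command (-f 1 is the current spelling of the obsolete -1; it skips the first blank-delimited field and compares the rest of each line against the adjacent line only):

uniq -f 1 [inputfile]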

Hope this helps.
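
Since uniq has no option for choosing a delimiter, one alternative that stays close to it is sort with -t, -k and -u. The sketch below assumes that the original line order does not need to be preserved and that it does not matter which of the duplicate lines survives, since POSIX leaves that choice to the implementation. The temporary file and the variable names are illustrative only.

```shell
#!/bin/sh
# Recreate the sample input from post #1 in a temporary file.
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF

# -t,    use a comma as the field delimiter
# -k3,3  restrict the sort key to column 3 only
# -u     keep one line per distinct key
# LC_ALL=C forces plain byte-wise comparison so the result is predictable.
result=$(LC_ALL=C sort -t, -k3,3 -u "$inputfile")
printf '%s\n' "$result"

rm -f "$inputfile"
```

Unlike the awk one-liner, the output comes back ordered by the e-mail column rather than in input order.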
    #6  
04-24-2008
totus
Quote:
Originally Posted by aigles
Code:
awk -F, '! mail[$3]++' inputfile

Jean-Pierre.
Jean-Pierre,

This seemed to work, but I noticed that a few duplicates seem to be left behind. How does the array know what the delimiter is? $3 is the field, but I'm not clear on the delimiter. Would the same work with tabs as the delimiter?

Cheers!
    #7  
04-24-2008
ilan
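Hi Totus,

In aigles' solution the delimiter is the comma, set with -F, (awk splits the line into fields first; the array only ever sees $3).
So if you have tabs/spaces, I think you can use it as
awk -F " " '!mail[$4]++' inputfile

(The logic is that you have to specify the column correctly; I hope you noticed that I am using $4. With -F " " awk splits on any run of spaces or tabs, so a two-word name takes up two fields.)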

-ilan
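
For a file delimited strictly by tabs, a variant with -F'\t' may be closer to what was asked. This is a sketch on made-up, tab-separated input (name, phone, e-mail), not data from the thread. Because only tabs separate fields here, a name containing a space stays in one field and the e-mail is still $3, whereas whitespace splitting would push it to $4.

```shell
#!/bin/sh
# Hypothetical tab-separated input: name, phone, e-mail.
# printf reuses the format string for each group of three arguments.
result=$(printf '%s\t%s\t%s\n' \
    'ralphs office' '555-555-5555' 'ralph@mail.com' \
    'margies office' '555-555-5555' 'ralph@mail.com' \
    'kims office' '555-555-5555' 'kims@mail.com' |
awk -F'\t' '!mail[$3]++')
printf '%s\n' "$result"
```

The second record repeats the e-mail of the first, so only the first and third records are printed.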