
finding duplicates in columns and removing lines

# 1  
04-24-2008

I am trying to figure out how to scan a file like so:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

and end up with this:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

Specifically, I need to look for duplicates in column 3 of a CSV file and, when a duplicate is found, remove any later line that repeats a column 3 value already seen. In the example above, line 2 is removed because its column 3 value, ralph@mail.com, already appears on line 1.

Does anyone know whether the Unix uniq command or Perl can be used for this? uniq doesn't seem to have a delimiter flag; it seems to work only by character count.

Thanks!
Totus

Last edited by totus; 04-24-2008 at 04:31 PM.
# 2  
04-24-2008
Code:
awk -F, '! mail[$3]++' inputfile

Jean-Pierre.
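To check the one-liner against the sample from the first post, here is a minimal sketch that writes the four lines to a file and runs the command. The file name sample.csv is only an assumption for illustration. Because -F, splits on every comma, $3 is "ralph@mail.com" including its surrounding double quotes, which works as long as every line is quoted the same way.

```shell
#!/bin/sh
# 'sample.csv' is a hypothetical file name; any path works.
cat > sample.csv <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF

# Print each line only the first time its third comma-separated field is seen.
awk -F, '! mail[$3]++' sample.csv
```

On this sample it prints lines 1, 3 and 4, which is the desired output from the first post.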
# 3  
04-24-2008
You're kidding me...

How does that work? I'm only vaguely familiar with awk.
# 4  
04-24-2008
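awk has associative arrays; here the key into the mail array is field 3 ($3).
The first time a given $3 value shows up, mail[$3] is unset, which awk treats as zero, so the pattern is true and the line is printed (a pattern with no action prints the line). The post-increment mail[$3]++ then sets that element to 1. The next time the same $3 value appears, mail[$3] is already 1, so the pattern is false and the line is not printed.

In short, !mail[$3]++ is true only when mail[$3] is 0 before the increment; when it is 1, 2, 3, ... it evaluates as false.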
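The same logic can be spelled out step by step. This is a sketch of a long-form equivalent, not part of the original reply; the function name dedup_col3 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: long-form equivalent of  awk -F, '! mail[$3]++'
dedup_col3() {
  awk -F, '
  {
    if (!mail[$3])  # an unset element reads as 0, so this is true on the first sighting of a $3 value
      print         # keep the first occurrence
    mail[$3]++      # count it; later lines with the same $3 now fail the test
  }' "$@"
}

printf '%s\n' 'a,x,1' 'b,y,1' 'c,z,2' | dedup_col3
```

This prints a,x,1 and c,z,2: the second line is dropped because its third field, 1, was already seen. The one-liner packs the test and the increment into a single pattern, which is why it needs no explicit action.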
# 5  
04-24-2008
With the 'uniq' command:

uniq -1 [inputfile]

Hope this helps.
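A caveat on uniq, offered as an illustration rather than as part of the original reply: -1 is the obsolete spelling of -f 1, which skips the first blank-separated field and compares the rest of each line; it cannot select a comma-separated column. uniq also compares only adjacent lines. On the sample data the rest of every line differs, so nothing is removed. The file name sample.csv is an assumption.

```shell
#!/bin/sh
# 'sample.csv' is a hypothetical file name holding the sample from the first post.
cat > sample.csv <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF

# -f 1 (old form: -1) skips the first BLANK-separated field ("1", "2", ...)
# and compares the rest of each line with the line just before it.
# The rest still differs ("ralphs office" vs "margies office"), so all four lines are printed.
uniq -f 1 sample.csv

# uniq only collapses ADJACENT duplicates: the two "x" lines are not
# next to each other, so both survive.
printf '%s\n' x y x | uniq
```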
# 6  
04-24-2008
Quote:
Originally Posted by aigles
Code:
awk -F, '! mail[$3]++' inputfile

Jean-Pierre.
Jean-Pierre,

This seemed to work, but I noticed a few duplicates were left behind. How does the array know what the delimiter is? $3 is the field, but I'm not clear on the delimiter. Would the same work with tabs as the delimiter?

Cheers!
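The thread does not say why some duplicates survived. Two plausible causes, both assumptions: the third field may differ only in case, quoting, or surrounding blanks (e.g. "Bob@Mail.com" vs "bob@mail.com"), or a quoted field may itself contain a comma, which -F, does not understand and which shifts every later column. The sketch below addresses only the first cause by normalizing the key before applying the same !array[key]++ idiom; the function name dedup_col3_normalized is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Assumption: leftover "duplicates" differ only in
# letter case, double quotes, or leading/trailing blanks in column 3.
dedup_col3_normalized() {
  awk -F, '
  {
    key = tolower($3)                    # ignore case
    gsub(/^[ \t"]+|[ \t"]+$/, "", key)   # strip leading/trailing blanks and double quotes
    if (!seen[key]++)                    # same idiom as the one-liner, on the cleaned key
      print
  }' "$@"
}

printf '%s\n' 'a,1,"bob@mail.com"' 'b,2, "Bob@Mail.com" ' 'c,3,"kim@mail.com"' | dedup_col3_normalized
```

This prints the a and c lines; the b line is treated as a duplicate once case, quotes and blanks are ignored. The original line is printed unchanged, since only the key is normalized.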
# 7  
04-24-2008
Hi Totus,

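In aigles' solution the delimiter is a comma; that is what -F, sets.
So if your fields are separated by tabs or spaces, I think you can use it like this:
awk -F " " '!mail[$4]++' inputfile

(-F " " selects awk's default splitting, where any run of spaces and tabs counts as one separator; for strictly tab-separated data use -F '\t' instead. The logic is that you have to specify the column correctly for the delimiter you use; note that I am using $4 here.)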

-ilan
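A small sketch of the difference between the two delimiters, assuming a tab-separated file in which some fields may be empty. With -F " " (awk's default splitting) consecutive tabs count as one separator, so an empty field disappears and later column numbers shift; -F '\t' splits on each tab, so $3 stays the third column.

```shell
#!/bin/sh
# A tab-separated line whose second field is empty: "a<TAB><TAB>b".
line=$(printf 'a\t\tb')

# Default (whitespace) splitting: the two tabs act as one separator,
# so there are only 2 fields and "b" is $2.
printf '%s\n' "$line" | awk -F " " '{ print NF ": " $2 }'

# Splitting on each tab keeps the empty field: 3 fields, "b" is $3.
printf '%s\n' "$line" | awk -F '\t' '{ print NF ": " $3 }'

# Deduplicating a tab-separated file on its third column:
# r2 repeats r1's third field "x", so only r1 and r3 are printed.
printf 'r1\t\tx\nr2\t\tx\nr3\t\ty\n' | awk -F '\t' '!mail[$3]++'
```

The first command prints "2: b" and the second "3: b".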
All times are GMT -4.

Unix & Linux Forums Content Copyright©1993-2018. All Rights Reserved.