The UNIX and Linux Forums  
#1  04-24-2008  totus
finding duplicates in columns and removing lines

I am trying to figure out how to scan a file like so:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

and end up with this:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

Specifically, I need to look for duplicates in column 3 of a CSV file and, when a duplicate is found, remove the lines that repeat a value already seen in column 3. In the example above, line 2 is removed (filtered out).

Does anyone know whether the unix uniq command or perl can be used for this? uniq doesn't seem to have a delimiter flag, only options based on character count.

Thanks!
Totus

Last edited by totus; 04-24-2008 at 04:31 PM..
#2  04-24-2008  aigles (Forum Advisor)
Code:
awk -F, '! mail[$3]++' inputfile
Jean-Pierre.
#3  04-24-2008  totus
You're kidding me...

How does that work? I'm only vaguely familiar with awk.
#4  04-24-2008  jim mcnamara (Forum Staff)
awk has associative arrays; the key for the mail array is field #3 ($3).
The first time a given $3 value shows up, mail[$3] is zero (the element is unset), so !mail[$3] is true and the line is printed. The post-increment mail[$3]++ then sets that array element to one. The next time the same $3 value appears, mail[$3] already has a value of 1, so the line is not printed.

!mail[$3] only evaluates true when mail[$3] == 0, so when it is 1, 2, 3 ... it evaluates as false.
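
To make the explanation above concrete, here is a long-hand sketch of the same one-liner with the implicit steps written out. The wrapper name dedupe_col3 is hypothetical, used here only so the filter can be called on a file or on a pipe; the awk logic is meant to behave the same as '! mail[$3]++'.

```shell
# Long-hand equivalent of: awk -F, '! mail[$3]++' inputfile
# dedupe_col3 is a hypothetical helper name, not something from the thread.
dedupe_col3() {
    awk -F, '
    {
        if (mail[$3] == 0)   # an unset element compares equal to 0: first time this key is seen
            print            # so the whole line is printed
        mail[$3]++           # count the occurrence; later lines with this key are skipped
    }' "$@"
}
```

Usage would be dedupe_col3 inputfile, or some_command | dedupe_col3.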
#5  04-24-2008  in2nix4life
With the 'uniq' command:

uniq -1 [inputfile]

Hope this helps.
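
For comparison, a small check of what uniq does with the sample data from post #1. uniq -1 is the older spelling of uniq -f 1: it skips the first blank-separated field and then compares each line only with the line directly before it, so it has no notion of a comma delimiter. In the sketch below, sample_data is a hypothetical helper that just prints the four sample lines. The first pipeline should count four lines (nothing removed), the second three.

```shell
# sample_data is a hypothetical helper: it prints the four lines from post #1.
sample_data() {
    cat <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF
}

# uniq -f 1 skips the first blank-separated field, then compares the rest of
# each line with the previous line only. The names differ, so no line is dropped.
sample_data | uniq -f 1 | wc -l

# The awk filter keys on the third comma-separated field instead.
sample_data | awk -F, '!mail[$3]++' | wc -l
```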
#6  04-24-2008  totus
Quote (originally posted by aigles):
Code:
awk -F, '! mail[$3]++' inputfile
Jean-Pierre.
Jean-Pierre,

This seemed to work, but I noticed that a few duplicates seem to be left behind. How does the array know what the delimiter is? $3 is the field, but I'm not clear on the delimiter. Would the same work with tabs as the delimiter?

Cheers!
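
One possible cause of leftover duplicates, offered only as a guess since the real file is not shown: values in column 3 that differ in letter case or in stray blanks count as different array keys. A sketch that normalizes the key before the lookup follows; dedupe_col3_loose and the variable names key and seen are hypothetical.

```shell
# dedupe_col3_loose is a hypothetical helper name.
# Assumption: leftover duplicates differ only in case or surrounding blanks.
dedupe_col3_loose() {
    awk -F, '
    {
        key = tolower($3)                 # ignore letter case
        gsub(/^[ \t]+|[ \t]+$/, "", key)  # strip leading and trailing blanks
    }
    !seen[key]++                          # print only the first line for each key
    ' "$@"
}
```

If the duplicates differ in some other way (for example quoting, or a different column count on some lines), this sketch would not catch them.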
#7  04-24-2008  ilan
Hi Totus,

In aigles' solution the delimiter is a comma (set with -F,).
So if you have tabs/spaces, I think you can use it as
awk -F " " '!mail[$4]++' inputfile

(The logic is that you have to specify the column correctly; I hope you noticed that I am using $4.)

-ilan
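
For a file that is strictly tab-delimited, a minimal sketch, assuming the address is still in the third column: the tab is passed to -F as '\t' and the field number stays $3. Note that -F " " is a special case in awk: a single space means the default splitting on runs of blanks (spaces and tabs), so a name such as "ralphs office" would be split into two fields. The wrapper name dedupe_tab_col3 is hypothetical.

```shell
# dedupe_tab_col3 is a hypothetical helper name.
# Assumption: fields are separated by single tabs and the key is column 3.
dedupe_tab_col3() {
    awk -F'\t' '!mail[$3]++' "$@"
}
```

Usage would be dedupe_tab_col3 inputfile.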