Shell Programming and Scripting
#1
I am trying to figure out how to scan a file like this:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

and end up with this:

1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com

Specifically, I need to look for duplicates in column 3 of a CSV file and, when a duplicate is found, remove the lines that repeat it. In the example above, line 2 is removed because its column 3 (ralph@mail.com) duplicates line 1. Does anyone know whether the Unix uniq command or Perl can be used for this? uniq doesn't seem to have a delimiter flag; it only seems to work with character counts.

Thanks!
Totus

Last edited by totus; 04-24-2008 at 04:31 PM.
#2
Code:
awk -F, '! mail[$3]++' inputfile
Jean-Pierre.
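A minimal sketch that recreates the thread's sample data in a temporary file and runs the one-liner on it. The file name contacts.csv is hypothetical. With -F, the third comma-separated field is the quoted e-mail address, so only the first line for each address should survive:

```shell
#!/bin/sh
# Hypothetical temp file holding the sample data from post #1.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
inputfile="$tmpdir/contacts.csv"

cat > "$inputfile" <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF

# -F, splits on commas, so $3 is the quoted e-mail field.
# A line is printed only the first time its $3 value is seen.
result=$(awk -F, '! mail[$3]++' "$inputfile")
printf '%s\n' "$result"
```

The surviving lines keep their original order, since awk decides line by line as it reads and does no sorting.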
#3
You're kidding me...
How does that work? I'm only vaguely familiar with awk.
#4
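awk has associative arrays; the key for the mail array is field 3 ($3).
The first time a given $3 value shows up, mail[$3] is zero (an unset array element), so !mail[$3] is true and the line is printed; mail[$3]++ then increments that element to one. The ++ is a post-increment, so the test sees the value from before the increment. The next time the same $3 value is found, mail[$3] is already 1, so the line does not print. !mail[$3] evaluates as true only when mail[$3] == 0; when it is 1, 2, 3, ... it evaluates as false. Because the pattern has no action, awk's default action (print the line) runs whenever the pattern is true.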
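The same filter can be written out long-hand so each step of that explanation is visible. This is a sketch on the same hypothetical sample file, not a replacement for the one-liner; it should print exactly the same lines:

```shell
#!/bin/sh
# Hypothetical temp file holding the sample data from post #1.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
inputfile="$tmpdir/contacts.csv"

cat > "$inputfile" <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF

result=$(awk -F, '
{
    # An unset element compares equal to 0, so this is true
    # only for the first line carrying this $3 value.
    if (mail[$3] == 0)
        print          # first occurrence: keep the line
    mail[$3]++         # count this occurrence of the key
}' "$inputfile")
printf '%s\n' "$result"
```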
#5
With the 'uniq' command:
uniq -1 [inputfile]
Hope this helps.
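One caveat worth checking: uniq only compares adjacent lines, and its field-skipping option (-f N; -1 is the older spelling of -f 1) counts blank-separated fields, not comma-separated ones. A quick sketch on the sample data, assuming a POSIX uniq, suggests it leaves all four lines in place, because after skipping the leading number the rest of each line (including the name) still differs:

```shell
#!/bin/sh
# Hypothetical temp file holding the sample data from post #1.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
inputfile="$tmpdir/contacts.csv"

cat > "$inputfile" <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims office","555-555-5555","tims@mail.com","www.ralph.com
EOF

# -f 1 skips only the first blank-separated field ("1", "2", ...);
# the remainder of each line is compared whole, so nothing matches.
result=$(uniq -f 1 "$inputfile")
printf '%s\n' "$result"
```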
#6
Jean-Pierre,
This seemed to work, but I noticed that a few duplicates seem to be left behind. How does awk know what the delimiter is? $3 is the field, but I'm not clear on the delimiter. Would the same thing work with tabs as the delimiter? Cheers!
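The thread doesn't say why some duplicates survived. One possible cause (an assumption, not confirmed here) is that the column-3 values are not byte-for-byte identical, for example they differ in letter case or carry stray spaces, so awk treats them as different keys. A sketch that normalizes the key before counting, on hypothetical data where line 2 repeats line 1's address with different case and a leading space:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
inputfile="$tmpdir/contacts.csv"

# Hypothetical data: line 2's e-mail differs from line 1's only in
# case and a leading space.
cat > "$inputfile" <<'EOF'
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555"," Ralph@Mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
EOF

# The plain one-liner keeps all three lines: the raw keys differ.
plain=$(awk -F, '! mail[$3]++' "$inputfile")

result=$(awk -F, '
{
    key = tolower($3)        # ignore letter case
    gsub(/[ \t]/, "", key)   # drop spaces and tabs anywhere in the key
    if (!seen[key]++)        # same test as before, on the cleaned key
        print
}' "$inputfile")
printf '%s\n' "$result"
```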
#7
Hi Totus,
In aigles' solution the delimiter is a comma (that's what -F, sets). If you have tabs/spaces instead, I think you can use it as awk -F " " '!mail[$4]++' inputfile (the logic is that you have to specify the column correctly; I hope you noticed that I'm using $4).
-ilan
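A detail worth knowing here: in awk, -F " " is not a literal single space. A single-space FS selects awk's default splitting on runs of blanks and tabs, so a value such as "ralphs office" becomes two fields and the column numbers shift. For a strictly tab-separated file, -F '\t' splits on each tab instead. A sketch with a hypothetical tab-separated version of the sample (built with printf so the tabs are real), comparing the two:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
inputfile="$tmpdir/contacts.tsv"

# Hypothetical TSV columns: id, name, phone, e-mail.
printf '%s\t%s\t%s\t%s\n' \
    1 "ralphs office" 555-555-5555 ralph@mail.com \
    2 "margies office" 555-555-5555 ralph@mail.com \
    3 "kims office" 555-555-5555 kims@mail.com > "$inputfile"

# -F '\t': every tab separates fields, so the e-mail is $4
# and "ralphs office" stays one field.
result=$(awk -F '\t' '!mail[$4]++' "$inputfile")

# -F " ": blank/tab runs separate fields, so "ralphs office" is
# split in two and $4 is the phone number (same on every line).
loose=$(awk -F " " '!mail[$4]++' "$inputfile")
printf '%s\n' "$result"
```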