#8
Wow. Thanks guys. I tried Perderabo's solution and it worked perfectly.
I wasn't sure code that simple would work, but it does. I'm glad, though I don't yet understand why it works. Out of curiosity I'll test the other solutions as well. Thanks. Gianni
#9
Quote:
I've seen this technique before, but thought I would test it on a data file of 1 million lines. It finished in half the time of the sort -mu command. awk also eliminated duplicates 1 million lines apart, as you would expect from the logic. The sort -mu command assumes that the file is already sorted, so it ignores a duplicate 1 million lines apart.
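
The difference described above can be reproduced on a tiny file. This is only a minimal sketch: the exact awk command from the earlier posts is not shown here, so it assumes the whole-line form of the associative-array idiom used later in this thread. The sample data and the array name `seen` are made up for illustration, and `uniq` stands in for any tool that only compares neighbouring lines.

```shell
# Sample data: "apple" appears twice, but the two copies are not adjacent.
input=$(printf '%s\n' apple banana cherry apple)

# seen[$0]++ evaluates to 0 (false) the first time a line is met and to a
# positive count afterwards; the leading ! flips that, so only the first
# occurrence of each line is printed. No sorting is required.
awk_out=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# uniq compares adjacent lines only, so on unsorted input it misses
# duplicates that are far apart.
uniq_out=$(printf '%s\n' "$input" | uniq)

printf 'awk:\n%s\n\nuniq:\n%s\n' "$awk_out" "$uniq_out"
```

The awk version keeps one entry per distinct line in memory, which is the price paid for not having to sort first.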
#10
I tried the different solutions and the one that comes closest is Perderabo's.
The only time it doesn't work is when there are blanks in the first set of alphanumerics (which I just found out is possible). How would I modify any of the above solutions to look at, say, characters 1 thru 30 of a 100-character record for exact matches, keep the first occurrence, and remove the rest of the duplicates?

Here are some records I found that put me back at square one:

92247140 1203QA RRN ..
92247140 1203QA RRP ...
92247140 1203QB RRP ...

Do I have to do an awk on this one with substrings? I also tested Jim's solution and it was fast. Unfortunately it found a few more dups than I'd hoped because of the way the records come in; otherwise I'd use it.

Thanks, Gianni
#11
Jim's awk solution will work using substring:
Code:
awk '!a[substr($0,1,15)]++' inputfile
My test result:
Code:
92247140 1203QA RRN ..
92247140 1203QB RRP ...
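
For the "characters 1 thru 30" case asked about earlier, the key width can be passed in rather than hard-coded. A rough sketch, assuming the three sample records from the thread with their elided tails left off; `width` is a hypothetical variable name, not something from the original posts.

```shell
# The three sample records from the thread, with the elided tail left off.
records=$(printf '%s\n' \
  '92247140 1203QA RRN' \
  '92247140 1203QA RRP' \
  '92247140 1203QB RRP')

# width is a hypothetical variable name: the number of leading characters
# that form the dedupe key. Set it to 30 to compare characters 1 thru 30.
deduped=$(printf '%s\n' "$records" | awk -v width=15 '!a[substr($0, 1, width)]++')

printf '%s\n' "$deduped"
```

With width=15 the first two records share the key "92247140 1203QA", so only the first of them survives, which matches the test result above.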
#12
It might be better to think of your lines in terms of 'fields', in case your 'fields' come to vary in length.
Right now all your fields are the same length, and 'substr($0,1,15)' seems to be referring to the first two fields. This is what makes your line/record unique. If that's the case:
Code:
awk '!a[$1,$2]++' inputfile
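
To see where the two keys diverge, here is a small comparison. It is a sketch under an assumption: the records are hypothetical, with an extra blank added to the second one so that the column positions shift.

```shell
# Hypothetical records: the second one has an extra space between fields,
# so its first 15 characters differ from the first record's.
records=$(printf '%s\n' \
  '92247140 1203QA RRN' \
  '92247140  1203QA RRP' \
  '92247140 1203QB RRP')

# Key on fields 1 and 2; awk joins them with SUBSEP, and default field
# splitting treats any run of blanks as a single separator.
by_fields=$(printf '%s\n' "$records" | awk '!a[$1,$2]++')

# Key on a fixed character range for comparison.
by_substr=$(printf '%s\n' "$records" | awk '!a[substr($0,1,15)]++')

printf 'by fields:\n%s\n\nby substr:\n%s\n' "$by_fields" "$by_substr"
```

The field key drops the second record while the substring key keeps all three. Which one is right depends on the data: fields tolerate varying widths, but a blank inside what should be a single field changes the field count, and then a fixed character range is the safer key.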
#13
Quote:
Code:
47147140631204DC ADK
47147140631204DC ALK
Quote: