Frans - your bash script processes very fast (significantly faster than the perl scripts), however it is chopping off the first 11 characters of about 50% of the data.
Code:
$ tail November_clean.txt
MS CRISTINA VHERNANDEZ 5555 STONEY CREEK LN AUSTELL GA301685247
SONIA THOMPSON 5555 MOUNT PISGAH DOWNS AUSTELL GA301685256
LATASHA MCWHOOTER 5555 CONCEPT 21 CIR AUSTELL GA301686872
SHAMISHA WALLACE 5555 CALGARY GLN AUSTELL GA301687284
MR WILLIS WADE 5555 HIGHWAY 140 RYDAL GA301711302

---------- Post updated at 05:41 AM ---------- Previous update was at 04:55 AM ----------

Skmdu - Actually, your perl script processed so quickly I can hardly believe it worked. It went through about 250k records in 4 seconds. Maybe I already had the whole list cached in memory from running the other scripts? In any event - thank you. I will review this now. Could I trouble you to add a function to output duplicates to a separate file so I can verify everything?

Daptal - thanks as well. Your script ran quite slowly and then basically froze. Skmdu provided a great example of perl's power (provided it worked, of course). We can both learn from that.
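One quick sanity check for a split like this, sketched under assumptions: every November record should land in exactly one of the two output files, so the line counts should add up. The helper name `check_split` is hypothetical, and the three-line sample files are made up only to exercise it. A mismatch would suggest records were dropped or merged (for example, two November records sharing one key).

```shell
#!/bin/sh
# check_split is a hypothetical helper, not part of any script in this thread.
# Usage: check_split INPUT CLEAN_OUTPUT DUPLICATE_OUTPUT
check_split() {
    # $(( ... )) normalizes wc output, which may carry leading spaces.
    total=$(( $(wc -l < "$1") ))
    clean=$(( $(wc -l < "$2") ))
    dups=$(( $(wc -l < "$3") ))
    if [ "$((clean + dups))" -eq "$total" ]; then
        echo "OK: $clean clean + $dups duplicates = $total"
    else
        echo "MISMATCH: $clean clean + $dups duplicates != $total"
    fi
}

# Made-up sample data, only to demonstrate the check.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/november.txt"
printf 'a\nc\n'    > "$tmp/november_clean.txt"
printf 'b\n'       > "$tmp/duplicate.txt"

result=$(check_split "$tmp/november.txt" "$tmp/november_clean.txt" "$tmp/duplicate.txt")
echo "$result"
rm -rf "$tmp"
```

On the real data the call would be `check_split november.txt november_clean.txt duplicate.txt`.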
With pleasure sitney.....
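Slightly modified code.
Code:
#!/usr/bin/perl
use strict;
use warnings;

my (%octhash, %novhash);

## Open october.txt and store each record in a hash keyed on first name, last name and code
open my $oct, '<', "october.txt" or die "october.txt: $!";
while ( <$oct> ) {
    my $name = substr $_, 11, 32;
    ## NOTE: substr's third argument is a length, not an end offset,
    ## so this takes up to 117 characters starting at offset 107
    $name .= substr $_, 107, 117;
    $name =~ s/\s+//g;
    $octhash{$name} = $_;
}
close $oct;

## Open november.txt and store each record in a hash keyed on first name, last name and code
open my $nov, '<', "november.txt" or die "november.txt: $!";
while ( <$nov> ) {
    my $name = substr $_, 11, 32;
    ## Same caveat as above: 157 is a length here, not an end position
    $name .= substr $_, 148, 157;
    $name =~ s/\s+//g;
    $novhash{$name} = $_;
}
close $nov;

## Compare the keys: write November records that also appear in October
## to duplicate.txt and remove them from the november hash
open my $dfh, '>', "duplicate.txt" or die "duplicate.txt: $!";
foreach ( keys %octhash ) {
    if ( exists $novhash{$_} ) {
        print $dfh $novhash{$_};
        delete $novhash{$_};
    }
}
close $dfh;

## Whatever is left in the november hash is unique to November (hash order, not file order)
open my $ofh, '>', "november_clean.txt" or die "november_clean.txt: $!";
print $ofh values(%novhash);
close $ofh;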
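A note on the `substr` calls in the script above, as a small sketch: Perl's `substr` takes a 0-based offset and a length, while awk's `substr` takes a 1-based start and a length. To cut a field that runs from 0-based offset `s` up to (but not including) offset `e`, the length is `e - s` in both. The 20-character sample line below is made up, and the offsets are illustrative, not the real record layout.

```shell
#!/bin/sh
# Made-up sample line; characters are at 0-based offsets 0..19.
line='0123456789ABCDEFGHIJ'

# Perl equivalent would be: substr($line, 11, 5)   (0-based offset 11, length 5)
# awk counts from 1, so the same field starts at position 12.
field=$(printf '%s\n' "$line" | awk '{ print substr($0, 12, 5) }')
echo "$field"
```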
Quote:
Code:
awk 'NR==FNR{a[$2$3$NF]++;next}{print > ("November_"((a[$2$3$NF])?"dup":"clean")".txt")}' October.txt November.txt
Last edited by danmero; 4 Days Ago at 06:24 PM.
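The one-liner above may be easier to follow spread over several lines. The following is a sketch of the same idea, not a drop-in replacement: the sample records and the `prefix` variable are made up for illustration. Note that it splits on whitespace, so a record without a title (such as "SONIA THOMPSON ...") has different values in `$2` and `$3` than a record with one; that is only safe if the same person is formatted the same way in both files.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Made-up records: title, first name, last name, address, state+zip code.
printf '%s\n' 'MR JOHN DOE 1 MAIN ST GA300011111' \
              'MS JANE ROE 2 OAK AVE GA300022222' > "$tmp/October.txt"
printf '%s\n' 'MR JOHN DOE 1 MAIN ST GA300011111' \
              'MS ANN LEE 3 ELM RD GA300033333'  > "$tmp/November.txt"

awk -v prefix="$tmp/November_" '
    # First file (October): remember every key (field 2, field 3, last field).
    NR == FNR { seen[$2 $3 $NF]++; next }

    # Second file (November): route each record by whether its key was seen.
    {
        key = $2 $3 $NF
        out = prefix ((key in seen) ? "dup" : "clean") ".txt"
        print > out
    }
' "$tmp/October.txt" "$tmp/November.txt"

dup=$(cat "$tmp/November_dup.txt")
clean=$(cat "$tmp/November_clean.txt")
echo "dup:   $dup"
echo "clean: $clean"
rm -rf "$tmp"
```

Using `key in seen` instead of `a[key]` in the second pass avoids creating empty array entries for keys that only exist in November.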
Code:
perl -ane '$k = join " ", @F[1,2,$#F]; print if !$seen{$k}++ && $ARGV eq "nov"' oct nov > nov.clean.txt