The UNIX and Linux Forums  
#8, 4 Days Ago
frans, Registered User
Join Date: Oct 2009
Location: Drôme, France
Posts: 85
bash

Code:
#!/bin/bash
while read LINE2
do
    LINE1=$(grep "${LINE2:12:22}" october.txt) && [ "${LINE2:148:9}" = "${LINE1:107:9}" ] && continue
    echo "$LINE2"
done < november.txt > november_clean.txt
#9, 4 Days Ago
sitney, Registered User
Join Date: Feb 2008
Posts: 19
Frans - your bash script processes very fast (significantly faster than the perl scripts), however it is chopping off the first 11 characters of about 50% of the data.

Code:
$ tail November_clean.txt
MS         CRISTINA      VHERNANDEZ         5555 STONEY CREEK LN                                                      AUSTELL                     GA301685247
SONIA          THOMPSON          5555 MOUNT PISGAH DOWNS                                                   AUSTELL                     GA301685256
LATASHA        MCWHOOTER         5555 CONCEPT 21 CIR                                                       AUSTELL                     GA301686872
SHAMISHA       WALLACE           5555 CALGARY GLN                                                           AUSTELL                     GA301687284
MR         WILLIS         WADE              5555 HIGHWAY 140                                                          RYDAL                       GA301711302


---------- Post updated at 05:41 AM ---------- Previous update was at 04:55 AM ----------

Skmdu - Actually, your perl script processed so quickly I can hardly believe it worked. It went through about 250k records in 4 seconds. Maybe I already had the whole list cached in memory from running the other scripts? In any event - thank you. I will review this now.
Could I trouble you to add a function to output duplicates to a separate file so I can verify everything?

Daptal - thanks as well. Your script ran quite slowly and then basically froze. Skmdu provided a great example of perl's power (provided it worked, of course). We can both learn from that.
#10, 4 Days Ago
skmdu, Registered User
Join Date: Jul 2009
Posts: 102
With pleasure, sitney.

Slightly modified code:
Code:
#!/usr/bin/perl

## Open october.txt and store each record in a hash keyed on first name, last name and code
open my $oct, '<', "october.txt" or die $!;
while ( <$oct> ) {
        my $name = substr $_,11,32 ;
        ## The third argument of substr is a length, not an end offset; 9 is the code width used in the bash version
        $name .= substr $_,107,9 ;
        $name =~ s/\s+//g;
        $octhash{$name}=$_;
}
## Open november.txt and store each record in a hash keyed the same way
open my $nov, '<', "november.txt" or die $!;

while  ( <$nov> ) {
        my $name = substr $_,11,32 ;
        $name .= substr $_,148,9 ;
        $name =~ s/\s+//g;
        $novhash{$name}=$_;
}

## Duplicates are written here so they can be checked by hand
open my $dfh, '>', "duplicate.txt" or die $!;

## Compare the keys: move every November record that also appears in October to duplicate.txt
foreach ( keys %octhash ) {
 if ( exists ( $novhash{$_} ))
 {
        print $dfh $novhash{$_};
        delete $novhash{$_};
 }
}

## Whatever is left in the November hash is the clean list (in hash order, not file order)
open my $ofh, '>', "november_clean.txt" or die $!;
print  $ofh values(%novhash) ;
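
For cross-checking the clean and duplicate files, the same split can be sketched as a single awk pass that keeps the November records in their original order. This is only a sketch under assumptions, not skmdu's script: `split_november` is a made-up helper name, the field positions are the ones from the bash version in this thread (name at offset 12 for 22 characters, code at offset 107 in October and 148 in November, 9 wide), and the toy records are invented to match that layout.

```shell
#!/bin/bash
# Sketch only: split_november is a hypothetical helper. The offsets come from
# the bash version in this thread (0-based 12/107/148) and are written 1-based
# here because awk's substr() counts from 1.
split_november() {
    # $1 = October file, $2 = November file, $3 = clean output, $4 = duplicates output
    awk -v clean="$3" -v dup="$4" '
        # Key = 22-character name field plus 9-character code field.
        function key(codepos) { return substr($0, 13, 22) substr($0, codepos, 9) }
        NR == FNR { seen[key(108)] = 1; next }            # first file: remember October keys
        { print > ((key(149) in seen) ? dup : clean) }    # second file: route each record
    ' "$1" "$2"
}

# Toy data (invented): 12 filler columns, a 22-column name, padding, then a 9-digit code.
tmp=$(mktemp -d)
printf '%-12s%-22s%-73s%s\n' 'MS' 'ANA       LOPEZ' '1 MAIN ST' '301685247' > "$tmp/oct.txt"
{
    printf '%-12s%-22s%-114s%s\n' 'MS' 'ANA       LOPEZ'    '1 MAIN ST' '301685247'
    printf '%-12s%-22s%-114s%s\n' ''   'SONIA     THOMPSON' '2 OAK AVE' '301685256'
} > "$tmp/nov.txt"

split_november "$tmp/oct.txt" "$tmp/nov.txt" "$tmp/nov_clean.txt" "$tmp/nov_dup.txt"
cat "$tmp/nov_clean.txt"
```

Unlike the hash-of-records approach, this keeps every November line (including repeats within November) and never reorders them. An output file is only created if at least one record is routed to it.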
#11, 4 Days Ago
frans, Registered User
Join Date: Oct 2009
Location: Drôme, France
Posts: 85
Quote:
Originally Posted by sitney
Frans - your bash script processes very fast (significantly faster than the perl scripts), however it is chopping off the first 11 characters of about 50% of the data.
I believe that's coming from lines where the beginning is blank. Try this little modification:
Code:
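#!/bin/bash
# IFS= and -r make read keep leading blanks and backslashes, so the offsets below stay valid
while IFS= read -r LINE2
do
    # -F matches the name as a fixed string; -- guards against a pattern starting with a dash
    LINE1="$(grep -F -- "${LINE2:12:22}" october.txt)" && [ "${LINE2:148:9}" = "${LINE1:107:9}" ] && continue
    printf '%s\n' "$LINE2"
done < november.txt > november_clean.txt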
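
The missing 11 characters reported above are consistent with how a plain `read` treats a line: it splits on IFS and drops leading whitespace, so a record whose title field is blank loses that field. A minimal demonstration, using an invented record with an 11-character blank title:

```shell
#!/bin/bash
# Invented fixed-width record: an 11-character blank title field, then the name.
record='           SONIA          THOMPSON'

# Plain `read` splits on IFS and drops the leading whitespace.
plain=$(printf '%s\n' "$record" | { read LINE; printf '%s' "$LINE"; })

# `IFS= read -r` keeps the line as it is (and leaves backslashes alone).
exact=$(printf '%s\n' "$record" | { IFS= read -r LINE; printf '%s' "$LINE"; })

echo "plain read : ${#plain} characters"
echo "IFS= read  : ${#exact} characters"
```

Once the leading blanks are gone, every `${LINE2:offset:length}` expansion also reads from the wrong columns, so the affected records can fail to match as well.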
#12, 4 Days Ago
deindorfer, Registered User
Join Date: Aug 2009
Posts: 7
Code:

perl -ne '/.+?\s+(\w+)\s+(\w+).+PR(\d{9})/; $k = $1.$2.$3; $seen{$k}++; print if $seen{$k} == 1 && $ARGV eq "nov"' oct nov > nov.clean.txt
No Fixed Length Fields. Command-line only
#13, 4 Days Ago
danmero, Forum Advisor
Join Date: Nov 2007
Location: 45.48-73.63
Posts: 1,419
Quote:
Originally Posted by sitney
I want to make sure that there are no records from October.txt in November.txt.
Try AWK
Code:
awk 'NR==FNR{a[$2$3$NF]++;next}{print > ("November_"((a[$2$3$NF])?"dup":"clean")".txt")}' October.txt November.txt
Use gawk, nawk or /usr/xpg4/bin/awk on Solaris.
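
A toy run may help to see what the one-liner does. The records below are made up and whitespace-separated; the file names are the ones the command expects, created in a scratch directory:

```shell
#!/bin/bash
# Toy run of the awk one-liner on invented records, in a scratch directory.
work=$(mktemp -d) && cd "$work" || exit 1

printf '%s\n' 'MS CRISTINA HERNANDEZ 301685247' > October.txt
printf '%s\n' 'MS CRISTINA HERNANDEZ 301685247' \
              'MR WILLIS WADE 301711302'        > November.txt

# NR==FNR is true only while the first file is read: its keys are counted in a[].
# Each record of the second file then goes to November_dup.txt or November_clean.txt.
awk 'NR==FNR{a[$2$3$NF]++;next}{print > ("November_"((a[$2$3$NF])?"dup":"clean")".txt")}' October.txt November.txt

cat November_clean.txt
```

Because awk splits on whitespace here, a record with a blank title field has its names in $1 and $2 rather than $2 and $3. The key is still built the same way for both files, but it is worth checking the result against the real data.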

Last edited by danmero; 4 Days Ago at 06:24 PM..
#14, 4 Days Ago
deindorfer, Registered User
Join Date: Aug 2009
Posts: 7
Code:

perl -ane '$k = "@F[1,2,$#F]"; $seen{$k}++; print if $seen{$k} == 1 && $ARGV eq "nov"' oct nov > nov.clean.txt
Beginning to see the point about awk...