The UNIX and Linux Forums  

#1  01-12-2007
run_eim (Registered User)
Join Date: Jan 2007
Posts: 5
How to extract duplicate records with associated header record

All,

I have a task to search through several hundred files and extract duplicate detail records and keep them grouped with their header record. If no duplicate detail record exists, don't pull the header. For example, an input file could look like this:

input.txt
HA
D1
D2
D2
D3
D4
D4
HB
D1
D2
HC
D1
D1
D2
D3
D3

The output would be:

output.txt
HA
D2
D4
HC
D1
D3

Would it be possible to do this with AWK? I do not know python.

Thank you for your time.
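
One possible way to do this with awk, offered as a minimal sketch rather than a tested solution for the real data. It assumes every header line starts with "H" (as in the sample above), that each file begins with a header, and that a detail record counts as a duplicate when the identical line appears more than once under the same header. The function name `dup_details` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print each header once, followed by the detail records that are
# duplicated under it. Assumption: header lines start with "H".
dup_details() {
  awk '
    # New header: remember it and forget the details of the previous group.
    /^H/ { header = $0; printed = 0; split("", seen); next }

    # Second sighting of a detail line within the current group.
    ++seen[$0] == 2 {
      if (!printed) { print header; printed = 1 }
      print
    }
  ' "$@"
}

# Small demonstration on inline data: prints HA, then D2.
printf 'HA\nD1\nD2\nD2\nHB\nD1\n' | dup_details
```

To run it on a file, something like `dup_details input.txt > output.txt` should work. Duplicates come out in the order their second occurrence is read, and a record that appears three or more times is still listed only once.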

#2  01-15-2007
c2b2 (Registered User)
Join Date: Dec 2006
Posts: 29
What distinguishes a header record from a detail record? Is there a fixed list of headers, or was the input file generated by concatenating your several hundred files? Can you explain the exact requirements?

#3  01-15-2007
zazzybob (Forum Advisor, Registered Geek)
Join Date: Dec 2003
Location: Melbourne, Australia
Posts: 2,100
I'd use perl for this...
Code:
$ ./input.pl 
HA
D2
D4
HC
D1
D3
$ cat ./input.pl 
#!/usr/bin/perl
# Script to print headers and duplicate items from input.txt

use warnings;
use strict;
my @records;
undef $/;

open ( INPUT, "< input.txt" ) || die "Couldn't open input file: $!\n";
# use a look-ahead assertion here
@records = split( /^(?=(?:H))/m, <INPUT> );
foreach my $record ( @records ) {
   my @lines = split( /\n/, $record );
   my $header = $lines[0];
   my %linehash;
   my $headerdone = 0;
   foreach my $line ( @lines ) {
      $linehash{$line}++;  
   }
   foreach my $key ( sort ( keys ( %linehash ) ) ) {
      my $value = $linehash{$key};
      if ( $value > 1 ) { 
         if ( $headerdone == 0 ) {
            printf( "%s\n", $header );
            $headerdone++;
         }
         printf( "%s\n", $key );
      }
   }
}
close ( INPUT );

exit ( 0 );
Cheers
ZB

#4  01-15-2007
run_eim (Registered User)
Join Date: Jan 2007
Posts: 5
Quote:
Originally Posted by zazzybob
I'd use perl for this...
Thanks for the replies.
These are actually multiple files of daily extracts of expense report data from a transactional system. Each file is made up of individual expense reports (header records) and the expense line items for each report (detail records). We had a situation where some detail records, but not all, were duplicated. This occurred in some output files, but not all. My requirement is to identify, by export file, the duplicate records together with their respective header records. We need this information to send to the system of record to correct these errors. It (hopefully) will be a one-time fix. Also, I do not know Perl, but am willing to learn enough to use it as a solution.

Thanks again for posting a reply.
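
For the "by export file" part of the requirement, here is a hedged sketch of how several extract files might be checked in one pass. It assumes headers start with "H" and that duplicates are identical lines under the same header. The function name `report_dups` and the tab-separated layout (file, header, duplicated detail) are illustrative choices, not something specified in this thread.

```shell
#!/bin/sh
# Sketch: list duplicated detail records for every file given as a parameter,
# one line per duplicate: file name, header record, detail record.
report_dups() {
  awk '
    # First line of each file: drop any state left over from the previous file.
    FNR == 1 { header = ""; split("", seen) }

    # Header record: start a new group.
    /^H/ { header = $0; split("", seen); next }

    # Second sighting of a detail line within the current group.
    ++seen[$0] == 2 { printf "%s\t%s\t%s\n", FILENAME, header, $0 }
  ' "$@"
}

# Hypothetical usage (file names are placeholders):
#   report_dups extract_*.txt > duplicates.tsv
```

Because awk sets `FILENAME` itself, the file names are passed as ordinary parameters and files without duplicates simply produce no lines.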

#5  01-16-2007
matrixmadhan (Forum Advisor, Technorati Master)
Join Date: Mar 2005
Location: leaf node in B+ tree
Posts: 2,952
Code:
#! /opt/third-party/bin/perl
use strict;
use warnings;

my ($header, $headerprint, %seen) = ('', 0);
open(FILE, "< a") || die "Unable to open file : $!\n";

while( defined(my $content = <FILE>) ) {
  chomp $content;
  if( $content =~ m/^H/ ) {
    # New header: remember it and reset the per-group state
    $headerprint = 0;
    $header = $content;
    %seen = ();
  }
  elsif( ++$seen{$content} == 2 ) {
    # Second occurrence of a detail record under the current header:
    # print the header once, then the duplicated record
    if( $headerprint == 0 ) {
      print "$header\n"; $headerprint = 1;
    }
    print "$content\n";
  }
}
close(FILE);

exit 0;

#6  01-16-2007
Krrishv (Registered User)
Join Date: Dec 2006
Location: CA, United States
Posts: 186
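This shell script should do it for you.

#! /usr/bin/ksh
# Usage: pass the input file as the first argument.
# Note: lines are compared across the whole file, not per header group.
r=$(sort "$1" | uniq -d)        # lines that occur more than once
if [ -z "$r" ]; then
    echo "No duplicate record found"
else
    k=$(sort -u "$1")           # every distinct line, sorted
    echo "output.txt:" >> outputfile
    echo "$k" >> outputfile
    exit 0
fi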

Last edited by Krrishv; 01-16-2007 at 05:31 AM..

#7  01-16-2007
run_eim (Registered User)
Join Date: Jan 2007
Posts: 5
matrixmadhan,

Thanks for the script. I REALLY appreciate the help!!

I do not know Perl, although I am looking into it thanks to the resources on this site. How would I change the code so that I could run it with a file name as a parameter, such as:

>perlcode.pl filename.txt