UNIX for Dummies Questions & Answers
How to extract duplicate records with associated header record
All,
I have a task to search through several hundred files and extract duplicate detail records and keep them grouped with their header record. If no duplicate detail record exists, don't pull the header. For example, an input file could look like this:

input.txt
HA
D1
D2
D2
D3
D4
D4
HB
D1
D2
HC
D1
D1
D2
D3
D3

The output would be:

output.txt
HA
D2
D4
HC
D1
D3

Would it be possible to do this with AWK? I do not know python. Thank you for your time.
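Since the question asks about AWK, here is a minimal sketch of a one-pass awk filter wrapped in a POSIX shell function. It assumes that header records begin with `H` and that every other line is a detail record belonging to the most recent header; the function name `dup_details` is hypothetical. The per-header state is reset at each header, a detail record is printed on its second occurrence only, and the header is printed just before its first duplicate, so headers without duplicates are skipped.

```shell
#!/bin/sh
# Sketch: one-pass awk filter for the header/detail layout in the question.
# Assumptions: header records start with "H"; every other line is a detail
# record that belongs to the most recent header. dup_details is a
# hypothetical name.
dup_details() {
    awk '
        /^H/ {                      # new header record
            header = $0             # remember it
            printed = 0             # not printed yet
            split("", seen)         # portable way to empty the array
            next
        }
        ++seen[$0] == 2 {           # second occurrence: a duplicate
            if (!printed) {         # print the header before its first duplicate
                print header
                printed = 1
            }
            print                   # print the duplicate detail record once
        }
    ' "$@"
}
```

Usage: `dup_details input.txt > output.txt`. Only the current header's detail records are kept in memory, and the output stays in the original input order.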
Quote:
These are actually multiple files of daily extracts of expense report data from a transactional system. Each file is made up of individual expense reports (header records) and the expense line items for each report (detail records). We had a situation where some detail records, but not all, were duplicated. This occurred in some output files, but not all. My requirement is to identify, by export file, the duplicate records attached to their respective header records. We need this information to send to the system of record to correct these errors. It will (hopefully) be a one-time fix. Also, I do not know Perl, but am willing to learn enough to use it as a solution. Thanks again for posting a reply.
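Because the duplicates have to be identified per export file, a single awk run over all the files can reset its state both at each header and at the start of each file: `FNR == 1` is true on the first line of every input file, and `FILENAME` holds that file's name. This is a minimal sketch, assuming each file starts with a header record whose line begins with `H`; the function name `report_dups_by_file` and the `File:` label line are hypothetical choices, not an established report format.

```shell
#!/bin/sh
# Sketch: report duplicate detail records, grouped by export file and header.
# Assumptions: header records start with "H" and each file begins with a
# header. report_dups_by_file and the "File:" label are hypothetical.
report_dups_by_file() {
    awk '
        FNR == 1 {                  # first line of a new input file
            fileprinted = 0
            header = ""
            printed = 0
            split("", seen)
        }
        /^H/ {                      # new header record within the file
            header = $0
            printed = 0
            split("", seen)
            next
        }
        ++seen[$0] == 2 {           # second occurrence: a duplicate
            if (!fileprinted) { print "File: " FILENAME; fileprinted = 1 }
            if (!printed)     { print header; printed = 1 }
            print
        }
    ' "$@"
}
```

Usage, with a hypothetical directory: `report_dups_by_file /path/to/extracts/*.txt > duplicates.txt`. Files without duplicates produce no output at all.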
Code:
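#! /opt/third-party/bin/perl
use strict;
use warnings;

my $header      = '';   # most recent header record
my $headerprint = 0;    # has $header been printed yet?
my %seen;               # detail record => count under the current header

# Input file name is hardcoded as "a"
open(my $fh, '<', 'a') || die "Unable to open file : $!\n";
while( my $content = <$fh> ) {
    chomp $content;
    if( $content =~ m/^H/ ) {
        # New header record: remember it and reset the per-header state
        $header      = $content;
        $headerprint = 0;
        %seen        = ();
    }
    elsif( ++$seen{$content} == 2 ) {
        # Second occurrence of a detail record: it is a duplicate.
        # Print the header only before its first duplicate.
        if( !$headerprint ) {
            print "$header\n"; $headerprint = 1;
        }
        print "$content\n";
    }
}
close($fh);
exit 0;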
This shell script should do it for you.

#! /usr/bin/ksh
# Lines that occur more than once anywhere in the file
r=$(sort "$1" | uniq -d)
if [ -z "$r" ]; then
    echo " No duplicate record found"
else
    # Append every distinct record of the file to "outputfile"
    k=$(sort -u "$1")
    echo "output.txt:" >> outputfile
    echo "$k" >> outputfile
    exit 0
fi

Last edited by Krrishv; 01-16-2007 at 05:31 AM.
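The ksh reply runs `sort | uniq -d` over the whole file, so a detail record such as D1 that appears once under each of several headers is counted as a duplicate, and the header grouping is lost. One way to keep the `sort | uniq -d` idea while respecting headers is to prefix each detail record with its header before sorting, then split the pairs again. This is a minimal sketch, assuming headers start with `H` and records contain no tab characters; the function name `dup_details_sorted` is hypothetical.

```shell
#!/bin/sh
# Sketch: header-aware variant of the sort | uniq -d approach.
# Assumptions: header records start with "H" and no record contains a tab.
# dup_details_sorted is a hypothetical name.
#   1. awk prefixes every detail record with its header and a tab.
#   2. LC_ALL=C sort groups identical header+detail pairs in byte order;
#      uniq -d keeps one copy of each pair that occurs more than once.
#   3. awk splits the pairs, printing each header once per group.
dup_details_sorted() {
    awk '/^H/ { header = $0; next } { print header "\t" $0 }' "$1" |
        LC_ALL=C sort | uniq -d |
        awk 'BEGIN { FS = "\t" }
             $1 != last { print $1; last = $1 }
             { print $2 }'
}
```

Unlike a single awk pass, this prints headers and details in sorted order, which matches the input order only when the input is already sorted. `LC_ALL=C` is used so that locale collation rules cannot interleave lines from different headers.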
Quote:
Thanks for the script. I REALLY appreciate the help!! I do not know Perl, although I am looking into it thanks to the resources on this site. How would I change the code so I could run it taking a file name as a parameter, such as:

>perlcode.pl filename.txt
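In the Perl script, the usual change is to open `$ARGV[0]` (the first command-line argument) instead of the hardcoded file name `a`. For a shell solution the equivalent is `"$1"`. Below is a minimal sketch of a shell entry point that checks it received exactly one readable file before running the awk filter; the function name `dupdetails_cli` and the script name in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: command-line entry point that takes the input file as a parameter.
# dupdetails_cli is a hypothetical name; headers are assumed to start with "H".
dupdetails_cli() {
    if [ "$#" -ne 1 ]; then
        echo "usage: dupdetails filename.txt" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "dupdetails: cannot read $1" >&2
        return 1
    fi
    # Reset at each header; print the header before its first duplicate,
    # and print each duplicated detail record once.
    awk '
        /^H/ { h = $0; p = 0; split("", seen); next }
        ++seen[$0] == 2 {
            if (!p) { print h; p = 1 }
            print
        }
    ' "$1"
}
```

To use it as a script, save it as, say, `dupdetails.sh`, add `dupdetails_cli "$@"` as the last line, and run `sh dupdetails.sh filename.txt > output.txt`.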