How to extract duplicate records with associated header record Post: 302102935


UNIX for Dummies Questions & Answers: post 302102935 by run_eim, Monday 15 January 2007, 08:14:08 AM



Quote:

Originally Posted by zazzybob

I'd use perl for this...

Code:

$ ./input.pl 
HA
D2
D4
HC
D1
D3
$ cat ./input.pl 
#!/usr/bin/perl
# Script to print headers and duplicate items from input.txt
# Assumes every header line starts with "H" and no detail line does.

use warnings;
use strict;

open( my $input, '<', 'input.txt' ) or die "Couldn't open input file: $!\n";
my $contents = do { local $/; <$input> } // '';   # slurp the whole file
close( $input );

# Split in front of every line that starts with "H" (zero-width look-ahead),
# so each element is one header followed by its detail lines.
my @records = split( /^(?=H)/m, $contents );

foreach my $record ( @records ) {
   my @lines  = split( /\n/, $record );
   my $header = shift @lines;          # first line of each record is the header
   my %linehash;
   $linehash{$_}++ for @lines;         # count every detail line

   # Detail lines seen more than once, in sorted order.
   my @dupes = grep { $linehash{$_} > 1 } sort keys %linehash;
   next unless @dupes;

   # Print the header once, then its duplicated details.
   print "$header\n";
   print "$_\n" for @dupes;
}

exit( 0 );

Cheers
ZB

Thanks for the replies.
These are actually multiple files of daily extracts of expense report data from a transactional system. Each file is made up of individual expense reports (header records) and the expense line items for each report (detail records). We had a situation where some, but not all, detail records were duplicated, and this occurred in some output files but not all. My requirement is to identify, for each export file, the duplicate records together with their respective header records. We need this information to send to the system of record so these errors can be corrected. It will (hopefully) be a one-time fix. Also, I do not know Perl, but I am willing to learn enough to use it as a solution.

Thanks again for posting a reply.

run_eim
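
Because the extracts are separate files and the duplicates have to be reported per export file, the quoted approach can be generalized to take the file names on the command line and prefix each output line with its file name. The following is a minimal Perl sketch, not zazzybob's script: it assumes, as in the quoted example, that header lines start with "H" and detail lines do not, that each file begins with a header, and that lines end in a plain "\n". The names find_duplicates and main are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: report duplicated detail lines, grouped under their header,
# for every extract file named on the command line.
# Assumption: header lines start with "H"; every other line is a detail line.
use strict;
use warnings;

# Return a list of [ $header, @duplicate_details ] array refs for one file's text.
sub find_duplicates {
    my ($text) = @_;
    my @report;
    for my $record ( split /^(?=H)/m, $text ) {
        my ( $header, @details ) = split /\n/, $record;
        my ( %count, @dupes );
        for my $line (@details) {
            # Keep first-seen order; record a line the second time it appears.
            push @dupes, $line if ++$count{$line} == 2;
        }
        push @report, [ $header, @dupes ] if @dupes;
    }
    return @report;
}

sub main {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Couldn't open $file: $!\n";
        my $text = do { local $/; <$fh> } // '';   # slurp the whole file
        close $fh;
        for my $entry ( find_duplicates($text) ) {
            my ( $header, @dupes ) = @$entry;
            print "$file: $header\n";
            print "$file:   $_\n" for @dupes;
        }
    }
}

main() unless caller;
```

Run it as, for example, `perl dupes.pl extract_0101.txt extract_0102.txt` (the script and file names are hypothetical). Unlike the quoted script, it lists duplicates in the order they first repeat rather than sorting them, which makes the report easier to match against the original extract.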

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Count No of Records in File without counting Header and Trailer Records

I have a flat file and need to count the number of records in the file, excluding the header and the trailer record. I would appreciate any and all assistance. Thanks, Hadi Lalani

2. UNIX for Dummies Questions & Answers

change order of fields in header record

Hello, after 9 months of archiving 1000 files, I now need to change the order of fields in the header record. Some are very large, space-padded files. HEADERCAS05212008D0210DOMEST01(spacepadded to record length 210) must now be 05212008HEADERCASD0210DOMEST01(spacepadded to record length 210) ...

3. Shell Programming and Scripting

Insertion of Header record

A header record is to be inserted at the beginning of a flat file without using an extra or new file. It should be inserted into the same file. Advance thanks for all help...

4. Shell Programming and Scripting

Specific Header after every 30 records

Hi All, I have got a requirement. I have a source file, EMPFULL.txt and I need to split the data for every 30 records and place a Typical Header as below with system and page number too. 2012.01.03 Employee Dept Report 1...

5. Shell Programming and Scripting

Deleting duplicate records from file 1 if records from file 2 match

I have 2 files: "File 1" is delimited by ";" and "File 2" is delimited by "|". File 1 below (3 records shown): Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles...

6. Shell Programming and Scripting

Approach on Header record

All, I currently have a requirement to fetch a Date value from a table. And then insert a Header record into a file along with that date value. ex: echo "HDR"" "`date +%Y%j` `date +%Y%m%d` In the above example I used julian date and standard date using Current Date. But the requirement...

7. Shell Programming and Scripting

Copy header values into records

I'm using a shell script to manipulate a data file. I have a large file with two sets of data samples (tracking memory consumption) taken over a long period of time, so I have many samples. The problem is that all the data is in the same file so that each sample contains two sets of data....

8. Shell Programming and Scripting

Extract timestamp from first record in xml file and it checks if not it will replace first record

I have test.xml <emp><id>101</id><name>AAA</name><date>06/06/14 1811</date></emp> <Join><id>101</id><city>london</city><date>06/06/14 2011</date></join> <Join><id>101</id><city>new york</city><date>06/06/14 1811</date></join> <Join><id>101</id><city>sydney</city><date>06/06/14...

9. UNIX for Beginners Questions & Answers

Help in printing records where there is a 'header' in the first record ???

Hi, I have a backup report that unfortunately has some kind of hanging indent thing where the first line contains one column more than the others I managed to get the output that I wanted using awk, but just wanting to know if there is short way of doing it using the same awk Below is what...

10. Shell Programming and Scripting

CSV File:Filter duplicate records from column1 & another column having unique record

Hi Experts, I have a CSV file with 30 to 40 columns; I am pasting just 2 columns to describe the problem. Need to print error if below combination is not present in file check for column-1 (DocumentNumber) and filter columns where value in DocumentNumber field is same. For all such rows, the field...

LEARN ABOUT V7

dump

DUMP(5) 							File Formats Manual							   DUMP(5)

NAME

       dump, ddate - incremental dump format

SYNOPSIS

       #include <sys/types.h>
       #include <sys/ino.h>
       #include <dumprestor.h>

DESCRIPTION

       Tapes used by dump and restor(1) contain:

	      a header record
	      two groups of bit map records
	      a group of records describing directories
	      a group of records describing files

       The format of the header record and of the first record of each description is given in the include file <dumprestor.h>.

       NTREC is the number of 512-byte records in a physical tape block. MLEN is the number of bits in a bit map word. MSIZ is the number of bit map words.
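
As an illustration of how MLEN and MSIZ fit together, the hypothetical Perl helpers below test whether an inode is marked in one of the dump bit maps (the TS_BITS or TS_CLRI maps described below) and compute how many inodes a map can cover. This is a sketch under stated assumptions: inodes are numbered from 1, inode i occupies bit (i-1) mod MLEN of word (i-1)/MLEN, and the map has already been read into a list of integers. MLEN and MSIZ are passed in as parameters because their actual values live in <dumprestor.h>.

```perl
use strict;
use warnings;

# Hypothetical helper: is inode $ino marked in a dump bit map?
# $map is an array ref of map words, each holding $mlen bits.
# Assumption: inode i lives in bit (i-1) % MLEN of word int((i-1) / MLEN).
sub inode_in_map {
    my ( $map, $mlen, $ino ) = @_;
    return 0 if $ino < 1;                      # inode numbers start at 1
    my $word = int( ( $ino - 1 ) / $mlen );    # which map word
    my $bit  = ( $ino - 1 ) % $mlen;           # which bit inside that word
    return 0 if $word > $#$map;                # beyond the end of the map
    return ( $map->[$word] >> $bit ) & 1;
}

# A map of MSIZ words of MLEN bits can describe at most MSIZ * MLEN inodes.
sub map_capacity {
    my ( $msiz, $mlen ) = @_;
    return $msiz * $mlen;
}
```

With a 16-bit word size (an assumed MLEN), the map (0b101, 0b1) marks inodes 1, 3 and 17.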

       The TS_ entries are used in the c_type field to indicate what sort of header this is.  The types and their meanings are as follows:

       TS_TAPE Tape volume label
       TS_INODE
	       A file or directory follows.  The c_dinode field is a copy of the disk inode and contains bits telling what sort of file this is.
       TS_BITS A bit map follows.  This bit map has a one bit for each inode that was dumped.
       TS_ADDR A subrecord of a file description.  See c_addr below.
       TS_END  End of tape record.
       TS_CLRI A bit map follows.  This bit map contains a zero bit for all inodes that were empty on the file system when dumped.
       MAGIC   All header records have this number in c_magic.
       CHECKSUM
	       Header records checksum to this value.
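
The CHECKSUM rule means a reader can validate a header by summing its words, and a writer can choose c_checksum so that the sum comes out right. The Perl sketch below is hypothetical and rests on assumptions that should be checked against <dumprestor.h> and the dump source: the record is summed as unsigned 16-bit little-endian words with wrap-around (as a 16-bit PDP-11 int would), and the CHECKSUM value is passed in rather than hard-coded.

```perl
use strict;
use warnings;

# Sum a 512-byte header record as unsigned 16-bit little-endian words,
# wrapping modulo 2**16 (assumption: mirrors a 16-bit PDP-11 int).
sub record_sum {
    my ($record) = @_;
    my $sum = 0;
    $sum = ( $sum + $_ ) % 65536 for unpack 'v*', $record;
    return $sum;
}

# A header is intact when its words sum to CHECKSUM, reduced to 16 bits.
# $checksum stands in for the CHECKSUM constant from <dumprestor.h>.
sub checksum_ok {
    my ( $record, $checksum ) = @_;
    return record_sum($record) == $checksum % 65536;
}

# Value a writer would store in c_checksum, given the sum of the record
# computed while c_checksum itself is still zero.
sub checksum_field {
    my ( $sum_without_field, $checksum ) = @_;
    return ( $checksum - $sum_without_field ) % 65536;
}
```

Because c_checksum is itself one of the summed words, a writer zeroes it, sums the record, stores the result of checksum_field, and the finished record then passes checksum_ok.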

       The fields of the header structure are as follows:

       c_type	The type of the header.
       c_date	The date the dump was taken.
       c_ddate	The date the file system was dumped from.
       c_volume The current volume number of the dump.
       c_tapea	The current number of this (512-byte) record.
       c_inumber
		The number of the inode being dumped if this is of type TS_INODE.
       c_magic	This contains the value MAGIC above, truncated as needed.
       c_checksum
		This contains whatever value is needed to make the record sum to CHECKSUM.
       c_dinode This is a copy of the inode as it appears on the file system; see filsys(5).
       c_count	The count of characters in c_addr.
       c_addr	An array of characters describing the blocks of the dumped file. A character is zero if the block associated with that character
		was not present on the file system; otherwise the character is non-zero. If the block was not present on the file system, no
		block was dumped; the block will be restored as a hole in the file. If there is not sufficient space in this record to describe
		all of the blocks in a file, TS_ADDR records will be scattered through the file, each one picking up where the last left off.
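
The c_count/c_addr rule can be sketched as a walk over the first c_count characters of one header, sorting file blocks into those that were dumped and those that become holes, and carrying the block number forward so that the next TS_ADDR record picks up where this one stopped. The Perl helper classify_blocks below is hypothetical; it assumes c_addr has already been extracted from the header as a byte string.

```perl
use strict;
use warnings;

# Classify the file blocks described by one header's c_addr map.
# $first is the file-block number this header starts at; a following
# TS_ADDR record continues from the returned "next" block number.
# Returns (next block number, [blocks present on tape], [holes]).
sub classify_blocks {
    my ( $c_addr, $c_count, $first ) = @_;
    my ( @present, @holes );
    my @flags = unpack "C$c_count", $c_addr;   # one byte per block
    for my $i ( 0 .. $#flags ) {
        if ( $flags[$i] ) { push @present, $first + $i }   # non-zero: block dumped
        else              { push @holes,   $first + $i }   # zero: restore as a hole
    }
    return ( $first + @flags, \@present, \@holes );
}
```

For example, a TS_INODE header with c_count 4 and c_addr bytes 1, 0, 5, 0 describes blocks 0 and 2 as dumped and blocks 1 and 3 as holes; a following TS_ADDR record would start at block 4.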

       Each volume except the last ends with a tapemark (read as an end of file). The last volume ends with a TS_END record and then the tapemark.

       The structure idates describes an entry of the file /etc/ddate where dump history is kept.  The fields of the structure are:

       id_name	The dumped filesystem is `/dev/id_nam'.
       id_incno The level number of the dump tape; see dump(1).
       id_ddate The date of the incremental dump in system format; see types(5).

FILES

       /etc/ddate

SEE ALSO

       dump(1), dumpdir(1), restor(1), filsys(5), types(5)

