How to extract duplicate records with associated header record
Post 302102935 by run_eim on Monday 15th of January 2007 08:14:08 AM
Quote:
Originally Posted by zazzybob
I'd use perl for this...
Code:
$ ./input.pl 
HA
D2
D4
HC
D1
D3
$ cat ./input.pl 
#!/usr/bin/perl
# Script to print headers and duplicate items from input.txt

use warnings;
use strict;
my @records;
undef $/;

open ( INPUT, "< input.txt" ) || die "Couldn't open input file: $!\n";
# use a look-ahead assertion here
@records = split( /^(?=(?:H))/m, <INPUT> );
foreach my $record ( @records ) {
   my @lines = split( /\n/, $record );
   my $header = $lines[0];
   my %linehash;
   my $headerdone = 0;
   foreach my $line ( @lines ) {
      $linehash{$line}++;  
   }
   foreach my $key ( sort ( keys ( %linehash ) ) ) {
      my $value = $linehash{$key};
      if ( $value > 1 ) { 
         if ( $headerdone == 0 ) {
            printf( "%s\n", $header );
            $headerdone++;
         }
         printf( "%s\n", $key );
      }
   }
}
close ( INPUT );

exit ( 0 );

Cheers
ZB
Thanks for the replies.
These are actually multiple files of daily extracts of expense report data from a transactional system. Each file is made up of individual expense reports (header records) and the expense line items for each report (detail records). We had a situation where some, but not all, detail records were duplicated. This occurred in some output files, but not all. My requirement is to identify, by export file, the duplicate records along with their respective header records. We need this information to send to the system of record so these errors can be corrected. It will (hopefully) be a one-time fix. Also, I do not know perl, but am willing to learn enough to use it as a solution.

Thanks again for posting a reply.
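
For the "by export file" part of the requirement, here is a rough sketch of how the quoted approach might be extended to several files at once. It assumes, as the quoted script does, that header records start with "H" and that every other non-empty line is a detail record. The sub name find_duplicates and the file names in the usage comment are made up for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
# Sketch: report duplicated detail lines under their header, per extract file.
# Assumption: header records start with "H"; all other non-empty lines are details.
use strict;
use warnings;

# Takes the full text of one file. Returns a list of array refs of the form
# [header, dup1, dup2, ...], one per header with at least one duplicated detail.
sub find_duplicates {
    my ($text) = @_;
    my @report;
    my ( $header, %count, @order );

    # Close out the current header group and record its duplicates, if any.
    my $flush = sub {
        my @dups = grep { $count{$_} > 1 } @order;
        push @report, [ $header, @dups ] if defined $header && @dups;
        %count = ();
        @order = ();
    };

    foreach my $line ( split /\n/, $text ) {
        next if $line eq '';
        if ( $line =~ /^H/ ) {
            $flush->();          # a new header ends the previous group
            $header = $line;
        }
        else {
            # remember each distinct detail line once, in first-seen order
            push @order, $line unless $count{$line}++;
        }
    }
    $flush->();                  # do not forget the last group
    return @report;
}

# Hypothetical usage: ./dups.pl extract1.txt extract2.txt ...
foreach my $file (@ARGV) {
    open( my $fh, '<', $file ) or die "Couldn't open $file: $!\n";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close($fh);
    foreach my $group ( find_duplicates($text) ) {
        print "$file: $_\n" for @$group;
    }
}
```

Each output line is prefixed with the file name, so the results for all extracts can be collected in one run and sorted or grepped afterwards. Headers with no duplicated detail lines are skipped, the same as in the quoted script.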
 
