How to extract duplicate records with associated header record
Post 302102935 by run_eim on Monday 15th of January 2007 08:14:08 AM
Quote:
Originally Posted by zazzybob
I'd use perl for this...
Code:
$ ./input.pl 
HA
D2
D4
HC
D1
D3
$ cat ./input.pl 
#!/usr/bin/perl
# Script to print headers and duplicate items from input.txt

use warnings;
use strict;

# three-argument open with a lexical filehandle
open( my $input, '<', 'input.txt' ) or die "Couldn't open input file: $!\n";
# slurp the whole file by locally undefining the input record separator
my $content = do { local $/; <$input> };
close( $input );

# use a look-ahead assertion here: split before every line starting
# with "H", so each record begins with its header line
my @records = split( /^(?=H)/m, $content );
foreach my $record ( @records ) {
   my @lines = split( /\n/, $record );
   my $header = $lines[0];
   my %linehash;
   my $headerdone = 0;
   # count how often each line occurs within this record
   foreach my $line ( @lines ) {
      $linehash{$line}++;
   }
   foreach my $key ( sort keys %linehash ) {
      if ( $linehash{$key} > 1 ) {
         # print the header once, before its first duplicate
         if ( !$headerdone ) {
            print "$header\n";
            $headerdone = 1;
         }
         print "$key\n";
      }
   }
}

exit ( 0 );

Cheers
ZB
Thanks for the replies.
These are actually multiple files of daily extracts of expense report data from a transactional system. Each file is made up of individual expense reports (header records) and the expense line items for each report (detail records). We had a situation where some detail records, but not all, were duplicated. This occurred in some output files, but not all. My requirement is to identify, for each export file, the duplicate detail records together with their respective header records. We need this information to send to the system of record so these errors can be corrected. It will (hopefully) be a one-time fix. Also, I do not know Perl, but I am willing to learn enough to use it as a solution.

Thanks again for posting a reply.
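
For the multi-file requirement described above, the quoted script could be adapted roughly as follows. This is only a sketch under assumptions not stated in the thread: header lines are assumed to start with "H" (as in the quoted script), the export file names are passed on the command line, and the sub name find_duplicates is made up for illustration. Unlike the quoted script, it leaves the header line out of the duplicate count and ignores blank lines.

```perl
#!/usr/bin/perl
# Sketch: report duplicated detail lines per export file, grouped under
# their header record. Assumes header lines start with "H"; the file
# names come from the command line.

use warnings;
use strict;

# Hypothetical helper: takes the full text of one export file and
# returns the report lines (each header followed by its duplicated
# detail lines). Headers without duplicates are left out.
sub find_duplicates {
   my ($content) = @_;
   my @report;
   # split before every line starting with "H" (zero-width look-ahead)
   foreach my $record ( split( /^(?=H)/m, $content ) ) {
      my ( $header, @details ) = split( /\n/, $record );
      next unless defined $header;
      my %count;
      # count each non-blank detail line within this record
      $count{$_}++ foreach grep { length } @details;
      my @dups = grep { $count{$_} > 1 } sort keys %count;
      push @report, $header, @dups if @dups;
   }
   return @report;
}

foreach my $file (@ARGV) {
   open( my $in, '<', $file ) or die "Couldn't open $file: $!\n";
   # slurp the whole file
   my $content = do { local $/; <$in> };
   close($in);
   my @report = find_duplicates($content);
   next unless @report;          # skip files without duplicates
   print "== $file ==\n";
   print "$_\n" foreach @report;
}
```

Saved under a name of your choosing (say finddups.pl, a placeholder), it would be run as `perl finddups.pl file1 file2 ...`, and only files that contain duplicates would show up in the output, each under its own "== filename ==" line.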
 
