$ ./input.pl
HA
D2
D4
HC
D1
D3
$ cat ./input.pl
#!/usr/bin/perl
# Print each header record followed by any detail lines that occur
# more than once beneath it in input.txt.
use warnings;
use strict;

open( my $input, '<', 'input.txt' ) || die "Couldn't open input file: $!\n";
# Slurp the whole file: with $/ undefined, <$input> returns everything at once.
my $contents = do { local $/; <$input> };
close($input);

# The look-ahead splits before every line that starts with "H"
# without consuming the "H" itself.
my @records = split( /^(?=H)/m, $contents );
foreach my $record (@records) {
    my @lines  = split( /\n/, $record );
    my $header = $lines[0];
    my %linehash;
    my $headerdone = 0;
    # Count how many times each line occurs within this record.
    foreach my $line (@lines) {
        $linehash{$line}++;
    }
    foreach my $key ( sort keys %linehash ) {
        next unless $linehash{$key} > 1;
        # Print the header once, before the first duplicate.
        if ( !$headerdone ) {
            print "$header\n";
            $headerdone = 1;
        }
        print "$key\n";
    }
}
exit(0);
Cheers
ZB
Thanks for the replies.
These are actually multiple files of daily extracts of expense report data from a transactional system. Each file is made up of individual expense reports (header records) and the expense line items for each report (detail records). We had a situation where some detail records, but not all, were duplicated. This occurred in some output files, but not all. My requirement is to identify, by export file, the duplicate records together with their respective header records. We need this information to send to the system of record so that the errors can be corrected. It will (hopefully) be a one-time fix. Also, I do not know Perl, but I am willing to learn enough to use it as a solution.
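Since the duplicates have to be reported per export file, here is a rough sketch of how the script above might be extended to take several file names on the command line. It assumes, as in the sample output, that header records start with "H" and that every other line is a detail record. The name find_duplicates and the "file: record" output layout are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a list of [ header, [duplicate details] ]
# pairs for the text of one export file.
# Assumption: header lines start with "H"; all other lines are details.
sub find_duplicates {
    my ($text) = @_;
    my @report;
    for my $record ( split /^(?=H)/m, $text ) {
        my ( $header, @details ) = split /\n/, $record;
        # Skip anything that comes before the first header.
        next unless defined $header && $header =~ /^H/;
        my %count;
        $count{$_}++ for @details;
        my @dups = grep { $count{$_} > 1 } sort keys %count;
        push @report, [ $header, \@dups ] if @dups;
    }
    return @report;
}

# Report duplicates for each file named on the command line.
for my $file (@ARGV) {
    open( my $fh, '<', $file ) or die "Couldn't open $file: $!\n";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    for my $entry ( find_duplicates($text) ) {
        my ( $header, $dups ) = @$entry;
        print "$file: $header\n";
        print "$file:   $_\n" for @$dups;
    }
}
```

It could be run as something like ./dups.pl extract1.txt extract2.txt (file names are placeholders). Prefixing every line with the file name keeps the output easy to grep when many extracts are checked at once.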
DUMP(5) File Formats Manual DUMP(5)
NAME
dump, ddate - incremental dump format
SYNOPSIS
#include <sys/types.h>
#include <sys/ino.h>
#include <dumprestor.h>
DESCRIPTION
Tapes used by dump and restor(1) contain:
a header record
two groups of bit map records
a group of records describing directories
a group of records describing files
The format of the header record and of the first record of each description as given in the include file <dumprestor.h> is:
NTREC is the number of 512 byte records in a physical tape block. MLEN is the number of bits in a bit map word. MSIZ is the number of bit
map words.
The TS_ entries are used in the c_type field to indicate what sort of header this is. The types and their meanings are as follows:
TS_TAPE Tape volume label
TS_INODE
A file or directory follows. The c_dinode field is a copy of the disk inode and contains bits telling what sort of file this is.
TS_BITS A bit map follows. This bit map has a one bit for each inode that was dumped.
TS_ADDR A subrecord of a file description. See c_addr below.
TS_END End of tape record.
TS_CLRI A bit map follows. This bit map contains a zero bit for all inodes that were empty on the file system when dumped.
MAGIC All header records have this number in c_magic.
CHECKSUM
Header records checksum to this value.
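As an illustration of the checksum rule (the c_checksum field described below holds whatever value makes the record sum to CHECKSUM), here is a minimal Perl sketch. It assumes the record is summed as unsigned 16-bit little-endian words modulo 65536; the word size, the byte order, the offset of the checksum word and the target value used in the example are all assumptions, not values taken from <dumprestor.h>.

```perl
use strict;
use warnings;

# Sum a record as unsigned 16-bit little-endian words, modulo 2**16.
# Word size and byte order are assumptions of this sketch.
sub record_sum {
    my ($record) = @_;
    my $sum = 0;
    $sum = ( $sum + $_ ) % 65536 for unpack( 'v*', $record );
    return $sum;
}

# Return the value to store in the checksum word (at byte $offset)
# so that the whole record sums to $target.
sub checksum_fill {
    my ( $record, $offset, $target ) = @_;
    substr( $record, $offset, 2, pack( 'v', 0 ) );    # zero the slot in our copy
    return ( $target - record_sum($record) ) % 65536;
}

# Example with made-up data: a 512-byte record, checksum word at byte 14,
# and an arbitrary illustrative target of 0x1234.
my $record = pack( 'v*', 1 .. 256 );
my $fill = checksum_fill( $record, 14, 0x1234 );
substr( $record, 14, 2, pack( 'v', $fill ) );
print "record verifies\n" if record_sum($record) == 0x1234;
```

A reader of the tape can apply the same record_sum to every header record and reject any record whose sum differs from the expected constant.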
The fields of the header structure are as follows:
c_type The type of the header.
c_date The date the dump was taken.
c_ddate The date the file system was dumped from.
c_volume The current volume number of the dump.
c_tapea The current number of this (512-byte) record.
c_inumber
The number of the inode being dumped if this is of type TS_INODE.
c_magic This contains the value MAGIC above, truncated as needed.
c_checksum
This contains whatever value is needed to make the record sum to CHECKSUM.
c_dinode This is a copy of the inode as it appears on the file system; see filsys(5).
c_count The count of characters in c_addr.
c_addr An array of characters describing the blocks of the dumped file. A character is zero if the block associated with that character
was not present on the file system, otherwise the character is non-zero. If the block was not present on the file system, no
block was dumped; the block will be restored as a hole in the file. If there is not sufficient space in this record to describe
all of the blocks in a file, TS_ADDR records will be scattered through the file, each one picking up where the last left off.
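The c_addr convention can be sketched in a few lines of Perl. This assumes the c_addr bytes and the c_count value have already been extracted from a header record; block_map is a hypothetical helper name, and how the fields are laid out in the record is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: given the raw c_addr bytes and c_count, return one
# entry per block: 1 if the block was dumped, 0 if it is a hole.
sub block_map {
    my ( $c_addr, $c_count ) = @_;
    return map { $_ ? 1 : 0 } unpack( "C$c_count", $c_addr );
}

# Example with made-up data: the third of four blocks is a hole.
my @map = block_map( "\x01\x01\x00\x01", 4 );
print join( ',', @map ), "\n";    # prints 1,1,0,1
```

On restore, each 0 entry would correspond to a block that is skipped over rather than read from the tape, which leaves a hole in the restored file.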
Each volume except the last ends with a tapemark (read as an end of file). The last volume ends with a TS_END record and then the
tapemark.
The structure idates describes an entry of the file /etc/ddate where dump history is kept. The fields of the structure are:
id_name The dumped filesystem is `/dev/id_nam'.
id_incno The level number of the dump tape; see dump(1).
id_ddate The date of the incremental dump in system format; see types(5).
FILES
/etc/ddate
SEE ALSO
dump(1), dumpdir(1), restor(1), filsys(5), types(5)