Merge text files while combining the multiple header/trailer records into one each.


 
# 1  
Old 11-17-2008
Merge text files while combining the multiple header/trailer records into one each.

Situation:
Our system currently executes a job (a COBOL program) that generates an interface file to be sent to one of our vendors. Because this system processes information for over 100,000 employees/retirees (and growing), we'd like to split the job into processing groups that run in parallel in order to reduce its run-time. This works fine; however, we're then faced with multiple interface files that need to be merged prior to transferring them to the vendor.


Some Details on the File:
The file generated has a header and a trailer record, and the trailer record has pertinent total values (e.g., employee count, records approved). There are no field separators -- these are fixed-length fields.

Predicament in Detail:
We'd like to concatenate the files -- that's the easy part. What makes this difficult is that we need to eliminate the multiple header records and retain only the first one. Also, we need to eliminate the multiple trailer records, but we need to add all the value totals from each trailer into the one trailer record we'll retain at the end.

As you might have surmised by now, I've written some UNIX scripts, but lack some key knowledge related to individual record and field manipulation within a text file. In particular, I'd like to know how I can define specific fields when I read each record -- these are the fields for the trailer records I need to keep a rolling total on. Also, I'd like to know how I can delete individual records.

Any assistance will be greatly appreciated.
# 2  
Old 11-17-2008
You did not give enough information to build a correct script.
We need a sample header line, a sample data line, and a sample trailer line.
# 3  
Old 11-17-2008
Sample File

Quote:
Originally Posted by jim mcnamara
You did not give enough information to build a correct script.
We need a sample header line, a sample data line, and a sample trailer line.
Sorry about that! Here's a sample file -- incomplete records, though, as they're rather large. But the pertinent information is contained.


BATCH HEADER PRO 0724200808042008
01E000036841 LEAD05151948F 51498 10012007 YYY
02E000036841 ME 04161988F 10012007
01E000060640 MDGV12251951F 51498 1001200709302008YYY
02E000060640 RD 05061941M 1001200709302008
01E000025850 LDUO06081956F 51498 1001200709302008YYY
02E000025850 ED 10071937M 1001200709302008
01E029009859 DUA05021960F 51498 10012007 YYY
02E029009859 LD 03101989F 10012007
02E029009859 LD 02041997M 10012007
01E034008379 AEUA09181965F 51498 10012007 YYY
02E034008379 NE 11131991F 10012007
02E034008379 RE 01131993F 10012007
02E034008379 EE 09191959M 10012007
01E045005523 EUA02131964M 51498 10012007 YNN
01E046004280 DUA12041947M 51498 10012007 YYY
02E046004280 D 12121953F 10012007
02E046004280 KE 09211986M 10012007
01E048005119 BDUA01301961F 51498 10012007 YNN
01E055002147 LDUA10011964F 51498 10012007 YYY
02E055002147 RD 11121966M 10012007
02E055002147 ND 02131997F 10012007
02E055002147 JD 03111992M 10012007
01E057008796 SEUA12061975F 51498 10012007 YYY
BATCH TRAILER 000001150000019908042008

Details on the Trailer Record: the 00000115 is a total value (number of employees), and the 00000199 is the total of records processed (employees and dependents). Those are the two fields I need to keep a rolling total of across all the files we merge.

The detail records are over 300 characters wide (irrelevant for what we need to do, but I thought I'd include it).
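For the "define specific fields" part of the question, awk's substr() can slice a fixed-length record by column. The following is only a sketch: the helper name trailer_fields is made up, and the column positions (15-22, 23-30, 31-38) are an assumption read off the abbreviated sample trailer above, so they would need checking against the real record layout.

```shell
# Hypothetical helper: print the two counters and the date from each trailer line.
# Column positions are an ASSUMPTION based on the sample trailer:
#   15-22 employee count, 23-30 record count, 31-38 date.
trailer_fields() {
    awk '/^BATCH TRAILER/ {
        emp  = substr($0, 15, 8) + 0   # adding 0 turns "00000115" into the number 115
        recs = substr($0, 23, 8) + 0
        date = substr($0, 31, 8)       # left as text so the leading zero survives
        print emp, recs, date
    }' "$@"
}

# Example: feed the sample trailer through the helper.
printf '%s\n' 'BATCH TRAILER 000001150000019908042008' | trailer_fields
# prints: 115 199 08042008
```

With no file arguments the helper reads standard input; with file names it reads those files instead.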

Thank you!
# 4  
Old 11-17-2008
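Assuming that 01E000036841 is an employee ID and the files are named <something>.dat:
Code:
# take the header line from the first .dat file
set -- *.dat
header=$1
head -1 "$header" > tmp

# pass 1: drop header/trailer lines, copy detail lines, and count them
awk 'index($0, "HEADER") > 0 { next }
     index($0, "TRAILER") > 0 { last = $0; next }   # remember the last trailer seen
     { arr[$0]++; print $0 }
     END { for (i in arr)
             {
               empcnt++        # number of distinct detail lines
               lc += arr[i]    # total number of detail lines
             }
           # also save the trailer date so the second awk can use it
           print empcnt, lc, substr(last, length(last) - 7) > "cntfile" }' *.dat >> tmp
# pass 2: write the single combined trailer
awk '{ printf("BATCH TRAILER %08d%08d%s\n", $1, $2, $3) }' cntfile >> tmp
mv tmp employee.dat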

This also assumes the last eight characters of the BATCH TRAILER record are the same in every file.
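Note that the script above recounts the detail lines rather than adding up the totals already stored in each trailer. If the trailer totals themselves should be summed, as the original question describes, a single awk pass can do it. This is a rough sketch, not a tested production job: the function name merge_batches is made up, and it assumes header lines start with "BATCH HEADER", trailer lines start with "BATCH TRAILER", and the two counters sit in columns 15-22 and 23-30 as in the sample.

```shell
# Hypothetical helper: merge the batch files named as arguments and write
# the combined file to standard output.
# ASSUMPTIONS: headers start with "BATCH HEADER", trailers with "BATCH TRAILER",
# and the counters occupy columns 15-22 and 23-30 of the trailer.
merge_batches() {
    awk '
        /^BATCH HEADER/  { if (!seen_header++) print; next }  # keep only the first header
        /^BATCH TRAILER/ {
            emp  += substr($0, 15, 8)   # running employee total
            recs += substr($0, 23, 8)   # running record total
            tail  = substr($0, 31)      # rest of the most recent trailer (the date)
            next
        }
        { print }                       # detail records pass through unchanged
        END { printf "BATCH TRAILER %08d%08d%s\n", emp, recs, tail }
    ' "$@"
}
```

It could be called as merge_batches part1.dat part2.dat > merged.out, where the file names are placeholders. Writing to a name that the input pattern does not match keeps a rerun from reading its own output back in.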
# 5  
Old 11-17-2008
Hi, the Perl script below may help you a little.

usage: perl a.pl NUM FILE1 FILE2 [NUM is the number of header lines in each file]
Code:
a:
*****
line 1
line 2
1 2 3 4 5

Code:
b:
*****
line 3
line 4
9 8 7 6 5

output:
Code:
*****
line 1
line 2
line 3
line 4
10 10 10 10 10

Code:
use strict;
use warnings;

my $header = shift;	# number of header lines in each file
my (@head, @body, @foot);
for my $file (@ARGV) {
	open my $fh, '<', $file or die "Can not open file $file: $!";
	chomp(my @temp = <$fh>);
	close $fh;
	# keep the header lines from the first file only
	@head = @temp[0 .. $header - 1] unless @head;
	# everything between the header and the last line is body
	push @body, @temp[$header .. $#temp - 1];
	# the last line is the footer: add its space-separated fields column by column
	my @footer = split ' ', $temp[-1];
	$foot[$_] += $footer[$_] for 0 .. $#footer;
}
print join("\n", @head, @body, join(' ', @foot)), "\n";
