EBCDIC File Split Based On Record Key


 
# 8  
Old 12-24-2015
After reading through the COBOL, it appears that the meaningful data in a record is of variable length, but each record is padded with filler at the end so that data plus filler always comes to 420 bytes. Adding the 2 bytes for the record key gives a total of 422 bytes per record. So when I cut -b 1-2, I get 01, and when I cut -b 423-424, I get 13.

A small example
Key = 01
Length = 150
Filler = 270

Key = 02
Length = 170
Filler = 250

It seems that the main problem right now is how to get bytes 1-2 or 423-424 of the file as a whole, since cut -b 1-2 gives me the first two bytes of each line of the file, which I definitely do not want.
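One way to read bytes at an absolute offset, rather than per line, is dd with bs=1 plus skip and count. The sketch below is only an illustration: the function name record_key is made up here, it assumes fixed 422-byte records, and it assumes a dd that supports conv=ascii (GNU dd does) to translate the two EBCDIC key bytes.

```shell
reclen=422	# assumed fixed record length: 2-byte key + 420 bytes of data and filler

# record_key FILE RECNO
# Print the 2-byte key of record RECNO (1-based), converted from EBCDIC to ASCII.
record_key() {
	# Skip whole records to reach the start of the wanted one, read 2 bytes,
	# then translate just those bytes.
	dd if="$1" bs=1 skip=$(( ($2 - 1) * reclen )) count=2 2>/dev/null |
	    dd conv=ascii 2>/dev/null
}
```

With that, record_key main.ebc 1 reads bytes 1-2 and record_key main.ebc 2 reads bytes 423-424.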

Ultimately, the C++ program takes an EBCDIC file to convert it to a csv, based on the COBOL structure of that record. This is the reason I need to break it out into the 13 different files, since there are 13 different record types, each with a different COBOL structuring, requiring a different decode algorithm.
# 9  
Old 12-24-2015
OK. So we probably won't need the entire ASCII file.

Does your C++ converter want 422 byte records (with the type, the data, and the filler), or (using type "01" as an example) 152 byte records (with the type, the data, and no filler), or 150 byte records (just the data; no type and no filler)?
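To make those three layouts concrete, here is a rough sketch of how each could be cut out of the input with dd. The function name extract_record and its mode names (full, keyed, data) are invented for this illustration; the data length has to be supplied by the caller, since only the lengths for types 01 and 02 are known so far.

```shell
reclen=422	# full record: key + data + filler
keylen=2	# record type key at the start of each record

# extract_record FILE RECNO DATALEN MODE
#   full   whole 422-byte record (type, data, and filler)
#   keyed  type plus data, no filler (152 bytes for type 01)
#   data   data only, no type and no filler (150 bytes for type 01)
extract_record() {
	start=$(( ($2 - 1) * reclen ))
	case "$4" in
	(full)	count=$reclen;;
	(keyed)	count=$(( keylen + $3 ));;
	(data)	start=$(( start + keylen ))
		count=$3;;
	(*)	return 1;;
	esac
	# Bytes are copied untouched, so the output is still EBCDIC.
	dd if="$1" bs=1 skip="$start" count="$count" 2>/dev/null
}
```

For example, extract_record main.ebc 1 150 keyed would write the first record's type and data (152 bytes) to standard output.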

And what file do you want to contain the extracted records for each record type? Is Typexx.ebc where xx is the record type OK?

PS... And, of course, we still need the other record types and lengths.

Last edited by Don Cragun; 12-25-2015 at 03:40 AM.. Reason: Add another request for a list of record types and sizes.
# 10  
Old 12-29-2015
Now that I am thinking about it a little more, maybe I can alter the C++ program to read specific bytes at a time: read the 2-byte key, then, depending on the record type, read xx bytes, convert that record, and move on. That would remove the need to split anything, as I could just feed the main.ebc file through the program.

To answer a couple of your questions, Typexx.ebc is exactly what I am looking for. So the C++ converter would take Typexx.ebc as the input, do the conversion it needs to based on that record, and then spit out a csv file Typexx.csv that I can then use to load into the database.
# 11  
Old 12-29-2015
Modifying your C++ program would be a LOT faster, but the following seems to work for a small sample data file I created:
Code:
#!/bin/ksh
# Usage: splittype [ EBCDICfile.ebc ]
# The splittype utility shall extract records from the given input file
# (default data.ebc if no operand is given) into files named "Type"xx".ebc"
# (where xx is the record type identified by the 1st two bytes of each 422
# EBCDIC-encoded byte record in the input file).  The input file pathname is
# assumed to end with the extension ".ebc".  If it doesn't, the results are
# unspecified.  Records found in the input file will be appended to the
# corresponding output files in the current directory.

# If an ASCII version of the input file does not exist in the current directory
# with the same basename as the input file and with the same size as the input
# file with the extension ".ascii", it will be created (or, if it exists with a
# different size, overwritten) before processing starts.
IAm=${0##*/}
ec=0			# Final exit code.
ifEBCDICname=${1:-data.ebc}
ifASCIIname=${ifEBCDICname##*/}
ifASCIIname="${ifASCIIname%.ebc}.ascii"
ofname_prefix="Type"	# Output filenames will be...
ofname_suffix=".ebc"	#	"$ofname_prefix$type$ofname_suffix"

fixlen=422		# Fixed length of records in input file.
spot=0			# # of bytes processed so far from input file.

# Verify that the input file exists...
if [ -f "$ifEBCDICname" ]
then	read junk junk junk junk fsize junk <<-EOF
		$(ls -l "$ifEBCDICname")
	EOF
else	printf '%s: ERROR: File "%s" not found.\n' "$IAm" "$ifEBCDICname" >&2
	exit 1
fi
printf '%s: NOTE: Processing input file "%s" (%d bytes)\n' "$IAm" \
    "$ifEBCDICname" "$fsize" >&2

# Look for ASCII version of input file and create it if needed...
if [ ! -f "$ifASCIIname" ] ||
   [ "$(ls -l "$ifASCIIname" | (read x x x x Afsize x; echo "$Afsize"))" -ne \
	$fsize ]
then	printf '%s: NOTE: Creating ASCII version of "%s"\n' "$IAm" \
	    "$ifEBCDICname" >&2
	if ! dd if="$ifEBCDICname" of="$ifASCIIname" conv=ascii
	then	printf '%s: ERROR: Could not create ASCII file "%s".\n' "$IAm" \
		    "$ifASCIIname" >&2
		exit 2
	fi
fi
while [ $spot -lt $fsize ]
do	type="$(dd if="$ifASCIIname" bs=1 skip=$spot count=2 2>/dev/null)"
	case "$type" in
	(01)	typelen=152;;
	(02)	typelen=172;;
	(*)	printf '%s: Unknown record type ("%s") found at offset %d\n' \
		    "$IAm" "$type" $spot >&2
		spot=$((spot + fixlen))
		ec=3
		continue;;
	esac
	dd if="$ifEBCDICname" bs=1 skip=$spot count=$typelen >> \
	    "$ofname_prefix$type$ofname_suffix" 2>/dev/null
	spot=$((spot + fixlen))
done
exit $ec

Since it invokes dd twice for each record found in your input file, it will be SLOW, but it seems to get the job done. (Of course, you'll have to add the missing record types and assign the correct lengths for the other record types; you have only given us the sizes for record types 01 and 02.) The code above assumes that you want to include the record type and the data (but not the padding from the end of the input records) in the output files.
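If the per-record dd used to read the type turns out to be the bottleneck, one possible way to drop it is to list every record's key in a single pass with od and awk. This is an untested-on-real-data sketch: the function name list_keys is made up, and it assumes every key consists of two EBCDIC digits (bytes 0xF0 through 0xF9), so that the low hex digit of each byte is the ASCII digit.

```shell
# list_keys FILE [RECLEN]
# Print "<record number> <key>" for every record in FILE, one per line.
# RECLEN defaults to 422.
list_keys() {
	# od dumps every byte as two lowercase hex digits; awk tracks the byte
	# offset and picks out the first two bytes of each record.
	od -An -v -tx1 "$1" |
	awk -v reclen="${2:-422}" '
	{
		for (i = 1; i <= NF; i++) {
			off = pos++ % reclen
			if (off == 0)
				hi = $i
			else if (off == 1)
				printf "%d %s%s\n", ++rec, substr(hi, 2), substr($i, 2)
		}
	}'
}
```

The output could then drive a loop such as list_keys data.ebc | while read recno type, leaving only the one dd per record that copies the data.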

Hoping this helps...
# 12  
Old 12-29-2015
This is great Don. I'm going to look through this and make sure I understand each part.

I'll follow up on this after I have had a chance to look through.