EBCDIC File Split Based On Record Key

12-24-2015

After reading through the COBOL, it appears that the meaningful data in a record is of variable length, but each record is padded with filler so that the data plus filler always comes to 420 bytes. Adding the 2 bytes for the record key gives a total of 422 bytes per record. So when I cut -b 1-2, I get 01, and when I cut -b 423-424, I get 13.

A small example
Key = 01
Length = 150
Filler = 270

Key = 02
Length = 170
Filler = 250

The main problem right now is how to get bytes 1-2 or 423-424 of the file as a whole: cut -b 1-2 gives me the first two bytes of each line of the file, which is definitely not what I want.
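One way to pull bytes by absolute file offset rather than per line is dd with bs=1, then a second dd to translate the result from EBCDIC to ASCII. The following is only a minimal sketch: the function name record_key is made up for illustration, and the default record length of 422 is taken from the description above.

```shell
#!/bin/sh
# record_key FILE N [RECLEN]  (hypothetical helper, not part of any tool)
# Print the 2-byte key of record N (1-based) of FILE as ASCII text.
# RECLEN defaults to 422: the 2-byte key plus 420 bytes of data and filler.
record_key() (
	file=$1
	recno=$2
	reclen=${3:-422}
	# Byte offset of the first byte of record N.
	offset=$(( (recno - 1) * reclen ))
	# First dd: copy 2 raw bytes starting at that offset.
	# Second dd: translate those bytes from EBCDIC to ASCII.
	dd if="$file" bs=1 skip="$offset" count=2 2>/dev/null |
	    dd conv=ascii 2>/dev/null
)

# Example (main.ebc is the input file name used in this thread):
#	record_key main.ebc 1	# key of the first record (bytes 1-2)
#	record_key main.ebc 2	# key of the second record (bytes 423-424)
```

Because dd counts bytes and ignores line structure, any newline bytes that happen to occur inside the binary data do not affect the result.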

Ultimately, the C++ program takes an EBCDIC file and converts it to a csv based on the COBOL structure of its records. This is why I need to break the file out into 13 different files: there are 13 different record types, each with a different COBOL structure requiring a different decode algorithm.

hanshot1stx

12-24-2015

OK. So we probably won't need the entire ASCII file.

Does your C++ converter want 422 byte records (with the type, the data, and the filler), or (using type "01" as an example) 152 byte records (with the type, the data, and no filler), or 150 byte records (just the data; no type and no filler)?

And what file do you want to contain the extracted records for each record type? Is Typexx.ebc where xx is the record type OK?

PS... And, of course, we still need the other record types and lengths.

Last edited by Don Cragun; 12-25-2015 at 03:40 AM.. Reason: Add another request for a list of record types and sizes.

Don Cragun

12-29-2015

Now that I am thinking about it a little more, maybe I can alter the C++ program to read a few bytes at a time: read the 2-byte key, then, depending on the record type, read the next xx bytes, convert that record, and move on. That would remove the need to split the file at all, as I could just feed the main.ebc file through the program.

To answer a couple of your questions, Typexx.ebc is exactly what I am looking for. So the C++ converter would take Typexx.ebc as the input, do the conversion it needs to based on that record, and then spit out a csv file Typexx.csv that I can then use to load into the database.
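The Typexx.ebc to Typexx.csv step could be driven by a small loop like the one below. This is only a sketch under assumptions: the function name convert_all is invented here, and the converter is assumed to take the input file as its only argument and write CSV to standard output, which may not match how the real C++ program is invoked.

```shell
#!/bin/sh
# convert_all CONVERTER  (hypothetical helper)
# Run CONVERTER once for every Typexx.ebc file in the current directory and
# write its standard output to the matching Typexx.csv.
# ASSUMPTION: CONVERTER takes the input file as its only argument and writes
# CSV to standard output; adjust the call to match the real program.
convert_all() (
	converter=$1
	status=0
	for ebc in Type??.ebc
	do	# If nothing matches, the pattern is left unexpanded; skip it.
		[ -f "$ebc" ] || continue
		csv="${ebc%.ebc}.csv"
		if ! "$converter" "$ebc" > "$csv"
		then	printf 'convert_all: %s failed on %s\n' \
			    "$converter" "$ebc" >&2
			status=1
		fi
	done
	exit "$status"
)
```

The `${ebc%.ebc}` expansion strips the .ebc suffix, so Type01.ebc maps to Type01.csv and so on for each record type.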

hanshot1stx

12-29-2015

Modifying your C++ program would be a LOT faster, but the following seems to work for a small sample data file I created:

Code:

#!/bin/ksh
# Usage: splittype [ EBCDICfile.ebc ]
# The splittype utility shall extract records from the given input file
# (default data.ebc if no operand is given) into files named "Type"xx".ebc"
# (where xx is the record type identified by the 1st two bytes of each 422
# EBCDIC-encoded byte record in the input file).  The input file pathname is
# assumed to end with the extension ".ebc".  If it doesn't, the results are
# unspecified.  Records found in the input file will be appended to the
# corresponding output files in the current directory.

# If an ASCII version of the input file does not exist in the current directory
# with the same basename as the input file and with the same size as the input
# file with the extension ".ascii", it will be created (or, if it exists with a
# different size, overwritten) before processing starts.
IAm=${0##*/}
ec=0			# Final exit code.
ifEBCDICname=${1:-data.ebc}
ifASCIIname=${ifEBCDICname##*/}
ifASCIIname="${ifASCIIname%.ebc}.ascii"
ofname_prefix="Type"	# Output filenames will be...
ofname_suffix=".ebc"	#	"$ofname_prefix$type$ofname_suffix"

fixlen=422		# Fixed length of records in input file.
spot=0			# # of bytes processed so far from input file.

# Verify that the input file exists...
if [ -f "$ifEBCDICname" ]
then	read junk junk junk junk fsize junk <<-EOF
		$(ls -l "$ifEBCDICname")
	EOF
else	printf '%s: ERROR: File "%s" not found.\n' "$IAm" "$ifEBCDICname" >&2
	exit 1
fi
printf '%s: NOTE: Processing input file "%s" (%d bytes)\n' "$IAm" \
    "$ifEBCDICname" "$fsize" >&2

# Look for ASCII version of input file and create it if needed...
if [ ! -f "$ifASCIIname" ] ||
   [ "$(ls -l "$ifASCIIname" | (read x x x x Afsize x; echo "$Afsize"))" -ne \
	$fsize ]
then	printf '%s: NOTE: Creating ASCII version of "%s"\n' "$IAm" \
	    "$ifEBCDICname" >&2
	if ! dd if="$ifEBCDICname" of="$ifASCIIname" conv=ascii
	then	printf '%s: ERROR: Could not create ASCII file "%s".\n' "$IAm" \
		    "$ifASCIIname" >&2
		exit 2
	fi
fi
while [ $spot -lt $fsize ]
do	type="$(dd if="$ifASCIIname" bs=1 skip=$spot count=2 2>/dev/null)"
	case "$type" in
	(01)	typelen=152;;
	(02)	typelen=172;;
	(*)	printf '%s: Unknown record type ("%s") found at offset %d\n' \
		    "$IAm" "$type" $spot >&2
		spot=$((spot + fixlen))
		ec=3
		continue;;
	esac
	dd if="$ifEBCDICname" bs=1 skip=$spot count=$typelen >> \
	    "$ofname_prefix$type$ofname_suffix" 2>/dev/null
	spot=$((spot + fixlen))
done
exit $ec

Since it invokes dd twice for each record found in your input file, it will be SLOW, but it seems to get the job done. (Of course, you'll have to add the missing record types and assign the correct lengths for them; you have only given us the sizes for record types 01 and 02.) The code above assumes that you want to include the record type and the data (but not the padding from the end of the input records) in the output files.
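Because the loop steps through the file by a fixed record length, a length that does not match the file means every key after the first is read from the wrong offset. A quick sanity check that could be run before splitting is sketched below. The function name check_reclen is made up for this example, and the default of 422 bytes is the record length described earlier in the thread.

```shell
#!/bin/sh
# check_reclen FILE [RECLEN]  (hypothetical helper)
# Print the number of records in FILE if its size is an exact multiple of
# RECLEN (default 422, the record length given in this thread); otherwise
# print a message to standard error and return non-zero.
check_reclen() (
	file=$1
	reclen=${2:-422}
	if [ ! -f "$file" ]
	then	printf 'check_reclen: %s: not a file\n' "$file" >&2
		exit 1
	fi
	# wc -c may pad its output with blanks; arithmetic expansion drops them.
	size=$(( $(wc -c < "$file") ))
	if [ $((size % reclen)) -ne 0 ]
	then	printf 'check_reclen: %s is %d bytes, not a multiple of %d\n' \
		    "$file" "$size" "$reclen" >&2
		exit 1
	fi
	echo $((size / reclen))
)
```

A size that is a whole multiple of the record length does not prove the length is right, but a failure here is a cheap early warning that the offsets will drift.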

Hoping this helps...

Don Cragun

12-29-2015

This is great, Don. I'm going to look through this and make sure I understand each part.

I'll follow up on this after I have had a chance to look through.

hanshot1stx
