EBCDIC File Split Based On Record Key


 
# 1  
Old 12-21-2015
EBCDIC File Split Based On Record Key

I was wondering if anyone could explain to me how to split a variable-length EBCDIC file into separate files based on the record key. I have the COBOL layout, and I need to split the file into 13 different EBCDIC files so that I can run each one through a C++ converter I have and get the corresponding csv output file to put into a database. The records are:

Record Key   Segment Name
01           GRROOT
02           GRCYCLE
...          ...
13           GRR3RMKS

If it helps at all, the PDF that comes with the EBCDIC file showing the COBOL layout states that the record length of the file is 422, the blocking factor is 77, and the blocksize is 32,494 (422 x 77). There is additional information such as the GRROOT length being 150 bytes, GRCYCLE 72 bytes, etc.

Thanks for the help
# 2  
Old 12-22-2015
There is an awful lot that is unspecified here:
  • What operating system are you using?
  • Is there any binary data in the COBOL files you're processing, or is it all text?
  • Is the record key the entire 422-byte record? If not, what part of the record constitutes the key?
  • Why do you say the input is variable length and then say that the record length is 422 bytes per record and 77 records per block? What is variable other than the number of records in the file?
If you're trying to process EBCDIC files on an ASCII-based system, the dd utility will probably be at the base of your processing. Look at the dd man page on your system and see if something like the following would be a good start to getting a file you can then split with awk or grep:
Code:
dd if=YourEBCDICInputFileName of=YourASCIIOutputFileName ibs=422x77 cbs=422 conv=ascii,unblock,sync

and, after splitting YourASCIIOutputFileName into the files you want based on your keys, you could convert them back into fixed-length, blocked, EBCDIC files using dd again with obs=422x77 instead of ibs=422x77 and conv=ebcdic,block... and appropriate if= and of= parameters.
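In other words, something along these lines (just a sketch based on the description above; the file names are placeholders you would replace with your own):
Code:
dd if=YourSplitASCIIFileName of=YourEBCDICOutputFileName obs=422x77 cbs=422 conv=ebcdic,block
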
# 3  
Old 12-22-2015
Thanks for the reply Don. I am doing this for a little side work project. Here are some of the specifics:

1) Any OS. I have a machine that runs Ubuntu, and my work computer is Windows 10. It sounds like Ubuntu would be my preference here.
2) There is binary data being processed; packed decimal fields, if that sounds right.
3) Reading through the COBOL, the record key is the first two bytes (1,2) of each record.
4) The 422 and 77 are numbers that appear at the front of the PDF, but later it says that each record is of variable length and gives the length of each record type. The total number of records changes each month, since this is a monthly dataset.

As I am typing this, it sounds like I need to use the dd command and change the number of bytes read each time based on the record key. So let's say I use dd and read the first two bytes. If the ASCII conversion of those bytes = 01, then I know that the record length is 150 bytes, so I want to read those 150 bytes and write them to a new EBCDIC file that will later be sent through a program that unpacks the fields and converts to a csv. Then I would skip 150 bytes and read the next two bytes. Let's say those = 02, so I know that the record is 72 bytes. So on and so forth.
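Something like this rough sketch is what I'm picturing (the only lengths filled in are the two I mentioned; the rest would come from the layout, and I'm assuming the lengths include the two key bytes):
Code:
#!/bin/sh
# Sketch: walk data.ebc, read the 2-byte key of each record, look up the
# record length for that key, and append the whole record (still EBCDIC)
# to a per-key output file.
offset=0
filesize=$(wc -c < data.ebc)
while [ "$offset" -lt "$filesize" ]; do
    # translate just the key to ASCII so the case labels below are readable
    key=$(dd if=data.ebc bs=1 skip="$offset" count=2 conv=ascii 2>/dev/null)
    case $key in
        01) len=150 ;;   # GRROOT
        02) len=72  ;;   # GRCYCLE
        # ... 03 through 13 would come from the rest of the layout ...
        *)  echo "unknown key '$key' at offset $offset" >&2; exit 1 ;;
    esac
    # copy the record bytes untouched (no conv=) so packed fields survive
    dd if=data.ebc bs=1 skip="$offset" count="$len" 2>/dev/null >> "key$key.ebc"
    offset=$((offset + len))
done

I realize a bs=1 dd per record is going to be slow on a 60+ MB file, but it should show the idea.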
# 4  
Old 12-22-2015
You can convert the entire EBCDIC file to ASCII just using:
Code:
dd if=EBCDICfile of=ASCIIfile conv=ascii

but, with packed decimal fields in the COBOL input file, you may end up with null bytes in the output file. And, you can't have null bytes in a text file. If you aren't working on text files, many of the standard utilities produce undefined results.
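If you want to see whether that is happening in your file, a quick count of the <nul> bytes in the converted file (a sketch, assuming GNU tr) would be:
Code:
tr -dc '\0' < ASCIIfile | wc -c    # number of <nul> bytes in the converted file
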

But, you can use cut and paste even if the files being processed are not text files. So, after converting your file to ASCII, you could walk through the file in a loop, starting with offset=1:
  • grabbing two bytes (with cut) to determine record type,
  • based on the type, grab x more bytes (again with cut) to complete the record you started reading,
  • writing the complete x+2 byte record to the appropriate output file (again, based on record type), and
  • incrementing offset by x+2.
As long as you don't modify the parts of the record that contain packed decimal data, converting back from ASCII to EBCDIC should still have correct packed decimal data in the resulting EBCDIC file.
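If you want to convince yourself of that before building the splitting loop, a quick round-trip test (just a sketch; check.ebc is a throw-away file name) should give you back your original file byte for byte, since dd's ascii and ebcdic conversion tables are inverses of each other:
Code:
dd if=EBCDICfile of=ASCIIfile conv=ascii
dd if=ASCIIfile of=check.ebc conv=ebcdic
cmp EBCDICfile check.ebc && echo "round trip is lossless"
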
# 5  
Old 12-22-2015
Why not install GnuCOBOL and write a COBOL program?
# 6  
Old 12-24-2015
Alright Don, so I have spent the last few days playing with this and have run into a couple of quirks. First, some things about the file. We will call the original EBCDIC file with all of the data data.ebc. I go ahead and do the simple conversion using dd to get a new file, data.ascii. Running the wc command gives me

0 lines in data.ebc, with 64454170 bytes
5948 lines in data.ascii with 64454170 bytes

Then I use the tr command to get rid of newlines so that I have one line in my new.ascii file. Then I go through new.ascii, cut the first two bytes, get 01, write that record to a file, increment, and repeat. This works perfectly until I get to byte 16880, at which point the program gets thrown off. Interestingly, in data.ascii there are 16507 bytes in the first line. So somehow I need either a file that has only one line (since using tr to delete '\n' seems to be causing issues) or a file that has 422 bytes on each line, so that the first two bytes of each line are one of 01, 02, 03, ..., 12, 13.
# 7  
Old 12-24-2015
Quote:
Originally Posted by hanshot1stx
Alright Don, so I have spent the last few days playing with this and have run into a couple of quirks. First, some things about the file. We will call the original EBCDIC file with all of the data data.ebc. I go ahead and do the simple conversion using dd to get a new file, data.ascii. Running the wc command gives me

0 lines in data.ebc, with 64454170 bytes
5948 lines in data.ascii with 64454170 bytes
OK. This is good! You translated EBCDIC bytes to the corresponding ASCII bytes and no bytes were added or lost. But, even though this is an ASCII file, it is not a text file; the <newline> characters are just binary data in your file, not line terminators.
Quote:
Then I use the tr command to get rid of newlines so that I have one line in my new.ascii file. Then I go through new.ascii, cut the first two bytes, get 01, write that record to a file, increment, and repeat. This works perfectly until I get to byte 16880, at which point the program gets thrown off. Interestingly, in data.ascii there are 16507 bytes in the first line. So somehow I need either a file that has only one line (since using tr to delete '\n' seems to be causing issues) or a file that has 422 bytes on each line, so that the first two bytes of each line are one of 01, 02, 03, ..., 12, 13.
Ouch. No! Don't remove ANY bytes from data.ascii. Those <newline> characters you're seeing in that file are probably the ASCII byte values corresponding to some of the binary packed decimal data bytes in your input.

The data in data.ascii is just a stream of bytes containing the records in your data; there are no record separators in data.ebc nor in data.ascii. In addition to <newline> characters, there are probably also <nul> (all bits 0) bytes that should not appear in a text file. But, we aren't going to treat data.ascii (or data.ebc) as a text file.
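If you want to see what is actually sitting at the point where your loop got thrown off, a small byte dump will show you the data around that first <newline> (just a sketch; adjust skip to whatever offset you want to inspect):
Code:
dd if=data.ascii bs=1 skip=16500 count=32 2>/dev/null | od -c
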

Can you show us a table where the 1st column gives the first two characters of your records (the two bytes that specify the record type), the 2nd column gives the length in bytes of records of that type (either with or without the two bytes specifying the record type, but tell us whether or not the record size given includes those bytes), and the 3rd column gives the name of the file to which records of this type should be appended? (Are these output files supposed to be ASCII or EBCDIC? On first read of your requirements, I thought you wanted to feed ASCII data to your C++ converter and then take the output from your C++ converter and translate that back to EBCDIC. Reading your first post again, it isn't clear to me whether the C++ converter wants EBCDIC input or ASCII input.)
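For what it's worth, once that table exists as a plain text file, a script can look records up in it directly instead of hard-coding every length. A hypothetical fragment (the file reclen.txt and its three-column layout are made up purely for illustration):
Code:
# hypothetical reclen.txt, one line per record type:  key  length  output-file
#   01  150  grroot.ebc
#   02   72  grcycle.ebc
# print the length and output file name for the record type held in $key
awk -v k="$key" '$1 == k { print $2, $3 }' reclen.txt
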