EBCDIC File Split Based On Record Key


 
# 1  
Old 12-21-2015
EBCDIC File Split Based On Record Key

I was wondering if anyone could explain to me how to split a variable-length EBCDIC file into separate files based on the record key. I have the COBOL layout, and I need to split the file into 13 different EBCDIC files so that I can run each one through a C++ converter I have and get the corresponding csv output file to put into a database. The records are:

Record Key   Segment Name
01           GRROOT
02           GRCYCLE
...          ...
13           GRR3RMKS

If it helps at all, the PDF that comes with the EBCDIC file showing the COBOL layout states that the record length of the file is 422, the blocking factor is 77, and the blocksize is 32,494 (422 x 77). There is additional information such as the GRROOT length being 150 bytes, GRCYCLE 72 bytes, etc.

Thanks for the help
# 2  
Old 12-22-2015
There is an awful lot that is unspecified here:
  • What operating system are you using?
  • Is there any binary data in the COBOL files you're processing, or is it all text?
  • Is the record key the entire 422-byte record? If not, what part of the record constitutes the key?
  • Why do you say the input is variable length and then say that the record length is 422 bytes per record and 77 records per block? What is variable other than the number of records in the file?
If you're trying to process EBCDIC files on an ASCII-based system, the dd utility will probably be at the base of your processing. Look at the dd man page on your system and see if something like the following would be a good start to getting a file you can then split with awk or grep:
Code:
dd if=YourEBCDICInputFileName of=YourASCIIOutputFileName ibs=422x77 cbs=422 conv=ascii,unblock,sync

and, after splitting YourASCIIOutputFileName into the files you want based on your keys, you could convert them back into fixed-length, blocked, EBCDIC files using dd again with obs=422x77 instead of ibs=422x77 and conv=ebcdic,block... and appropriate if= and of= parameters.
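In other words, something along these lines (just a sketch based on the description above; the file names are placeholders you would replace with your own):
Code:
dd if=YourSplitASCIIFileName of=YourEBCDICOutputFileName obs=422x77 cbs=422 conv=ebcdic,block
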
# 3  
Old 12-22-2015
Thanks for the reply Don. I am doing this for a little side work project. Here are some of the specifics:

1) Any OS. I have a machine that runs Ubuntu, and my work computer is Windows 10. It sounds like Ubuntu would be my preference here.
2) There is binary data being processed; packed decimal fields, if that sounds right.
3) Reading through the COBOL, the record key is the first two bytes (1,2) of each record.
4) The 422 and 77 are numbers that appear at the front of the PDF, but later it says that each record is of variable length and gives the length of each record type. The total number of records changes each month, since this is a monthly dataset.

As I am typing this, it sounds like I need to use the dd command and change the number of bytes read each time based on the record key. So let's say I use dd and read the first two bytes. If the ASCII conversion of those bytes = 01, then I know that the record length is 150 bytes, so I want to read those 150 bytes and write them to a new EBCDIC file that will later be sent through a program that unpacks the fields and converts to a csv. Then I would skip 150 bytes and read the next two bytes. Let's say those = 02, so I know that the record is 72 bytes. So on and so forth.
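Something like this rough sketch is what I'm picturing (the only lengths filled in are the two I mentioned; the rest would come from the layout, and I'm assuming the lengths include the two key bytes):
Code:
#!/bin/sh
# Sketch: walk data.ebc, read the 2-byte key of each record, look up the
# record length for that key, and append the whole record (still EBCDIC)
# to a per-key output file.
offset=0
filesize=$(wc -c < data.ebc)
while [ "$offset" -lt "$filesize" ]; do
    # translate just the key to ASCII so the case labels below are readable
    key=$(dd if=data.ebc bs=1 skip="$offset" count=2 conv=ascii 2>/dev/null)
    case $key in
        01) len=150 ;;   # GRROOT
        02) len=72  ;;   # GRCYCLE
        # ... 03 through 13 would come from the rest of the layout ...
        *)  echo "unknown key '$key' at offset $offset" >&2; exit 1 ;;
    esac
    # copy the record bytes untouched (no conv=) so packed fields survive
    dd if=data.ebc bs=1 skip="$offset" count="$len" 2>/dev/null >> "key$key.ebc"
    offset=$((offset + len))
done

I realize a bs=1 dd per record is going to be slow on a 60+ MB file, but it should show the idea.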
# 4  
Old 12-22-2015
You can convert the entire EBCDIC file to ASCII just using:
Code:
dd if=EBCDICfile of=ASCIIfile conv=ascii

but, with packed decimal fields in the COBOL input file, you may end up with null bytes in the output file. And, you can't have null bytes in a text file. If you aren't working on text files, many of the standard utilities produce undefined results.
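If you want to see whether that is happening in your file, a quick count of the <nul> bytes in the converted file (a sketch, assuming GNU tr) would be:
Code:
tr -dc '\0' < ASCIIfile | wc -c    # number of <nul> bytes in the converted file
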

But, you can use cut and paste even if the files being processed are not text files. So, after converting your file to ASCII, you could walk through the file in a loop, starting with offset=1:
  • grabbing two bytes (with cut) to determine record type,
  • based on the type, grab x more bytes (again with cut) to complete the record you started reading,
  • writing the complete x+2 byte record to the appropriate output file (again, based on record type), and
  • incrementing offset by x+2.
As long as you don't modify the parts of the record that contain packed decimal data, converting back from ASCII to EBCDIC should still have correct packed decimal data in the resulting EBCDIC file.
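If you want to convince yourself of that before building the splitting loop, a quick round-trip test (just a sketch; check.ebc is a throw-away file name) should give you back your original file byte for byte, since dd's ascii and ebcdic conversion tables are inverses of each other:
Code:
dd if=EBCDICfile of=ASCIIfile conv=ascii
dd if=ASCIIfile of=check.ebc conv=ebcdic
cmp EBCDICfile check.ebc && echo "round trip is lossless"
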
# 5  
Old 12-22-2015
Why not install GnuCOBOL and write a COBOL program?
# 6  
Old 12-24-2015
Alright Don, so I have spent the last few days playing with this and have run into a couple of quirks. First, some things about the file. We will call the original EBCDIC file with all of the data data.ebc. I go ahead and do the simple conversion using dd to get a new file, data.ascii. Running the wc command gives me

0 lines in data.ebc, with 64454170 bytes
5948 lines in data.ascii with 64454170 bytes

Then I use the tr command to get rid of newlines so that I have one line in my new.ascii file. Then I go through new.ascii, cut the first two bytes, get 01, write that record to a file, increment, and repeat. This works perfectly until I get to byte 16880, at which point the program gets thrown off. Interestingly, in data.ascii there are 16507 bytes in the first line. So somehow I need either a file that has only one line (since using tr to delete '\n' seems to be causing issues) or a file that has 422 bytes on each line, so that the first two bytes of each line are one of 01, 02, 03, ..., 12, 13.
# 7  
Old 12-24-2015
Quote:
Originally Posted by hanshot1stx
Alright Don, so I have spent the last few days playing with this and have run into a couple of quirks. First, some things about the file. We will call the original EBCDIC file with all of the data data.ebc. I go ahead and do the simple conversion using dd to get a new file, data.ascii. Running the wc command gives me

0 lines in data.ebc, with 64454170 bytes
5948 lines in data.ascii with 64454170 bytes
OK. This is good! You translated EBCDIC bytes to the corresponding ASCII bytes and no bytes were added or lost. But, even though this is an ASCII file, it is not a text file; the <newline> characters are just binary data in your file, not line terminators.
Quote:
Then I use the tr command to get rid of newlines so that I have one line in my new.ascii file. Then I go through new.ascii, cut the first two bytes, get 01, write that record to a file, increment, and repeat. This works perfectly until I get to byte 16880, at which point the program gets thrown off. Interestingly, in data.ascii there are 16507 bytes in the first line. So somehow I need either a file that has only one line (since using tr to delete '\n' seems to be causing issues) or a file that has 422 bytes on each line, so that the first two bytes of each line are one of 01, 02, 03, ..., 12, 13.
Ouch. No! Don't remove ANY bytes from data.ascii. Those <newline> characters you're seeing in that file are probably the ASCII byte values corresponding to some of the binary packed decimal data bytes in your input.

The data in data.ascii is just a stream of bytes containing the records in your data; there are no record separators in data.ebc nor in data.ascii. In addition to <newline> characters, there are probably also <nul> (all bits 0) bytes that should not appear in a text file. But, we aren't going to treat data.ascii (or data.ebc) as a text file.
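If you want to see what is actually sitting at the point where your loop got thrown off, a small byte dump will show you the data around that first <newline> (just a sketch; adjust skip to whatever offset you want to inspect):
Code:
dd if=data.ascii bs=1 skip=16500 count=32 2>/dev/null | od -c
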

Can you show us a table where the 1st column gives the first two characters of your records (the two bytes that specify the record type), the 2nd column gives the length in bytes of records of that type (either with or without the two bytes specifying the record type, but tell us whether or not the record size given includes those bytes), and the 3rd column gives the name of the file to which records of this type should be appended? (Are these output files supposed to be ASCII or EBCDIC? On first read of your requirements, I thought you wanted to feed ASCII data to your C++ converter and then take the output from your C++ converter and translate that back to EBCDIC. Reading your first post again, it isn't clear to me whether the C++ converter wants EBCDIC input or ASCII input.)
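For what it's worth, once that table exists as a plain text file, a script can look records up in it directly instead of hard-coding every length. A hypothetical fragment (the file reclen.txt and its three-column layout are made up purely for illustration):
Code:
# hypothetical reclen.txt, one line per record type:  key  length  output-file
#   01  150  grroot.ebc
#   02   72  grcycle.ebc
# print the length and output file name for the record type held in $key
awk -v k="$key" '$1 == k { print $2, $3 }' reclen.txt
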