12-24-2015
Quote:
Originally Posted by hanshot1stx
Alright Don, so I have spent the last few days playing with this now and have run into a couple quirks. First off are some things about the file. We will call the original EBCDIC file with all of the data data.ebc. I go ahead and do the simple conversion using dd to get a new file, data.ascii. Running the wc command gives me
0 lines in data.ebc, with 64454170 bytes
5948 lines in data.ascii with 64454170 bytes
OK. This is good! You translated EBCDIC bytes to the corresponding ASCII bytes and no bytes were added or lost. But even though this is an ASCII file, it is not a text file: the <newline> characters are just binary data in your file, not line terminators.
Quote:
Then I use the tr command to get rid of newlines so that I have one line in my new.ascii file. Then I go through new.ascii and cut the first two bytes, get 01, and write that to a file, increment and repeat. This works perfectly until I get to bytes 16880, in which the program then gets thrown off. Interestingly in data.ascii, there are 16507 bytes in the first line. So somehow I need to make it so that I have either a file that has only one line (since using tr to delete '\n' seems to be causing issues) or I need a file that has 422 bytes on each line, so that the first two bytes of each line correspond to either 01,02,03,...,12,13.
Ouch. No! Don't remove ANY bytes from data.ascii. Those <newline> characters you're seeing in that file are probably the ASCII byte values corresponding to some of the binary packed decimal data bytes in your input.
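The numbers in the quoted post are consistent with this. As a rough check, assume the 422-byte record length the poster mentions (not confirmed anywhere in this thread) and a first <newline> byte sitting after 16507 bytes of data. That byte falls inside the 40th record, which ends at byte 16880, so deleting it would shift every later record one byte to the left. The sketch below reproduces the effect on synthetic data; `RECORD_LEN`, `make_stream`, and `record_types` are illustrative names, not part of any real tool.

```python
RECORD_LEN = 422          # assumption: taken from the quoted post
FIRST_NEWLINE_AT = 16507  # bytes before the first <newline> in data.ascii

# Which record holds that byte, and where does that record end?
record_index = FIRST_NEWLINE_AT // RECORD_LEN    # 0-based record index
record_end = (record_index + 1) * RECORD_LEN     # offset of the next record
print(record_index + 1, record_end)              # 40 16880


def make_stream(n_records):
    """Build synthetic records: a 2-digit type (01..13) plus <nul> filler."""
    out = bytearray()
    for i in range(n_records):
        out += b"%02d" % (i % 13 + 1) + b"\x00" * (RECORD_LEN - 2)
    return out


def record_types(data):
    """Return the first two bytes of every RECORD_LEN-byte slice."""
    return [bytes(data[i:i + 2]) for i in range(0, len(data), RECORD_LEN)]


stream = make_stream(42)
stream[FIRST_NEWLINE_AT] = 0x0A                # a data byte that equals <newline>
damaged = bytes(stream).replace(b"\n", b"")    # the effect of tr -d '\n'

good = record_types(stream)
bad = record_types(damaged)
first_bad = next(i for i, (g, b) in enumerate(zip(good, bad)) if g != b)
print(first_bad * RECORD_LEN)                  # 16880
```

If the real records are not all 422 bytes long, the arithmetic changes, but the failure mode is the same: one deleted byte misaligns everything after it.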
The data in data.ascii is just a stream of bytes containing the records in your data; there are no record separators in data.ebc nor in data.ascii. In addition to <newline> characters, there are probably also <nul> (all bits 0) bytes that should not appear in a text file. But, we aren't going to treat data.ascii (or data.ebc) as a text file.
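One way to treat the file as binary rather than text is to read it in fixed-size slices and never split on <newline>. The sketch below tallies the two-byte record type at the start of each slice. It assumes every record is 422 bytes, as the quoted post suggests; the byte count reported by wc (64454170) is exactly 152735 × 422, which is at least consistent with that, but if record lengths differ by type the slicing has to change. `tally_record_types` and the sample data are made up for illustration.

```python
import io
from collections import Counter

RECORD_LEN = 422  # assumption from the quoted post; not confirmed

# The file size reported by wc divides evenly into 422-byte records.
print(divmod(64454170, RECORD_LEN))   # (152735, 0)


def tally_record_types(stream, record_len=RECORD_LEN):
    """Count record types in a binary stream of fixed-length records.

    Returns (counts, leftover), where leftover is the number of trailing
    bytes that did not fill a whole record.
    """
    counts = Counter()
    leftover = 0
    while True:
        record = stream.read(record_len)
        if not record:
            break
        if len(record) < record_len:
            leftover = len(record)
            break
        counts[record[:2]] += 1
    return counts, leftover


# Synthetic input: the payloads contain <newline> and <nul> bytes on purpose.
sample = b"".join(
    t + b"\n\x00" * ((RECORD_LEN - 2) // 2) for t in (b"01", b"02", b"01")
)
counts, leftover = tally_record_types(io.BytesIO(sample))
print(counts[b"01"], counts[b"02"], leftover)   # 2 1 0
```

On a real file the stream would come from `open("data.ascii", "rb")`; the `"rb"` mode matters because it keeps Python from interpreting any byte as a line ending.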
Can you show us a table where the 1st column gives the first two characters of your records (the two bytes that specify the record type), the 2nd column gives the length in bytes of records of that type (either with or without the two bytes specifying the record type, but tell us whether or not the record size given includes those bytes), and the 3rd column gives the name of the file to which records of this type should be appended? (Are these output files supposed to be ASCII or EBCDIC? On first read of your requirements, I thought you wanted to feed ASCII data to your C++ converter and then take the output from your C++ converter and translate that back to EBCDIC. Reading your first post again, it isn't clear to me whether the C++ converter wants EBCDIC input or ASCII input.)
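Once that table exists, the split itself can be driven by it. The following is only a sketch: the record types, lengths, and output file names in `LAYOUT` are placeholders, since the real values are exactly what the question above asks for, and the lengths here are taken to include the two type bytes. Records are written as raw bytes, so the output stays in whatever encoding the input uses.

```python
import io

# Hypothetical layout: type -> (record length incl. the 2 type bytes, output name)
LAYOUT = {
    b"01": (422, "type01.dat"),
    b"02": (422, "type02.dat"),
}


def split_records(stream, layout, open_output):
    """Append each record to the output chosen by its two-byte type.

    open_output(name) must return a writable binary file object; it is
    called once per output name. Raises ValueError on an unknown type or
    a truncated record, since continuing would misalign everything after.
    """
    outputs = {}
    offset = 0
    while True:
        rec_type = stream.read(2)
        if not rec_type:
            break
        if rec_type not in layout:
            raise ValueError(f"unknown record type {rec_type!r} at offset {offset}")
        length, name = layout[rec_type]
        body = stream.read(length - 2)
        if len(body) != length - 2:
            raise ValueError(f"truncated record at offset {offset}")
        if name not in outputs:
            outputs[name] = open_output(name)
        outputs[name].write(rec_type + body)
        offset += length
    return outputs


# Demo with in-memory files instead of real ones.
data = b"01" + b"\n" * 420 + b"02" + b"\x00" * 420 + b"01" + b"A" * 420
files = split_records(io.BytesIO(data), LAYOUT, lambda name: io.BytesIO())
print({name: len(f.getvalue()) for name, f in files.items()})
# {'type01.dat': 844, 'type02.dat': 422}
```

With real files, `open_output` could be `lambda name: open(name, "ab")`, with the returned file objects closed afterwards. Stopping at the first unknown type is deliberate: it reports the offset where the layout table and the data disagree.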