Split long record into csv file


 
# 1  
Old 04-27-2009
Split long record into csv file

Hi

I receive a mainframe file which has very long records (1100 chars) with no field delimiters. I need to parse each record and output a comma delimited (csv) file. The record layout is fixed. If there weren't so many fields and records I would read the file into Excel as a "fixed width" file and manually split each record into its separate components, but that is too time consuming and, anyway, there are way too many records.

I was thinking of doing something in awk, like
read line
a=substr(line,1,5)
b=substr(line,6,2)
etc for each of the 226 fields
write a,b,c.......
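In other words, something like this rough sketch (the positions and lengths below are just made-up examples, not my real layout):

Code:
awk 'BEGIN { OFS="," }
{
    a = substr($0, 1, 5)      # field 1: columns 1-5
    b = substr($0, 6, 2)      # field 2: columns 6-7
    c = substr($0, 8, 10)     # field 3: columns 8-17
    # ...and so on for the rest of the 226 fields...
    print a, b, c
}' inputfile > output.csv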

but I'm sure there is a better way.

Any help will be much appreciated.
# 2  
Old 04-27-2009
do you know the width of every field?
Can you come up with the list?

# 3  
Old 04-27-2009
Yes, the following does it as a function, but it can also be done manually

Because I often have to read thru a fixed record file, I created the following function. It shows a basic way of defining fields and making a csv file. After defining the function, I often do something like:

grep "^abc" myfile | eoc2csv >20090427file.csv

Code:
FSvar=","      ### set FS variable for field separator
eoc2csv ()
{
   awk -v FSvar="$FSvar" '
     BEGIN {OFS=FSvar}      ### fields are cut out with substr, so only OFS matters
     {
     SI=substr($0,1,13)
     NA=substr($0,14,30)
     AD=substr($0,44,92)
     GR=substr($0,136,9)
     CC=substr($0,164,6)
     UA=substr($0,170,4)
     DT=substr($0,180,3)
     ED=substr($0,183,8)
     DS=substr($0,251,8)
     JN=substr($0,271,10)
     SQ=substr($0,293,8)
     ET=substr($0,397,1)
     GT=substr($0,401,2)
     SD=substr($0,451,10)

     print SI,NA,AD,GR,CC,UA,DT,ED,DS,JN,SQ,ET,GT,SD
   }'
   return
}
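The function just reads standard input, so it can also be fed a whole file directly, e.g. (mainframe.dat being a made-up name here):

Code:
eoc2csv <mainframe.dat >20090427file.csv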

I've also seen where someone does
cut -c1-5 myfile >file001
cut -c6-15 myfile >file002
and so on, and then
paste -d"," file001 file002 ... >newfile

Not really any easier way I can think of - one way or another you will have to define all those fields.
# 4  
Old 04-27-2009
My favorite is from comp.lang.awk - that's if you don't have 'gawk'. If you do have 'gawk' installed, it already has the 'FIELDWIDTHS' capability built in.
You can enhance the function by passing 'FIELDWIDTHS' on the command line:
Code:
function setFieldsByWidth(   i,n,FWS,start,copyd0) {
  # Licensed under GPL Peter S Tillier, 2003
  # NB corrupts $0
  copyd0 = $0                             # make copy of $0 to work on
  if (length(FIELDWIDTHS) == 0) {
    print "You need to set the width of the fields that you require" > "/dev/stderr"
    print "in the variable FIELDWIDTHS (NB: Upper case!)" > "/dev/stderr"
    exit(1)
  }

  if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
    print "The variable FIELDWIDTHS must contain digits, separated" > "/dev/stderr"
    print "by spaces." > "/dev/stderr"
    exit(1)
  }

  n = split(FIELDWIDTHS,FWS)

  if (n == 1) {
    print "Warning: FIELDWIDTHS contains only one field width." > "/dev/stderr"
    print "Attempting to continue." > "/dev/stderr"
  }

  start = 1
  for (i=1; i <= n; i++) {
    $i = substr(copyd0,start,FWS[i])
    start = start + FWS[i]
  }
}


#I then call setFieldsByWidth() in my main awk code as follows:

BEGIN {
  #FIELDWIDTHS="7 6 5 4 3 2 1" # for example
  FIELDWIDTHS="1 3 8 8 5 9 1 9" # for example
  OFS="|"
}
!/^[ \t]*$/ {   # skip empty or whitespace-only lines
  saveDollarZero = $0 # if you want it later
  setFieldsByWidth()
  # now we can manipulate $0, NF and $1 .. $NF as we wish
  print $0 OFS
  next
}
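To actually pass FIELDWIDTHS in from the command line rather than hard-coding it, remember that -v assignments happen before BEGIN, so the FIELDWIDTHS= and OFS= lines in the BEGIN block above would have to be removed (or guarded), otherwise they overwrite whatever you pass in. A rough sketch, assuming the function and the main rule are saved in a file called fixed2csv.awk (name made up) and using placeholder widths:

Code:
# widths below are examples only - substitute the real layout
awk -v FIELDWIDTHS="13 30 92 9 6 4" -v OFS="," -f fixed2csv.awk mainframe.dat > out.csv

# with gawk the function isn't needed at all, since FIELDWIDTHS is built in;
# the $1=$1 just forces $0 to be rebuilt with OFS
gawk -v FIELDWIDTHS="13 30 92 9 6 4" -v OFS="," '{ $1 = $1; print }' mainframe.dat > out.csv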

# 5  
Old 04-27-2009
Thanks. I do know all the field widths. I'll try vgersh99's method and see how it goes. Unfortunately, we don't have gawk installed, but that just means I'll have to create the function as well.
# 6  
Old 05-07-2009
Finally managed to implement the setFieldsByWidth solution, and got it working once I realised that the function is an awk function, not a shell function (which would reside in my $FPATH). I do however need a little more help if possible - I have a lot of fields (200+) and the awk script is now erroring out because the FIELDWIDTHS variable is (much) longer than 399 characters. Is there an easy way around this?

A second problem is that I have 2 different record types in the file, which would require 2 different FIELDWIDTHS variables. Is it possible to do this, or would it be better to split the input file into 2 separate files before parsing?

Thanks for your help.
# 7  
Old 05-07-2009
Quote:
Originally Posted by wvdeijk
Finally managed to implement the setFieldsByWidth solution, and got it working once I realised that the function is an awk function, not a shell function (which would reside in my $FPATH). I do however need a little more help if possible - I have a lot of fields (200+) and the awk script is now erroring out because the FIELDWIDTHS variable is (much) longer than 399 characters. Is there an easy way around this?
If you're on Solaris, try either /usr/bin/nawk or /usr/xpg4/bin/awk - you might get higher limits.
Quote:
Originally Posted by wvdeijk
A second problem is that I have 2 different record types in the file, which would require 2 different FIELDWIDTHS variables. Is it possible to do this, or would it be better to split the input file into 2 separate files before parsing?
Thanks for your help.
Sure, assuming you can programmatically determine the 'record type'...
Set up 2 FIELDWIDTH variables, FIELDWIDTH1 and FIELDWIDTH2, with the corresponding values, implement the code to determine the 'record type', and then:
Code:
if (recordType == recordType1)
    setFieldsByWidth(FIELDWIDTH1)
else
    setFieldsByWidth(FIELDWIDTH2)

Your 'setFieldsByWidth' function declaration would change to:
Code:
function setFieldsByWidth(FIELDWIDTH,       i,n,FWS,start,copyd0)
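
Putting it together, a rough sketch of how that might look. The record-type test (first character of the record) and all the widths here are made-up examples, the FIELDWIDTHS sanity checks from the version above are left out for brevity, and inside the function the global FIELDWIDTHS is replaced by the FIELDWIDTH argument:

Code:
function setFieldsByWidth(FIELDWIDTH,   i,n,FWS,start,copyd0) {
  copyd0 = $0                          # make copy of $0 to work on
  n = split(FIELDWIDTH, FWS)           # widths now come from the argument
  start = 1
  for (i = 1; i <= n; i++) {
    $i = substr(copyd0, start, FWS[i])
    start += FWS[i]
  }
}

BEGIN {
  FIELDWIDTH1 = "13 30 92 9 6"         # example widths for record type 1
  FIELDWIDTH2 = "1 3 8 8 5 9"          # example widths for record type 2
  OFS = ","
}

!/^[ \t]*$/ {
  if (substr($0, 1, 1) == "H")         # example test only: type-1 records start with "H"
    setFieldsByWidth(FIELDWIDTH1)
  else
    setFieldsByWidth(FIELDWIDTH2)
  print
}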
