Split long record into csv file


 
# 1  
Old 04-27-2009
Split long record into csv file

Hi

I receive a mainframe file which has very long records (1100 chars) with no field delimiters. I need to parse each record and output a comma delimited (csv) file. The record layout is fixed. If there weren't so many fields and records I would read the file into Excel as a "fixed width" file and manually split each record into its separate components, but that is too time consuming and, anyway, there are way too many records.

I was thinking of doing something in awk, like
read line
a=substr(line,1,5)
b=substr(line,6,2)
etc for each of the 226 fields
write a,b,c.......
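In other words, something like this rough sketch (the positions and lengths below are just made-up examples, not my real layout):

Code:
awk 'BEGIN { OFS="," }
{
    a = substr($0, 1, 5)      # field 1: columns 1-5
    b = substr($0, 6, 2)      # field 2: columns 6-7
    c = substr($0, 8, 10)     # field 3: columns 8-17
    # ...and so on for the rest of the 226 fields...
    print a, b, c
}' inputfile > output.csv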

but I'm sure there is a better way.

Any help will be much appreciated.
# 2  
Old 04-27-2009
do you know the width of every field?
Can you come up with the list?

# 3  
Old 04-27-2009
Yes, the following does it as a function, but it can also be done manually

Because I often have to read thru a fixed record file, I created the following function. It shows a basic way of defining fields and making a csv file. After defining the function, I often do something like:

grep "^abc" myfile | eoc2csv >20090427file.csv

Code:
FSvar=","      ### set FS variable for field separator
eoc2csv ()
{
   awk -v FSvar="$FSvar" '
     BEGIN {OFS=FSvar}      ### fields are cut out with substr, so only OFS matters
     {
     SI=substr($0,1,13)
     NA=substr($0,14,30)
     AD=substr($0,44,92)
     GR=substr($0,136,9)
     CC=substr($0,164,6)
     UA=substr($0,170,4)
     DT=substr($0,180,3)
     ED=substr($0,183,8)
     DS=substr($0,251,8)
     JN=substr($0,271,10)
     SQ=substr($0,293,8)
     ET=substr($0,397,1)
     GT=substr($0,401,2)
     SD=substr($0,451,10)

     print SI,NA,AD,GR,CC,UA,DT,ED,DS,JN,SQ,ET,GT,SD
   }'
   return
}
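The function just reads standard input, so it can also be fed a whole file directly, e.g. (mainframe.dat being a made-up name here):

Code:
eoc2csv <mainframe.dat >20090427file.csv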

I've also seen where someone does
cut -c1-5 myfile >file001
cut -c6-15 myfile >file002
and so on, and then
paste -d"," file001 file002 ... >newfile

Not really any easier way I can think of - one way or another you will have to define all those fields.
# 4  
Old 04-27-2009
My favorite is from comp.lang.awk - that's if you don't have 'gawk'. If you do have 'gawk' installed, it already has the 'FIELDWIDTHS' capability built in.
You can enhance the function by passing 'FIELDWIDTHS' on the command line:
Code:
function setFieldsByWidth(   i,n,FWS,start,copyd0) {
  # Licensed under GPL Peter S Tillier, 2003
  # NB corrupts $0
  copyd0 = $0                             # make copy of $0 to work on
  if (length(FIELDWIDTHS) == 0) {
    print "You need to set the width of the fields that you require" > "/dev/stderr"
    print "in the variable FIELDWIDTHS (NB: Upper case!)" > "/dev/stderr"
    exit(1)
  }

  if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
    print "The variable FIELDWIDTHS must contain digits, separated" > "/dev/stderr"
    print "by spaces." > "/dev/stderr"
    exit(1)
  }

  n = split(FIELDWIDTHS,FWS)

  if (n == 1) {
    print "Warning: FIELDWIDTHS contains only one field width." > "/dev/stderr"
    print "Attempting to continue." > "/dev/stderr"
  }

  start = 1
  for (i=1; i <= n; i++) {
    $i = substr(copyd0,start,FWS[i])
    start = start + FWS[i]
  }
}


#I then call setFieldsByWidth() in my main awk code as follows:

BEGIN {
  #FIELDWIDTHS="7 6 5 4 3 2 1" # for example
  FIELDWIDTHS="1 3 8 8 5 9 1 9" # for example
  OFS="|"
}
!/^[ \t]*$/ {   # skip empty or whitespace-only lines
  saveDollarZero = $0 # if you want it later
  setFieldsByWidth()
  # now we can manipulate $0, NF and $1 .. $NF as we wish
  print $0 OFS
  next
}
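To actually pass FIELDWIDTHS in from the command line rather than hard-coding it, remember that -v assignments happen before BEGIN, so the FIELDWIDTHS= and OFS= lines in the BEGIN block above would have to be removed (or guarded), otherwise they overwrite whatever you pass in. A rough sketch, assuming the function and the main rule are saved in a file called fixed2csv.awk (name made up) and using placeholder widths:

Code:
# widths below are examples only - substitute the real layout
awk -v FIELDWIDTHS="13 30 92 9 6 4" -v OFS="," -f fixed2csv.awk mainframe.dat > out.csv

# with gawk the function isn't needed at all, since FIELDWIDTHS is built in;
# the $1=$1 just forces $0 to be rebuilt with OFS
gawk -v FIELDWIDTHS="13 30 92 9 6 4" -v OFS="," '{ $1 = $1; print }' mainframe.dat > out.csv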

# 5  
Old 04-27-2009
Thanks. I do know all the field widths. I'll try vgersh99's method and see how it goes. Unfortunately, we don't have gawk installed, but that just means I'll have to create the function as well.
# 6  
Old 05-07-2009
Finally managed to implement the setFieldsByWidth solution, and got it working once I realised that the function is an awk function, not a shell function (which would reside in my $FPATH). I do however need a little more help if possible - I have a lot of fields (200+) and the awk script is now erroring out because the FIELDWIDTHS variable is (much) longer than 399 characters. Is there an easy way around this?

A second problem is that I have 2 different record types in the file, which would require 2 different FIELDWIDTHS variables. Is it possible to do this, or would it be better to split the input file into 2 separate files before parsing?

Thanks for your help.
# 7  
Old 05-07-2009
Quote:
Originally Posted by wvdeijk
Finally managed to implement the setFieldsByWidth solution, and got it working once I realised that the function is an awk function, not a shell function (which would reside in my $FPATH). I do however need a little more help if possible - I have a lot of fields (200+) and the awk script is now erroring out because the FIELDWIDTHS variable is (much) longer than 399 characters. Is there an easy way around this?
If you're on Solaris, try either /usr/bin/nawk or /usr/xpg4/bin/awk - you might get higher limits.
Quote:
Originally Posted by wvdeijk
A second problem is that I have 2 different record types in the file, which would require 2 different FIELDWIDTHS variables. Is it possible to do this, or would it be better to split the input file into 2 separate files before parsing?
Thanks for your help.
Sure, assuming you can programmatically determine the 'record type'...
Set up 2 FIELDWIDTH variables, FIELDWIDTH1 and FIELDWIDTH2, with the corresponding values, implement the code to determine the 'record type', and then:
Code:
if (recordType == recordType1)
    setFieldsByWidth(FIELDWIDTH1)
else
    setFieldsByWidth(FIELDWIDTH2)

Your 'setFieldsByWidth' function declaration would change to:
Code:
function setFieldsByWidth(FIELDWIDTH,       i,n,FWS,start,copyd0)
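
Putting it together, a rough sketch of how that might look. The record-type test (first character of the record) and all the widths here are made-up examples, the FIELDWIDTHS sanity checks from the version above are left out for brevity, and inside the function the global FIELDWIDTHS is replaced by the FIELDWIDTH argument:

Code:
function setFieldsByWidth(FIELDWIDTH,   i,n,FWS,start,copyd0) {
  copyd0 = $0                          # make copy of $0 to work on
  n = split(FIELDWIDTH, FWS)           # widths now come from the argument
  start = 1
  for (i = 1; i <= n; i++) {
    $i = substr(copyd0, start, FWS[i])
    start += FWS[i]
  }
}

BEGIN {
  FIELDWIDTH1 = "13 30 92 9 6"         # example widths for record type 1
  FIELDWIDTH2 = "1 3 8 8 5 9"          # example widths for record type 2
  OFS = ","
}

!/^[ \t]*$/ {
  if (substr($0, 1, 1) == "H")         # example test only: type-1 records start with "H"
    setFieldsByWidth(FIELDWIDTH1)
  else
    setFieldsByWidth(FIELDWIDTH2)
  print
}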
