Split variable length and variable format CSV file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Split variable length and variable format CSV file
# 1  
Old 02-05-2010
Data Split variable length and variable format CSV file

Dear all,

I have basic knowledge of Unix script and her I am trying to process variable length and variable format CSV file.

The file length will depend on the numbers of Earnings/Deductions/Direct Deposits.
And
The format will depend on whether it is Earnings/Deductions or Direct Deposits

For example, if employee has 2 earning, 2 deductions, and 2 direct deposits accounts, The record after EmployeeID will repeat twice for Earnings, repeat twice for Deductions, and repeat twice for direct deposits.
For earnings and deposits, the format are the same, they need to have earnings/deposit numbers then earnings/deposit amounts.
For direct deposits, the format is more complicated, it will require Bank Number, Account Type, Account Number, Amount,

Here is the sample data. The first line is header.
Code:
EmployeeID,Total numbers of Earnings (E), Total numbers of Deductions (D), Total numbers of Direct Deposit (DD), E/D #1 Number, E/D #1 Amount, E/D #2 Number, E/D #2 Amount,....,E/D #n Number, E/D #n Amount, DD #1 Bank Number, DD #1 Account Type, DD #1 Account Number, DD #1 Amount, DD #2 Bank Number, DD #2 Account Type, DD #2 Account Number, DD #2 Amount,......,DD #n Bank Number, DD #n Account Type, DD #n Account Number, DD #n Amount, 
1,1,2,1,01,12000,13,1000,14,1000,010001001,ABC,123456,10000
2,2,2,2,01,10000,02,2000,11,500,12, 500,010001002,ABC,1212121,5000,010001003,CDE,123123,6000

For Employee #1, he/she has 1 earning, 2 deductions, and 1 direct deposit, so his/her record will have one earning (01), two deductions (13 and 14), and one direct deposit account (010001001,ABC,123456)
For Employee #2, he/she has 2 earnings, 2 deduction,s and 2 direct deposits, so his/her record will have one earning (01 and 02), two deductions (11 and 12), and two direct deposit accounts (010001002,ABC,1212121 and 010001003,CDE,123123)

Since the numbers of Earnings/Deductions or Direct Deposits could vary, the length and format of the records in CSV file are different.

I have several options.

1. I can seperate the file into three files. One for Earning, one for deduction, and one for direct deposit. Each file will have employee id in the beginning of the record.
so for sample data will look like

Earning File
Code:
EmployeeID,Earning #1 Number, Earning  #1 Amount,....,Earning #n Number, Earning  #n Amount
1,01,12000
2,01,10000,02,2000

Deduction File
Code:
EmployeeID,Deduction #1 Number, Deduction  #1 Amount,....,Deduction #n Number, Deduction  #n Amount
1, 13,1000,14,1000
2,11,500,12, 500

Direct Deposit File
Code:
EmployeeID,#1 Bank Number, #1 Account Type, #1 Account Number, #1 Amount,..... #n Bank Number, #n Account Type, #n Account Number, #n Amount
1,010001001,ABC,123456,10000
2,010001002,ABC,1212121,5000,010001003,CDE,123123,6000

or
2. I can further "normalize" each record into sepeate records in each files.

Earning File
Code:
EmployeeID,Earning Number, Earning Amount
1,01,12000
2,01,10000
2,02,2000

Deduction File
Code:
EmployeeID,Deduction Number, Deduction  Amount
1,13,1000
1,14,1000
2,11,500
2,12,500

Direct Deposit File
Code:
EmployeeID,Bank Number, Account Type, Account Number, Amount
1,010001001,ABC,123456,10000
2,010001002,ABC,1212121,5000,010001003,CDE,123123,6000

Thank you very much for your time to help me to resolve this issue.

chechun

Last edited by Franklin52; 02-05-2010 at 01:08 PM.. Reason: Please use code tags!
# 2  
Old 02-18-2010
Resolved this using PL/SQL

Unfortunately, I couldn't find a good solution in Unix shell script. I have resolved this issue using PL/SQL. I used several for loop and combination of substr and instr to resolve it. Thanks!
# 3  
Old 02-19-2010
Hm,
if it's not too late, you can use something like this:

earnings:

Code:
perl -F, -lane'
next if $. == 1; print join ",", $F[0], @F[4 .. (3 + $F[1]) * 2];
' infile.csv

deductions:

Code:
perl -F, -lane'
next if $. == 1; print join ",", $F[0], @F[(3 + $F[1] * 2) + 1 .. (3 + $F[1] * 2) + $F[2] * 2];
' infile.csv

etc.

For example, given your sample file:

Code:
% perl -F, -lane'
next if $. == 1; print join ",", $F[0], @F[4 .. (3 + $F[1]) * 2];
' infile.csv
1,01,12000,13,1000,14
2,01,10000,02,2000,11,500,12

Code:
% perl -F, -lane'
next if $. == 1; print join ",", $F[0], @F[(3 + $F[1] * 2) + 1 .. (3 + $F[1] * 2) + $F[2] * 2];
' infile.csv
1,13,1000,14,1000
2,11,500,12, 500

Let me know if you still need it, I can help you to incorporate the code in a single program.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

[Solved] How to increment and add variable length numbers to a variable in a loop?

Hi All, I have a file which has hundred of records with fixed number of fields. In each record there is set of 8 characters which represent the duration of that activity. I want to sum up the duration present in all the records for a report. The problem is the duration changes per record so I... (5 Replies)
Discussion started by: danish0909
5 Replies

2. Shell Programming and Scripting

Adding a delimiter to a variable length file

Hi, I'm new to unix, i have a variable length file like below, 01|Test|Test1|Sample| 02|AA|BB|CC|DD| 03|AAA|BBB|CCC|DDD|EEE|RRR|TTT|SSS|YYY| I need to make this as a fixed length file. Assume that i have 10 columns in the DAT file. for ex: the first 01 record is having 4cols -... (8 Replies)
Discussion started by: Mohankumar Venu
8 Replies

3. Shell Programming and Scripting

Parse variable length status,timestamp CSV

I have the output of a process which on status change of the object being processed appends status,timestamp to the record in the text file... so i end up with output something like: task123,TERMINAL,glob,5,INITIAL,2012-02-27 16:48:07,PREPARING,2012-02-27 16:49:06,SCHEDULED,2012-02-27... (2 Replies)
Discussion started by: KarmaPoliceT2
2 Replies

4. Shell Programming and Scripting

changing a variable length text to a fixed length

Hi, Can anyone help with a effective solution ? I need to change a variable length text field (between 1 - 18 characters) to a fixed length text of 18 characters with the unused portion, at the end, filled with spaces. The text field is actually field 10 of a .csv file however I could cut... (7 Replies)
Discussion started by: dc18
7 Replies

5. Shell Programming and Scripting

Make variable length record a fixed length

Very, very new to unix scripting and have a unique situation. I have a file of records that contain 3 records types: (H)eader Records (D)etail Records (T)railer Records The Detail records are 82 bytes in length which is perfect. The Header and Trailer records sometimes are 82 bytes in... (3 Replies)
Discussion started by: jclanc8
3 Replies

6. Shell Programming and Scripting

Parsing a variable length file

Hi I am new to shell scripting. I need to parse a file which contains the header and detail records and split into n of file based on dept ID, for ex. INPUT FILE: DEPT ID: 1 EMPNAME: XYZ EMPAddress: XYZZZ DEPT ID: 2 EMPNAME: ABC EMPAddress: ABCD DEPT ID: 1 EMPNAME: PQR EMPAddress:... (6 Replies)
Discussion started by: singhald
6 Replies

7. UNIX for Dummies Questions & Answers

Convert a tab delimited/variable length file to fixed length file

Hi, all. I need to convert a file tab delimited/variable length file in AIX to a fixed lenght file delimited by spaces. This is the input file: 10200002<tab>US$ COM<tab>16/12/2008<tab>2,3775<tab>2,3783 19300978<tab>EURO<tab>16/12/2008<tab>3,28523<tab>3,28657 And this is the expected... (2 Replies)
Discussion started by: Everton_Silveir
2 Replies

8. UNIX for Dummies Questions & Answers

Creating a file in variable length format

Hi all, Does anyone know a technique for creating a download file in variable length format? I have looked around but haven't found any resources on this (or, maybe I'm not sure what to Google for :) ) Thanks in advance! (2 Replies)
Discussion started by: Sarahb29
2 Replies

9. Shell Programming and Scripting

use awk to read variable length csv

Any help to read the contents of a variable length csv ....??(using awk) The csv mite look like this : anjali,ram,rahul,mohini,sam,.... and so on ... I need to pick up each name.. Thanks in advance SD (3 Replies)
Discussion started by: shweta_d
3 Replies

10. Shell Programming and Scripting

creating a fixed length output from a variable length input

Is there a command that sets a variable length? I have a input of a variable length field but my output for that field needs to be set to 32 char. Is there such a command? I am on a sun box running ksh Thanks (2 Replies)
Discussion started by: r1500
2 Replies
Login or Register to Ask a Question