ksh script to create a generic csv file from different source formats


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting ksh script to create a generic csv file from different source formats
# 1  
Old 09-17-2010
ksh script to create a generic csv file from different source formats

Hi all,

I have a requirement to create a "superset" file out of a number of different sources with some different and some same columns.

We intend to have a manually updateable SuperSetCols.csv which would look like
Code:
"ColA","ColB","ColC","ColD","ColE","ColF","ColG"

so someday we may add ColH ColI etc We will only ever add to the end of the list as new columns are required.

Our sources look like this:
Code:
File0.csv
"ColA","ColB","ColC"
"Dat001","Dat002","Dat003"
"Dat011","Dat012","Dat013"
 
File1.csv
"ColA","ColB","ColD"
"Dat101","Dat102","Dat104"
"Dat111","Dat112","Dat114"
 
File2.csv
"ColD","ColE","ColF","ColG"
"Dat204","Dat205","Dat206","Dat207"
"Dat214","Dat215","Dat216","Dat217"

Our intention is to basically fill the gaps where necessary. So File0 needs columns D-G, File1 C & E-G and File2 A-C. The column names are expected to match identically and again remain in order as per the superset.

We would like to output the files to a 'Super' directory as follows:
Code:
Super/File0.csv
"ColA","ColB","ColC","ColD","ColE","ColF","ColG"
"Dat001","Dat002","Dat003","","","",""
"Dat011","Dat012","Dat013","","","",""
 
Super/File1.csv
"ColA","ColB","ColC","ColD","ColE","ColF","ColG"
"Dat101","Dat102","","Dat104","","",""
"Dat111","Dat112","","Dat114","","",""
 
Super/File2.csv
"ColA","ColB","ColC","ColD","ColE","ColF","ColG"
"","","","Dat204","Dat205","Dat206","Dat207"
"","","","Dat214","Dat215","Dat216","Dat217"

So that a different process will be able to concatenate them when it is ready and read them as one common file structure:

Code:
"ColA","ColB","ColC","ColD","ColE","ColF","ColG"
"Dat001","Dat002","Dat003","","","",""
"Dat011","Dat012","Dat013","","","",""
"Dat101","Dat102","","Dat104","","",""
"Dat111","Dat112","","Dat114","","",""
"","","","Dat204","Dat205","Dat206","Dat207"
"","","","Dat214","Dat215","Dat216","Dat217"

Many thanks in advance to all who reply, I would greatly appreciate anyone's help on this!

Lee
# 2  
Old 09-17-2010
First, you'll have to make a list of column names. It can be done like
Code:
head -n1 file*.csv | tr ',' '\n' | sort -u

For the following, please wait...
But i think perl will do the best for your needs, while it handles map arrays.

Last edited by frans; 09-17-2010 at 01:15 PM..
# 3  
Old 09-18-2010
Generate title file:
Code:
awk 'FNR==1' File*.csv |tr "," "\n" |sort -u > title.txt

"ColA"
"ColB"
"ColC"
"ColD"
"ColE"
"ColF"
"ColG"

create the subfolder Super

Code:
mkdir Super

Get the result (but each file has comma in the end of each line).

Code:
awk -F , '
NR==FNR {title[FNR]=$1;b[$1]=FNR;n=FNR;next} 
FNR==1 {   
           for (i=1;i<=n;i++) 
              { 
                  printf title[i] FS > "Super/" FILENAME
                  sign[i]=0
               }
           printf "\n" > "Super/" FILENAME
           for (i=1;i<=NF;i++)
              sign[b[$i]]=1
           next
       }
{    
     split($0,x,",")
     for (i=1;i<=n;i++)
         { 
           printf (sign[i]=="1")? x[++j] FS:"\"\"" FS > "Super/" FILENAME
         }
     printf "\n" > "Super/" FILENAME
     j=0
}
' title.txt *.csv

$ cat Super/File2.csv
"ColA","ColB","ColC","ColD","ColE","ColF","ColG",
"","","","Dat204","Dat205","Dat206","Dat207",
"","","","Dat214","Dat215","Dat216","Dat217",

If you need clean the end comma from each line

Code:
for file in Super/File*.csv
do
  sed 's/,$//' $file > tmp
  mv tmp $file
done

if your sed support -i option:
Code:
for file in Super/File*.csv
do
  sed -i 's/,$//' $file > tmp
done



---------- Post updated at 12:28 AM ---------- Previous update was at 12:22 AM ----------

Quote:
Originally Posted by Leedor

So that a different process will be able to concatenate them when it is ready and read them as one common file structure:

Code:
"ColA","ColB","ColC","ColD","ColE","ColF","ColG"
"Dat001","Dat002","Dat003","","","",""
"Dat011","Dat012","Dat013","","","",""
"Dat101","Dat102","","Dat104","","",""
"Dat111","Dat112","","Dat114","","",""
"","","","Dat204","Dat205","Dat206","Dat207"
"","","","Dat214","Dat215","Dat216","Dat217"

Code:
awk 'NR==1||FNR>1' Super/*.csv

This User Gave Thanks to rdcwayx For This Post:
# 4  
Old 09-20-2010
Thankyou!

Many thanks rdcwayx.

Have tested your code quite thoroughly this morning and it seems to work perfectly for our needs.

Much appreciated!
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk and sed script to create one output CSV file

Hi All , I would require your help to generate one output file after post processing of one CSV file as stated below This file is just a small cut from a big file . Big file is having 20000 lines PATTERN,pat0,pat1,pat2,pat3,pat4,pat5,pat6,pat7,pat8,pat9... (2 Replies)
Discussion started by: kshitij
2 Replies

2. Shell Programming and Scripting

I am trying to merge all csv files from source path into 1 file

I am trying to merge all csv files from source path into one single csv file in target. but getting error message: hadoop fs -cat /user/hive/warehouse/stage.db/PK_CLOUD_CHARGE/TCH-charge_*.csv > /user/hive/warehouse/stage.db/PK_CLOUD_CHARGE/final/TCH_pb_charge.csv getting error message:... (0 Replies)
Discussion started by: cplusplus1
0 Replies

3. Shell Programming and Scripting

Compare 2 files of csv file and match column data and create a new csv file of them

Hi, I am newbie in shell script. I need your help to solve my problem. Firstly, I have 2 files of csv and i want to compare of the contents then the output will be written in a new csv file. File1: SourceFile,DateTimeOriginal /home/intannf/foto/IMG_0713.JPG,2015:02:17 11:14:07... (8 Replies)
Discussion started by: refrain
8 Replies

4. UNIX for Dummies Questions & Answers

How to Create excel file(.csv) using shell script?

Hi, i have shell script which compiles n number of test cases and execute them one by one. i want to create report in excel through script in which two columns namely "test id" and "release".second column have two subcolumns namely compiles and excutes. so i want first column should display test... (15 Replies)
Discussion started by: RamMore123
15 Replies

5. Shell Programming and Scripting

Script to create a CSV file

I created a script that will go out and so a "/sbin/chkconfig --list | egrep XXX" against a server list that would create an output file like the following example: ---------------------------------------------------------------------------------- SERVER1 RC_Script_1 0:off 1:off 2:off... (4 Replies)
Discussion started by: asnatlas
4 Replies

6. Shell Programming and Scripting

Compare 2 csv files in ksh and o/p the difference in a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 The... (7 Replies)
Discussion started by: Naresh101
7 Replies

7. Shell Programming and Scripting

BASH script outputting strange file formats

Hi I am very new to using BASH, but I have a problem with a piece of script that I have been working on. Basically the script goes through a mailbox file looking at particular aspects of the file, for example how many spamwords there are email address etc. It does this pretty well except for an... (13 Replies)
Discussion started by: 9aza
13 Replies

8. Shell Programming and Scripting

create csv in ksh from tricky log file

hi guys, trying to create a csv from a tricky log file in ksh, using 'awk '{print $1" "$14" "$15" "$16" "$17" "$18" "$19}' >> $TMP_FILE' on another set of files I have an output file with hundreds of lines in which looks like so: ABC_DEFGHI_16_JKLMNP11.20101115_095412_374.log:09:54:29.579... (3 Replies)
Discussion started by: rich@ardz
3 Replies

9. Shell Programming and Scripting

how to create csv file using shell script

I have a file in multiple directory which has some records in the following format File: a/latest.txt , b/latest.txt, c/latest.txt -> Name=Jhon Age=27 Gender=M Street=LA Road Occupation=Service I want to generate a csv file from the above file as follows File: output.csv -> ... (9 Replies)
Discussion started by: rjk2504
9 Replies

10. Shell Programming and Scripting

How to source a file in KSH -- Please help !!!

Hi, Can anyone tell me how to source a file inside a KSH script. I can do it in CSH using the "source" command. source /home/tipsy/sourceme.prf What is the equivalent command in KSH? I tried "exec" but after executing the statement no furthers statements are executed. Please let me... (7 Replies)
Discussion started by: tipsy
7 Replies
Login or Register to Ask a Question