Insert blanks for missing fields and reformat to csv


 
# 1  
Old 08-17-2009
[Bash] Insert blanks for missing fields and reorder

I am trying to process inventory addition files for insertion into a MySQL database. The files follow the book UIEE format convention. If a field has no data, it is NOT included in the add file at all, so I need to insert a blank/empty value for any missing field. Then I need to create a CSV or TSV file for the actual insertion. I have the main conversion part down, but I am stuck on inserting the blanks, since I first need to check whether each field is present.

Here is the code I have so far:
Code:
#!/usr/bin/env bash
#
# This is a script to process incoming add files
#
# Find the file
# Find the file
file_name=$(find . -name "*part*")
# Change file permissions
chmod 744 "$file_name"
# Convert to Unix format and remove ^M chars (change to fromdos on server)
dos2unix "$file_name"
# Remove the top 5 lines if "User" is on the first, escape existing ";"
# and put a temp marker on the blank line at the end of each record
if grep -q "User" "$file_name"; then
    sed '1,5d' "$file_name" | sed -e 's/;/\\;/g' -e 's/"/\\"/g' -e 's/^$/{}/g' > trimAdd
else echo " "
fi
# Replace every newline with ";" and turn the temp marker back into a newline at the end of each record
tr '\012' ';' < trimAdd | sed 's/{};/\n/g' > swapAdd
# check record for missing fields and insert blanks/NULLS where necessary

Here is a sample of the file I am processing (it routinely contains 1200+ lines, and yes, the last line is empty):
Code:
User
BOOKS
2009-08-06
14:16:52

UR|007815
TI|Vintage Motorsport : 1995 Jan/Feb
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1995
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007816
TI|Vintage Motorsport : 1992 Nov/Dec
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1992
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007817
TI|Vintage Motorsport : 1995 Mar/Apr
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1995
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007818
TI|Vintage Motorsport : 1993 Nov/Dec
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1993
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007819
TI|Vintage Motorsport : 1995 Jul/Aug
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1995
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

Any help, even if it is just to point me in a better direction, is greatly appreciated. I am not familiar with awk, but I do know sed.

---------- Post updated 08-17-09 at 05:56 PM ---------- Previous update was 08-16-09 at 10:34 PM ----------

Here is another failed attempt, but perhaps it will give a better idea of what I am trying to accomplish:

Code:
#!/usr/bin/env bash
# Find the file
file_name=$(find . -name "*part*")
# Change file permissions
chmod 744 "$file_name"
# Convert to Unix format and remove ^M chars
dos2unix "$file_name"
# Remove the top 5 lines if "User" is on the first, escape existing ","
# and put a temp marker on the blank line at the end of each record
if grep -q "User" "$file_name"; then
    sed '1,5d' "$file_name" | sed -e 's/,/\\,/g' -e 's/"/\\"/g' -e 's/^$/{}/g' > trimAdd
else echo " "
fi
# Replace every newline with "," and turn the temp marker back into a newline at the end of each record
tr '\012' ',' < trimAdd | sed 's/{},/\n/g' > swapAdd
sed 's/.\*/UR\|.\*;TI\|.\*;PR\|.\*;AA\|.\*;BN\|.\*;BD\|.\*;NT\|.\*;CO\|.\*;LO\|.\*;PC\|.\*;SD\|.\*;CN\|.\*;CA\|.\*;MT\|.\*;PP\|.\*;DP\|.\*;KE\|.\*;JK\|.\*;SG\|.\*;S1\|.\*;ED\|.\*;PU\|.\*;XA\|.\*;XB\|.\*;XC\|.\*;XD\|.\*;/g' swapAdd > finAdd;
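
What I am really after is something that walks the full tag list from that sed pattern (UR TI PR AA BN BD NT CO LO PC SD CN CA MT PP DP KE JK SG S1 ED PU XA XB XC XD) and emits an empty column wherever a tag is missing from a record. Roughly this kind of loop is the idea (untested sketch over the swapAdd file; it ignores the escaped-comma case for clarity):

Code:
# Untested sketch: for each comma-joined record in swapAdd, walk the fixed
# tag list and print either the matching "TAG|value" field or an empty column.
tags="UR TI PR AA BN BD NT CO LO PC SD CN CA MT PP DP KE JK SG S1 ED PU XA XB XC XD"
while IFS= read -r record; do
    out=""
    for tag in $tags; do
        # pull out the field that starts with "TAG|", if this record has one
        field=$(printf '%s\n' "$record" | tr ',' '\n' | grep "^${tag}|" | head -n 1)
        out="${out}${field},"
    done
    printf '%s\n' "${out%,}"
done < swapAdd > finAdd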

Again, any help, even if it is only to point out my errors, would help me get past this sticking point.

Thanks in advance!

Last edited by LoveSquid; 08-17-2009 at 05:36 PM..
# 2  
Old 08-17-2009
It would help if you provided the desired output for the sample you posted, along with a detailed explanation.
# 3  
Old 08-17-2009
Thanks for the reply, vgersh99. Here is what I am looking for as the output:

Code:
UR|007815,TI|Vintage Motorsport : 1995 Jan/Feb,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1995,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007816,TI|Vintage Motorsport : 1992 Nov/Dec,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1992,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007817,TI|Vintage Motorsport : 1995 Mar/Apr,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1995,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007818,TI|Vintage Motorsport : 1993 Nov/Dec,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1993,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007819,TI|Vintage Motorsport : 1995 Jul/Aug,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1995,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S

# 4  
Old 08-18-2009
Try this ... it's an awk program.

Code:
BEGIN{
# set the field separator to |
FS="|"
}

function clear_data()
{
# init the data array with what to print when a field has no data
        value["UR"] = ","
        value["TI"] = ","
        value["PR"] = ","
        value["BD"] = ","
        value["NT"] = ","
        value["CO"] = ","
        value["SD"] = ","
        value["CA"] = ","
        value["MT"] = ","
        value["DP"] = ","
        value["JK"] = ","
        value["XA"] = ","
        value["XB"] = ","
        value["XC"] = ","
        value["XD"] = " "
}


# skip the five header lines at the top of the file
(FNR < 6) {
        getline
        getline
        getline
        getline
        getline
        clear_data()
}

# process record
($1 == "UR") {
# clear the record data

        clear_data()
# consume the record values
        while ($0 != "") {
                value[$1]=($1 != "XD" ? ($0 ",") : $0 )
                if( getline <= 0)
                        exit
        }
# print the new output
        printf("%s%s%s,,%s%s,,%s,%s,%s%s,,,%s%s%s%s%s\n",value["UR"],value["TI"],value["PR"],value["BD"], value["CO"],value["SD"], value["CA"],value["MT"], value["DP"], value["JK"], value["XA"], value["XB"], value["XC"], value["XD"])
}
END {
        print
}
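
If you save the program above as, say, uiee2csv.awk (the name is just a placeholder), you should be able to run it straight against the converted add file, since the first block already skips the header lines. Roughly:

Code:
# placeholder file names; convert the line endings first, as in the original script
dos2unix addfile.txt
awk -f uiee2csv.awk addfile.txt > finAdd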


Last edited by jp2542a; 08-18-2009 at 12:48 AM.. Reason: left debug statements in code
# 5  
Old 08-18-2009
Will this work for you?
Code:
grep '|' file_name | tr '\n' ',' | sed -e 's/,\(UR|[0-9]\{6\},TI\)/\n\1/g; s/,$//'

Assumed:
'UR|' will always be followed by 6 digits and a comma.
If not, adjust the pattern as required.
You have not mentioned which part is constant.
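
For example, if the UR value is not always exactly six digits, the same idea should still work with a looser match:
Code:
grep '|' file_name | tr '\n' ',' | sed -e 's/,\(UR|[0-9]\{1,\},TI\)/\n\1/g; s/,$//'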
# 6  
Old 08-18-2009

jp2542a: Thanks, I will give that a try this morning and see what results I get. I'm not familiar with awk, but I plan to learn it, so this may be a good opportunity.


edidataguy: Thanks for pointing that out! The only constant is that every record will begin with UR and end with XD, followed by an empty line (after removal of the ^M). This is obviously generated on a Windows machine, but I am working entirely on a Linux desktop and a Linux server.

That said, it may help if I state that I will then use this to insert and delete records in a MySQL database and run it as a cron job. Thanks for the help!
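
For the MySQL step I am picturing something along these lines from cron (table, column and file names below are placeholders rather than my real schema, and local_infile has to be enabled for LOAD DATA LOCAL to work):

Code:
#!/usr/bin/env bash
# placeholder names throughout; real credentials would come from ~/.my.cnf
mysql --local-infile=1 -u invuser inventory <<'SQL'
LOAD DATA LOCAL INFILE 'finAdd'
INTO TABLE books
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(ur, ti, pr, aa, bn, bd, nt, co, lo, pc, sd, cn, ca, mt, pp, dp, ke, jk, sg, s1, ed, pu, xa, xb, xc, xd);
SQL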

Last edited by LoveSquid; 08-18-2009 at 11:04 AM..
# 7  
Old 08-18-2009
Code:
nawk '
  BEGIN {
    # RS="" reads each blank-line-separated block as one record, with its lines as fields
    FS=RS=""
    # join the output fields with commas
    OFS=","
  }
  # skip the header block, then pad the fields after which tags can be missing
  FNR>1 { $3=$3 OFS OFS; $6=$6 OFS OFS; $7=$7 OFS; $9=$9 OFS; $10=$10 OFS; $11=$11 OFS OFS OFS OFS; print }' myFile
