Insert blanks for missing fields and reformat to csv


 
# 1  
Old 08-17-2009
[Bash] Insert blanks for missing fields and reorder

I am trying to process inventory addition files for insertion into a MySQL database. The files follow the book UIEE format convention. If a field has no data, it is NOT included in the add file at all, so I need to insert a blank/empty value for any missing field. Then I need to create a CSV or TSV file for the actual insertion. I have the main conversion part down, but I am stuck on inserting the blanks, since I first need to check whether each field is present.

Here is the code I have so far:
Code:
#!/usr/bin/env bash
#
# This is a script to process incoming add files
#
# Find the file
# Find the file
file_name=$(find . -name "*part*")
# Change file permissions
chmod 744 "$file_name"
# Convert to Unix format and remove ^M chars (change to fromdos on server)
dos2unix "$file_name"
# Remove the top 5 lines if "User" is on the first, escape existing ";"
# and put a temp marker on the blank line at the end of each record
if grep -q "User" "$file_name"; then
    sed '1,5d' "$file_name" | sed -e 's/;/\\;/g' -e 's/"/\\"/g' -e 's/^$/{}/g' > trimAdd
else echo " "
fi
# Replace every newline with ";" and turn the temp marker back into a newline at the end of each record
tr '\012' ';' < trimAdd | sed 's/{};/\n/g' > swapAdd
# check record for missing fields and insert blanks/NULLS where necessary

Here is a sample of the file I am processing (it routinely contains 1200+ lines, and yes, the last line is empty):
Code:
User
BOOKS
2009-08-06
14:16:52

UR|007815
TI|Vintage Motorsport : 1995 Jan/Feb
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1995
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007816
TI|Vintage Motorsport : 1992 Nov/Dec
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1992
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007817
TI|Vintage Motorsport : 1995 Mar/Apr
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1995
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007818
TI|Vintage Motorsport : 1993 Nov/Dec
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1993
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

UR|007819
TI|Vintage Motorsport : 1995 Jul/Aug
PR|17.50
BD|Soft Cover
NT| 82 pgs; magazine format
CO|1
SD|2009-08-06 14:04:20
CA|Transportation
MT|Transportation
DP|1995
JK|No Jacket
XA|4
XB|1
XC|BO
XD|S

Any help, even if it is just to point me in a better direction, is greatly appreciated. I am not familiar with awk, but I do know sed.

---------- Post updated 08-17-09 at 05:56 PM ---------- Previous update was 08-16-09 at 10:34 PM ----------

Here is another failed attempt, but perhaps it will give a better idea of what I am trying to accomplish:

Code:
#!/usr/bin/env bash
# Find the file
file_name=$(find . -name "*part*")
# Change file permissions
chmod 744 "$file_name"
# Convert to Unix format and remove ^M chars
dos2unix "$file_name"
# Remove the top 5 lines if "User" is on the first, escape existing ","
# and put a temp marker on the blank line at the end of each record
if grep -q "User" "$file_name"; then
    sed '1,5d' "$file_name" | sed -e 's/,/\\,/g' -e 's/"/\\"/g' -e 's/^$/{}/g' > trimAdd
else echo " "
fi
# Replace every newline with "," and turn the temp marker back into a newline at the end of each record
tr '\012' ',' < trimAdd | sed 's/{},/\n/g' > swapAdd
sed 's/.\*/UR\|.\*;TI\|.\*;PR\|.\*;AA\|.\*;BN\|.\*;BD\|.\*;NT\|.\*;CO\|.\*;LO\|.\*;PC\|.\*;SD\|.\*;CN\|.\*;CA\|.\*;MT\|.\*;PP\|.\*;DP\|.\*;KE\|.\*;JK\|.\*;SG\|.\*;S1\|.\*;ED\|.\*;PU\|.\*;XA\|.\*;XB\|.\*;XC\|.\*;XD\|.\*;/g' swapAdd > finAdd;
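
What I am really after is something that walks the full tag list from that sed pattern (UR TI PR AA BN BD NT CO LO PC SD CN CA MT PP DP KE JK SG S1 ED PU XA XB XC XD) and emits an empty column wherever a tag is missing from a record. Roughly this kind of loop is the idea (untested sketch over the swapAdd file; it ignores the escaped-comma case for clarity):

Code:
# Untested sketch: for each comma-joined record in swapAdd, walk the fixed
# tag list and print either the matching "TAG|value" field or an empty column.
tags="UR TI PR AA BN BD NT CO LO PC SD CN CA MT PP DP KE JK SG S1 ED PU XA XB XC XD"
while IFS= read -r record; do
    out=""
    for tag in $tags; do
        # pull out the field that starts with "TAG|", if this record has one
        field=$(printf '%s\n' "$record" | tr ',' '\n' | grep "^${tag}|" | head -n 1)
        out="${out}${field},"
    done
    printf '%s\n' "${out%,}"
done < swapAdd > finAdd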

Again, any help, even if it is only to point out my errors, would help me get past this sticking point.

Thanks in advance!

Last edited by LoveSquid; 08-17-2009 at 05:36 PM..
# 2  
Old 08-17-2009
It would help if you provided the desired output for the sample you posted, along with a detailed explanation.
# 3  
Old 08-17-2009
Thanks for the reply, vgersh99. Here is what I am looking for as the output:

Code:
UR|007815,TI|Vintage Motorsport : 1995 Jan/Feb,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1995,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007816,TI|Vintage Motorsport : 1992 Nov/Dec,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1992,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007817,TI|Vintage Motorsport : 1995 Mar/Apr,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1995,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007818,TI|Vintage Motorsport : 1993 Nov/Dec,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1993,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S
UR|007819,TI|Vintage Motorsport : 1995 Jul/Aug,PR|17.50,,,BD|Soft Cover,NT| 82 pgs; magazine format,CO|1,,,SD|2009-08-06 14:04:20,,CA|Transportation,MT|Transportation,,DP|1995,,JK|No Jacket,,,,,XA|4,XB|1,XC|BO,XD|S

# 4  
Old 08-18-2009
Try this ... it's an awk program.

Code:
BEGIN{
# set the field separator to |
FS="|"
}

function clear_data()
{
# init the data array with what to print when a field has no data
        value["UR"] = ","
        value["TI"] = ","
        value["PR"] = ","
        value["BD"] = ","
        value["NT"] = ","
        value["CO"] = ","
        value["SD"] = ","
        value["CA"] = ","
        value["MT"] = ","
        value["DP"] = ","
        value["JK"] = ","
        value["XA"] = ","
        value["XB"] = ","
        value["XC"] = ","
        value["XD"] = " "
}


# skip the five header lines at the top of the file
(FNR < 6) {
        getline
        getline
        getline
        getline
        getline
        clear_data()
}

# process record
($1 == "UR") {
# clear the record data

        clear_data()
# consume the record values
        while ($0 != "") {
                value[$1]=($1 != "XD" ? ($0 ",") : $0 )
                if( getline <= 0)
                        exit
        }
# print the new output
        printf("%s%s%s,,%s%s,,%s,%s,%s%s,,,%s%s%s%s%s\n",value["UR"],value["TI"],value["PR"],value["BD"], value["CO"],value["SD"], value["CA"],value["MT"], value["DP"], value["JK"], value["XA"], value["XB"], value["XC"], value["XD"])
}
END {
        print
}
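
If you save the program above as, say, uiee2csv.awk (the name is just a placeholder), you should be able to run it straight against the converted add file, since the first block already skips the header lines. Roughly:

Code:
# placeholder file names; convert the line endings first, as in the original script
dos2unix addfile.txt
awk -f uiee2csv.awk addfile.txt > finAdd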


Last edited by jp2542a; 08-18-2009 at 12:48 AM.. Reason: left debug statements in code
# 5  
Old 08-18-2009
Will this work for you?
Code:
grep '|' file_name | tr '\n' ',' | sed -e 's/,\(UR|[0-9]\{6\},TI\)/\n\1/g; s/,$//'

Assumed:
'UR|' will always be followed by 6 digits and a comma.
If not, adjust the pattern as required.
You have not mentioned which part is constant.
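
For example, if the UR value is not always exactly six digits, the same idea should still work with a looser match:
Code:
grep '|' file_name | tr '\n' ',' | sed -e 's/,\(UR|[0-9]\{1,\},TI\)/\n\1/g; s/,$//'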
# 6  
Old 08-18-2009

jp2542a: Thanks, I will give that a try this morning and see what results I get. I'm not familiar with awk, but I plan to learn it, so this may be a good opportunity.


edidataguy: Thanks for pointing that out! The only constant is that every record will begin with UR and end with XD, followed by an empty line (after removal of the ^M). This is obviously generated on a Windows machine, but I am working entirely on a Linux desktop and a Linux server.

That said, it may help if I state that I will then use this to insert and delete records in a MySQL database and run it as a cron job. Thanks for the help!
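
For the MySQL step I am picturing something along these lines from cron (table, column and file names below are placeholders rather than my real schema, and local_infile has to be enabled for LOAD DATA LOCAL to work):

Code:
#!/usr/bin/env bash
# placeholder names throughout; real credentials would come from ~/.my.cnf
mysql --local-infile=1 -u invuser inventory <<'SQL'
LOAD DATA LOCAL INFILE 'finAdd'
INTO TABLE books
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(ur, ti, pr, aa, bn, bd, nt, co, lo, pc, sd, cn, ca, mt, pp, dp, ke, jk, sg, s1, ed, pu, xa, xb, xc, xd);
SQL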

Last edited by LoveSquid; 08-18-2009 at 11:04 AM..
# 7  
Old 08-18-2009
Code:
nawk '
  BEGIN {
    # RS="" reads each blank-line-separated block as one record, with its lines as fields
    FS=RS=""
    # join the output fields with commas
    OFS=","
  }
  # skip the header block, then pad the fields after which tags can be missing
  FNR>1 { $3=$3 OFS OFS; $6=$6 OFS OFS; $7=$7 OFS; $9=$9 OFS; $10=$10 OFS; $11=$11 OFS OFS OFS OFS; print }' myFile
