How do I format a Date field of a .CSV file with multiple commas in a string field?


Login or Register to Reply

 
Thread Tools Search this Thread
# 1  
Old 10-15-2013
How do I format a Date field of a .CSV file with multiple commas in a string field?

I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below:

Code:
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

The 9th field is the date field in the format "DD-MMM-YY". I have to convert it to the format YYYY/MM/DD. I am trying to use the below code, but of no use.

Code:
awk -F, '
 BEGIN {
 split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
 for (i=1; i<=12; i++) mdigit[month[i]]=i
 }
 { m=substr($9,4,3)
 $9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
 print
 }' OFS="," file.csv > temp_file.csv

The out put of the file temp_file.csv after executing the above code is as shown below.

Code:
column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"

As far as I understand, the issue is with the commas in the double quote as my code is taking them into consideration too... Please suggest on the below questions:

1) Does the double quoting all the values in all the fields make any difference? If they make any difference, how do I get rid of them from all the values except the strings with commas in them?
2) Any modifications to my code so I could format the 9th field which in the format "DD-MMM-YYYY" to YYYY/MM/DD
# 2  
Old 10-16-2013
From here and your date code.

Code:
$ cat datefile.csv
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

$ awk -f csv2.awk datefile.csv
column1,column2,column3,column4,column5,column6, column7, Column8,"00/00/2000", Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","10/11/2011","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","10/12/2011","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","10/13/2011","232"

$ cat csv2.awk
BEGIN {
        if (NUMCOLS == "") NUMCOLS=32
        if (DELIM == "") DELIM = "\t"
        if (REPL == "") REPL = "~"
         split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
        for (i=1; i<=12; i++) mdigit[month[i]]=i
}
{
        gsub(DELIM, REPL)
        $0 = gensub(/([^,])\"\"/, "\\1'", "g")
        out = ""
        n = length($0)
        for (i = 1;  i <= n;  i++) {
                if ((ch = substr($0, i, 1)) == "\"") {
                        inString = (inString) ? 0 : 1
                }
                out = out ((ch == "," && ! inString) ? DELIM : ch)
        }
        nfields=split(out,outfields,DELIM);

         # Do stuff with date
        m=substr(outfields[9],5,3)
        outfields[9] = sprintf("\"%02d/%02d/"20"%02d\"",mdigit[m],substr(outfields[9],2,2),substr(outfields[9],9,20))

        for (i=1;(i<=nfields)&&(i<=NUMCOLS);i++) {
                if (i > 1) {
                        printf (",");
                }
                printf ("%s", outfields[i]);
        }
        printf ("\n");
}

(this CSV processing isn't mine, but I forget where I got it from)
This User Gave Thanks to CarloM For This Post:
dhruuv369 (10-16-2013)
# 3  
Old 10-16-2013
If input is in file x.x, then....
Code:
cat x.x | sed -e 's/,"/\|/g' | cut -f9 -d\| | tr -d \" | awk -F\- 'BEGIN{MON["JAN"] = "01"; MON["FEB"] = "02"; MON["MAR"] = "03"; MON["APR"] = "04"; MON["MAY"] = "05"; MON["JUN"] = "06"; MON["JUL"] = "07"; MON["AUG"] = "08"; MON["SEP"] = "09"; MON["OCT"] = "10"; MON["NOV"] = "11"; MON["DEC"] = "12"} { printf("20%s/%s/%s\n",$3,MON[$2],$1) }'

or create file x.awk with
Code:
BEGIN{MON["JAN"] = "01"; MON["FEB"] = "02"; MON["MAR"] = "03"; MON["APR"] = "04"; MON["MAY"] = "05"; MON["JUN"] = "06"; MON["JUL"] = "07"; MON["AUG"] = "08"; MON["SEP"] = "09"; MON[
"OCT"] = "10"; MON["NOV"] = "11"; MON["DEC"] = "12"} 
{ printf("20%s/%s/%s\n",$3,MON[$2],$1) }

and execute like
Code:
cat x.x | sed -e 's/,"/\|/g' | cut -f9 -d\| | tr -d \" | awk -F\- -fx.awk

---------- Post updated at 12:27 PM ---------- Previous update was at 11:47 AM ----------

Got the order of args in printf wrong, this should work...
Code:
cat x.x | sed -e 's/,"/\|/g' | cut -f9 -d\| | tr -d \" | awk -F\-  'BEGIN{MON["JAN"] = "01"; MON["FEB"] = "02"; MON["MAR"] = "03";  MON["APR"] = "04"; MON["MAY"] = "05"; MON["JUN"] = "06"; MON["JUL"] =  "07"; MON["AUG"] = "08"; MON["SEP"] = "09"; MON["OCT"] = "10";  MON["NOV"] = "11"; MON["DEC"] = "12"} {  printf("20%s/%s/%s\n",$1,MON[$2],$3) }'

This User Gave Thanks to blackrageous For This Post:
dhruuv369 (10-16-2013)
Login or Register to Reply

|
Thread Tools Search this Thread
Search this Thread:
Advanced Search

More UNIX and Linux Forum Topics You Might Find Helpful
Help changing date format in the nth field newbie_01 UNIX for Advanced & Expert Users 1 06-16-2017 04:57 PM
CSV Split field to check multiple codes SDohmen Shell Programming and Scripting 16 02-02-2016 04:27 AM
Print particular string in a field of csv file - part 2 refrain Shell Programming and Scripting 7 04-26-2015 07:24 PM
Print particular string in a field of csv file refrain Shell Programming and Scripting 9 04-08-2015 04:26 AM
awk - CSV file - field with single or multiple spaces vegasluxor Shell Programming and Scripting 5 01-28-2015 04:07 AM
Match columns from two csv files and update field in one of the csv file djoseph Shell Programming and Scripting 10 11-27-2014 07:20 AM
Removing date field from the string skatpally Shell Programming and Scripting 9 11-26-2014 03:32 PM
How to delete the commas in a .CSV file that are enclosed in a string with double quotes? dhruuv369 Shell Programming and Scripting 8 02-11-2014 06:30 PM
Field validations in multiple files CSV hyperion.krish Shell Programming and Scripting 2 01-31-2012 06:53 AM
CSV with commas in field values, remove duplicates, cut columns krishnix Shell Programming and Scripting 4 12-08-2011 01:25 AM
Replace field with commas with field without commas lmyk72 Shell Programming and Scripting 1 11-17-2011 11:22 PM
Awk Search text string in field, not all in field. rocket_dog Shell Programming and Scripting 3 09-12-2011 09:09 AM
csv file field needs to be changed current system date with awk raghavendra.nsn Shell Programming and Scripting 1 08-18-2011 06:58 AM
Matching lines across multiple csv files and merging a particular field Demosthenes Shell Programming and Scripting 1 07-29-2010 06:02 PM
change field 2 date format manishma71 UNIX for Advanced & Expert Users 1 07-21-2010 04:10 AM