Today (Saturday) We will make some minor tuning adjustments to MySQL.

You may experience 2 up to 10 seconds "glitch time" when we restart MySQL. We expect to make these adjustments around 1AM Eastern Daylight Saving Time (EDT) US.


How do I format a Date field of a .CSV file with multiple commas in a string field?


Login or Register to Reply

 
Thread Tools Search this Thread
# 1  
How do I format a Date field of a .CSV file with multiple commas in a string field?

I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below:

Code:
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

The 9th field is the date field in the format "DD-MMM-YY". I have to convert it to the format YYYY/MM/DD. I am trying to use the below code, but of no use.

Code:
awk -F, '
 BEGIN {
 split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
 for (i=1; i<=12; i++) mdigit[month[i]]=i
 }
 { m=substr($9,4,3)
 $9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
 print
 }' OFS="," file.csv > temp_file.csv

The out put of the file temp_file.csv after executing the above code is as shown below.

Code:
column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"

As far as I understand, the issue is with the commas in the double quote as my code is taking them into consideration too... Please suggest on the below questions:

1) Does the double quoting all the values in all the fields make any difference? If they make any difference, how do I get rid of them from all the values except the strings with commas in them?
2) Any modifications to my code so I could format the 9th field which in the format "DD-MMM-YYYY" to YYYY/MM/DD
# 2  
From here and your date code.

Code:
$ cat datefile.csv
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

$ awk -f csv2.awk datefile.csv
column1,column2,column3,column4,column5,column6, column7, Column8,"00/00/2000", Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","10/11/2011","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","10/12/2011","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","10/13/2011","232"

$ cat csv2.awk
BEGIN {
        if (NUMCOLS == "") NUMCOLS=32
        if (DELIM == "") DELIM = "\t"
        if (REPL == "") REPL = "~"
         split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
        for (i=1; i<=12; i++) mdigit[month[i]]=i
}
{
        gsub(DELIM, REPL)
        $0 = gensub(/([^,])\"\"/, "\\1'", "g")
        out = ""
        n = length($0)
        for (i = 1;  i <= n;  i++) {
                if ((ch = substr($0, i, 1)) == "\"") {
                        inString = (inString) ? 0 : 1
                }
                out = out ((ch == "," && ! inString) ? DELIM : ch)
        }
        nfields=split(out,outfields,DELIM);

         # Do stuff with date
        m=substr(outfields[9],5,3)
        outfields[9] = sprintf("\"%02d/%02d/"20"%02d\"",mdigit[m],substr(outfields[9],2,2),substr(outfields[9],9,20))

        for (i=1;(i<=nfields)&&(i<=NUMCOLS);i++) {
                if (i > 1) {
                        printf (",");
                }
                printf ("%s", outfields[i]);
        }
        printf ("\n");
}

(this CSV processing isn't mine, but I forget where I got it from)
This User Gave Thanks to CarloM For This Post:
# 3  
If input is in file x.x, then....
Code:
cat x.x | sed -e 's/,"/\|/g' | cut -f9 -d\| | tr -d \" | awk -F\- 'BEGIN{MON["JAN"] = "01"; MON["FEB"] = "02"; MON["MAR"] = "03"; MON["APR"] = "04"; MON["MAY"] = "05"; MON["JUN"] = "06"; MON["JUL"] = "07"; MON["AUG"] = "08"; MON["SEP"] = "09"; MON["OCT"] = "10"; MON["NOV"] = "11"; MON["DEC"] = "12"} { printf("20%s/%s/%s\n",$3,MON[$2],$1) }'

or create file x.awk with
Code:
BEGIN{MON["JAN"] = "01"; MON["FEB"] = "02"; MON["MAR"] = "03"; MON["APR"] = "04"; MON["MAY"] = "05"; MON["JUN"] = "06"; MON["JUL"] = "07"; MON["AUG"] = "08"; MON["SEP"] = "09"; MON[
"OCT"] = "10"; MON["NOV"] = "11"; MON["DEC"] = "12"} 
{ printf("20%s/%s/%s\n",$3,MON[$2],$1) }

and execute like
Code:
cat x.x | sed -e 's/,"/\|/g' | cut -f9 -d\| | tr -d \" | awk -F\- -fx.awk

---------- Post updated at 12:27 PM ---------- Previous update was at 11:47 AM ----------

Got the order of args in printf wrong, this should work...
Code:
cat x.x | sed -e 's/,"/\|/g' | cut -f9 -d\| | tr -d \" | awk -F\-  'BEGIN{MON["JAN"] = "01"; MON["FEB"] = "02"; MON["MAR"] = "03";  MON["APR"] = "04"; MON["MAY"] = "05"; MON["JUN"] = "06"; MON["JUL"] =  "07"; MON["AUG"] = "08"; MON["SEP"] = "09"; MON["OCT"] = "10";  MON["NOV"] = "11"; MON["DEC"] = "12"} {  printf("20%s/%s/%s\n",$1,MON[$2],$3) }'

This User Gave Thanks to blackrageous For This Post:
Login or Register to Reply

|
Thread Tools Search this Thread
Search this Thread:
Advanced Search

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Help changing date format in the nth field

Hi, I have two (2) things that I want to do. First is to change the date format that is in the nth field from MM/DD/YY to YY/MM/DD. Preferably, I wish I know how to make it a 4-digit year but I don't. Problem is I can only assume it is a 20 century Second is somehow know how to figure out... (1 Reply)
Discussion started by: newbie_01
1 Replies

2. Shell Programming and Scripting

CSV Split field to check multiple codes

Hello, For work i am trying to generate a combined csv file excisting out of 2 other csv files. The problem i am facing is that the first field on both files have multiple values in there which arent always the same. This first field is also the joining part. The layout of the files is as... (16 Replies)
Discussion started by: SDohmen
16 Replies

3. Shell Programming and Scripting

Print particular string in a field of csv file - part 2

Hi, all I need your help and suggestions. I want to print particular strings in a field of a csv file and show them in terminal. Here is an example of the csv file. SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw... (7 Replies)
Discussion started by: refrain
7 Replies

4. Shell Programming and Scripting

Print particular string in a field of csv file

Hi, all I need your help and suggestions. I want to print particular strings in a field of a csv file and show them in terminal. Here is an example of the csv file. SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw... (9 Replies)
Discussion started by: refrain
9 Replies

5. Shell Programming and Scripting

awk - CSV file - field with single or multiple spaces

Hi, In a csv file, I want to select records where first column has zero or multiple spaces. Eg: abc.csv ,123,a ,22,b ,11,c a,11,d So output should be: ,123,a ,22,b ,11,c Please advise (5 Replies)
Discussion started by: vegasluxor
5 Replies

6. Shell Programming and Scripting

Field validations in multiple files CSV

Hi, I am regular reader of this forum. My advanced thanks to everyone. Below given are the sample files INDATA (Main data) Fild1«fld2«fld3….. Fild1«fld2«fld3….. Fild1«fld2«fld3….. Fild1«fld2«fld3….. Fild1«fld2«fld3….. . . N records (140000) eg GRPDATA (Reference file) (2 Replies)
Discussion started by: hyperion.krish
2 Replies

7. Shell Programming and Scripting

CSV with commas in field values, remove duplicates, cut columns

Hi Description of input file I have: ------------------------- 1) CSV with double quotes for string fields. 2) Some string fields have Comma as part of field value. 3) Have Duplicate lines 4) Have 200 columns/fields 5) File size is more than 10GB Description of output file I need:... (4 Replies)
Discussion started by: krishnix
4 Replies

8. Shell Programming and Scripting

Replace field with commas with field without commas

Hey guys, I have the following text: 1,2,3,4,5,6,'NULL','when',NULL,1,2,0,'NULL' 1,2,3,4,5,6,'NULL','what','NULL',1,2,0,1 I need the same text with the word NULL without commas u know something like this: 1,2,3,4,5,6,NULL,'when',NULL,1,2,0,NULL 1,2,3,4,5,6,NULL,'what','NULL',1,2,0,1 ... (1 Reply)
Discussion started by: lmyk72
1 Replies

9. Shell Programming and Scripting

csv file field needs to be changed current system date with awk

HI, I have csv file with records as shown below. 4102,Bangalore,G10,21,08/17/2011 09:28:33:188,99,08/17/2011 09:27:33:881,08/17/2011... (1 Reply)
Discussion started by: raghavendra.nsn
1 Replies

10. UNIX for Advanced & Expert Users

change field 2 date format

from this input WEBELSOLAR,29122009,1:1 WIPRO,15062010,2:3 ZANDUREALT,18012007,1:3 i want output as WEBELSOLAR,20091229,1:1 WIPRO,20100615,2:3 ZANDUREALT,20070118,1:3 basically input is in ddmmyyyy format and i was to convert it to yyyymmdd format (1 Reply)
Discussion started by: manishma71
1 Replies

Featured Tech Videos