AWK removing away needed double quotes.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting AWK removing away needed double quotes.
# 1  
Old 05-07-2009
AWK removing away needed double quotes.

The below code is to convert csv file to pipe delimited.
It replaces comma with pipe if it is not in double quotes; If comma is in double quotes it doesnot replace the comma with a pipe.
The code works fine except it eat away the double quotes in the output file.

Quote:
Input file:
filed1,field2,field3
ABC,"hello,world",time
ABC,"hello,america",time1
XYZ,"god,bless",time2
INJ,are,youthere

Quote:
Output i am getting:
filed1|field2|field3
ABC|hello,world|time
ABC|hello,america|time1
XYZ|god,bless|time2
INJ|are|youthere
Quote:
Output needed:
filed1|field2|field3
ABC|"hello,world"|time
ABC|"hello,america"|time1
XYZ|"god,bless"|time2
INJ|are|youthere

Code:
BEGIN { FS=SUBSEP; OFS="|" }
{
  result = setcsv($0, ",")
  print
}
# setcsv(str, sep) - parse CSV (MS specification) input
# str, the string to be parsed. (Most likely $0.)
# sep, the separator between the values.
#
# After a call to setcsv the parsed fields are found in $1 to $NF.
# setcsv returns 1 on sucess and 0 on failure.
#
# By Peter Strvmberg aka PEZ.
# Based on setcsv by Adrian Davis. Modified to handle a separator
# of choice and embedded newlines. The basic approach is to take the
# burden off of the regular expression matching by replacing ambigious
# characters with characters unlikely to be found in the input. For
# this the characters "\035".
#
# Note 1. Prior to calling setcsv you must set FS to a character which
#         can never be found in the input. (Consider SUBSEP.)
# Note 2. If setcsv can't find the closing double quote for the string
#         in str it will consume the next line of input by calling
#         getline and call itself until it finds the closing double
#         qoute or no more input is available (considered a failiure).
# Note 3. Only the "" representation of a literal quote is supported.
# Note 4. setcsv will probably missbehave if sep used as a regular
#         expression can match anything else than a call to index()
#         would match.
#
function setcsv(str, sep, i) {
  gsub(/""/, "\035", str)
  gsub(sep, FS, str)
  while (match(str, /"[^"]*"/)) {
    middle = substr(str, RSTART+1, RLENGTH-2)
    gsub(FS, sep, middle)
    str = sprintf("%.*s%s%s", RSTART-1, str, middle,
      substr(str, RSTART+RLENGTH))
  }
  if (index(str, "\"")) {
    return ((getline) > 0) ? setcsv(str (RT != "" ? RT : RS) $0, sep) : !setcsv(str "\"", sep)
  } else {
    gsub(/\035/, "\"", str)
    $0 = str
    for (i = 1; i <= NF; i++)
      if (match($i, /^"+$/))
        $i = substr($i, 2)
    $1 = $1 ""
    return 1
  }
}

Appreciate help.
# 2  
Old 05-07-2009
Try this:

Code:
awk 'BEGIN{FS1=",";FS2="\""}$0 !~ /"/{FS=FS1;gsub(/,/,"|");print}
$0 ~ /"/{FS=FS2;gsub(/,"/,"|\"");gsub(/",/,"\"|");print}' file


cheers,
Devaraj Takhellambam
# 3  
Old 05-07-2009
Quote:
Originally Posted by devtakh
Try this:

Code:
awk 'BEGIN{FS1=",";FS2="\""}$0 !~ /"/{FS=FS1;gsub(/,/,"|");print}
$0 ~ /"/{FS=FS2;gsub(/,"/,"|\"");gsub(/",/,"\"|");print}' file


cheers,
Devaraj Takhellambam

The code works only for the file i gave earlier.
If the file has this kind of data.
Quote:
camel,Unix,12345,12345,12345,"This is a awk tesing. It allows for tesing awk, sed,cat and sort.",
it produces output
Quote:
camel,Unix,12345,12345,12345|"This is a awk tesing. It allows for tesing awk, sed,cat and sort."|
Required output
Quote:
camel|Unix|12345|12345|12345|"This is a awk tesing. It allows for tesing awk, sed,cat and sort."|
The code i pasted earlier works fine but is eating away the double quotes and is versatile for any file

Thanks
# 4  
Old 05-07-2009
Ok. I got it.
# 5  
Old 05-07-2009
Quote:
Originally Posted by devtakh
Ok. I got it.
I got this on unix.com
Code:
nawk -F"\"" '{for(i=1;i<=NF;i++){if(i%2)gsub(",","|",$i)}}1' OFS= file1

I modified this to
Code:
nawk -F"\"" 'BEGIN{OFS="\""}{for(i=1;i<=NF;i++){if(i%2)gsub(",","|",$i)}}1'  file

this works fine.
Would appreciate someone can interpret this.

Code:
{if(i%2)gsub(",","|",$i)

Thanks
# 6  
Old 05-08-2009
1. the fiels seperator is made as "
2. you will loop through each fields in the record.
3. if the field no is divisible by 2, then dont make the substitute for , with a |
4. print the record.

Lets take an example:
camel,Unix,12345,12345,12345,"This is a awk tesing. It allows for tesing awk, sed,cat and sort.","name,age,thing",time is,"one,two""
In the above, the BLUE color text are odd no fields, whereas the red ones are the even fields. so the if (i%2) will substitute the , to a | only in the odd no fields.

Hope this helps.


cheers,
Devaraj Takhellambam
# 7  
Old 05-08-2009
if you have Python, here's an alternative
Code:
import csv
writer = csv.writer(open("output.csv","w"), delimiter="|")
filename = "file"
reader = csv.reader(open(filename))
for row in reader:    
    writer.writerow(row)

output:
Code:
# more file
filed1,field2,field3
ABC,"hello,world",time
ABC,"hello,america",time1
XYZ,"god,bless",time2
INJ,are,youthere
# ./test.py
# more output.csv
filed1|field2|field3
ABC|hello,world|time
ABC|hello,america|time1
XYZ|god,bless|time2
INJ|are|youthere

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk -File contain double quotes

Hi ALL, file data like : test.csv a,b,"c,d" my awk version is 4.0.2 ,if i am using the below code is working fine. awk -vFPAT='(*)|("+")' -vOFS="," '{print $3}' test.csv if the awk version is 3.1.7 is not working . Could you please help me on this one. output should be : "c,d" (6 Replies)
Discussion started by: bmk123
6 Replies

2. Shell Programming and Scripting

awk to parse comma separated field and removing comma in between number and double quotes

Hi Experts, Please support I have below data in file in comma seperated, but 4th column is containing comma in between numbers, bcz of which when i tried to parse the file the column 6th value(5049641141) is being removed from the file and value(222.82) in column 5 becoming value of column6. ... (3 Replies)
Discussion started by: as7951
3 Replies

3. Shell Programming and Scripting

Removing consecutive double quotes

Hi I have a .csv file and when opened in notepad looks like this gggg,nnnn,"last,first","llll""",nnn So, Here I would like the ouput as below gggg,nnnn,"last,first","llll",nnn i.e replace all two double quotes into one. How could I do that? This file is being processed by another... (5 Replies)
Discussion started by: dnat
5 Replies

4. Shell Programming and Scripting

Replace Double quotes within double quotes in a column with space while loading a CSV file

Hi All, I'm unable to load the data using sql loader where there are double quotes within the double quotes As these are optionally enclosed by double quotes. Sample Data : "221100",138.00,"D","0019/1477","44012075","49938","49938/15043000","Television - 22" Refurbished - Airwave","Supply... (6 Replies)
Discussion started by: mlavanya
6 Replies

5. UNIX for Dummies Questions & Answers

Help populating double quotes using awk

Want to populate double quotes for each filed using awk: Input: cat file.txt => "1-23-test_test1-test2" Required output : "1-23-test_test1-test2"|"#GT_properties_xyz" Was trying the below command on solaris 9 machine : awk -F"|" '{print $1"|""#GT_properties_xyz"}' file.txt ... (8 Replies)
Discussion started by: rajachandhok
8 Replies

6. Shell Programming and Scripting

Replace double double quotes using AWK/SED

Hi, I have data as "01/22/97-"aaaaaaaaaaaaaaaaa""aaa""aabbbbbbbbcccccc""zbcd""dddddddddeeeeeeeeefffffff" I want to remove only the Consequitive double quotes and not the one which occurs single. My O/P must be ... (2 Replies)
Discussion started by: Bhuvaneswari
2 Replies

7. Shell Programming and Scripting

Using double quotes in awk

Hi I read somewhere that when using double quotes in awk; variables gets expanded else it doesn't. So I tried to use the double quotes inside an awk statement as below: from_instance_trans=`awk "/INPUT =\"$frm_inst\"/,/<\/TRANSFORMATION>/" $xml_object | grep -w "<TRANSFIELD" | awk... (9 Replies)
Discussion started by: dips_ag
9 Replies

8. Shell Programming and Scripting

removing extra double quotes between pipe dilimeter

I have a flat file sample like this - "COURSE"|"ddddd " " dddd"|"sssddd sdsdsdsdx" dddddddd ffffff "aaaaa" dddddddd ffffff sdsdsd"|"xxxxxxx"| "COURSE"|"ffff " " bbbb"|"lllll"| The delimiter is pipe character (|) and the text are enclosed in double quotes... (5 Replies)
Discussion started by: vishalzone
5 Replies

9. UNIX for Dummies Questions & Answers

Removing double quotes in a file

Hi All, I have a tab delimited file where each of the strings have double quotes. The problem is that I have records which are in the following format: "TEXAS" ""HOUSTON"" "123" "" "2625-39-39" ""MAINE"" "" "456" "I" "3737-39-82" I would have to output... (3 Replies)
Discussion started by: kingofprussia
3 Replies

10. Shell Programming and Scripting

sed removing comma inside double quotes

I have a csv file with lines like the followings 123456,"ABC CO., LTD","XXX" 789012,"DEF LIMITED", "XXX" before I bcp this file to database, the comma in "CO.," need to be removed first. My script is cat <filename> | sed 's/"CO.,"/"CO."/g' but it doesn't work. Can anyone here able to... (2 Replies)
Discussion started by: joanneho
2 Replies
Login or Register to Ask a Question