How to delete the commas in a .CSV file that are enclosed in a string with double quotes?


 
# 1  
Old 02-10-2014

Okay, I would like to delete all the commas in a .CSV file (TEST.CSV) that are enclosed in double quotes, or at least replace them with a space.

Here is a sample file:
Code:
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas",string 1,USD,12,70%,08/01/2013,
23455,12312255564,"string, with, multiple, commas","string with or, without commas",string 2,USD,433,70%,07/15/2013,
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas",string 1,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas",string 2,CAND,388,70%,09/21/2013,

I tried the code below, but it is deleting all the commas in the file:
Code:
awk -F'"' '{gsub (/,/,"\001",$0)}1' OFS='"' TEST.CSV

Using either sed or awk, I expect output like this:
Code:
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a  comma","string with or without commas",string 1,USD,12,70%,08/01/2013,
23455,12312255564,"string with multiple commas","string with or without commas",string 2,USD,433,70%,07/15/2013,
23525,74535243123,"string with commas and - hypens and: semicolans","string with or without commas",string 1,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string with commas multiple: colans","string with or without commas",string 2,CAND,388,70%,09/21/2013,
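
A likely reason the attempt above touches every comma: gsub is given $0, the whole record, as its target, so the field separator set with -F never comes into play. The sketch below uses a shortened, made-up line, and the replacement is changed from "\001" to "#" only to make it visible. The function names are hypothetical.

```shell
#!/bin/sh
# Made-up line, shortened from the sample data.
line='1,"a, b",c'

# Target $0, as in the attempt above: every comma in the record is replaced.
replace_in_record() {
    awk -F'"' -v OFS='"' '{ gsub(/,/, "#", $0) } 1'
}

# Target $2: only the text between the first pair of double quotes is changed.
replace_in_field2() {
    awk -F'"' -v OFS='"' '{ gsub(/,/, "#", $2) } 1'
}

printf '%s\n' "$line" | replace_in_record
printf '%s\n' "$line" | replace_in_field2
```

With this input the first call should print 1#"a# b"#c and the second 1,"a# b",c.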

# 2  
Old 02-10-2014
Code:
awk -F'"' -v OFS='"' '{for(i=2;i<NF;i+=2) gsub(",", "", $i)}1' TEST.CSV

This User Gave Thanks to in2nix4life For This Post:
# 3  
Old 02-10-2014
@in2nix4life: The code works great. However, I would really appreciate it if you could help me understand the for loop. Why is i given 2 as its initial value? Also, the separator variables are set at the beginning of the awk command; how are they used in the later part of the code?
# 4  
Old 02-10-2014
By setting FS and OFS to a double quote character (-F'"' -v OFS='"'), in2nix4life told awk to use the double quote as the field separator both when lines are read from the input and when they are written to standard output. The text before the 1st double quote becomes field 1, the text between the 1st and 2nd double quotes becomes field 2, and so on. So odd-numbered fields contain data outside the double-quoted strings, and even-numbered fields contain the data inside them.

The for loop:
Code:
for(i=2;i<NF;i+=2)
        gsub(",", "", $i)

calls the global substitution function (gsub) to change every comma (",") to an empty string ("") in field i ($i), for even-numbered fields only: start with field 2 [i=2], add 2 to the field number after each pass through the loop [i+=2], and keep going as long as i is less than the number of fields on the current line [i<NF].
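
To see the splitting described above in action, the following sketch prints each quote-delimited field of one line with its number, then applies the loop from post #2. The line is made up and shortened from the sample data, and the function names show_fields and strip_quoted_commas are hypothetical. It assumes the double quotes on each line are balanced and that no quoted string spans more than one line.

```shell
#!/bin/sh
# Hypothetical helper: list the fields awk sees when FS is a double quote.
show_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d [%s]\n", i, $i }'
}

# Hypothetical helper: the loop from post #2, reading standard input.
# Commas are removed only in even-numbered fields (the quoted strings).
strip_quoted_commas() {
    awk -F'"' -v OFS='"' '{ for (i = 2; i < NF; i += 2) gsub(",", "", $i) } 1'
}

line='12310,"a , b","c, d",string 1,'
printf '%s\n' "$line" | show_fields
printf '%s\n' "$line" | strip_quoted_commas
```

For this line show_fields should report five fields, with "a , b" as field 2 and "c, d" as field 4, and strip_quoted_commas should print 12310,"a  b","c d",string 1, with the commas outside the quotes untouched.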
These 2 Users Gave Thanks to Don Cragun For This Post:
# 5  
Old 02-11-2014
Wrong solution, sorry.
# 6  
Old 02-11-2014
Code:
perl -pe 's/\"(.+?),(.+?)\"/\"$1$2\"/g' filename

# 7  
Old 02-11-2014
@Don Cragun: Great explanation, no one could have explained it better. Thank you so much.

@rk4k: Please share your thoughts. Do you mean this approach does not work in all cases? The code given by in2nix4life works in my case. Please let us know.

@pravin27: Thanks for providing Perl code too. However, it seems to remove only the first comma in each quoted string. Your code gives this output:
Code:
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a  comma","string with or without commas",string 1,USD,12,70%,08/01/2013,
23455,12312255564,"string with, multiple, commas","string with or without commas",string 2,USD,433,70%,07/15/2013,
23525,74535243123,"string  with commas, and - hypens and: semicolans","string with or without commas",string 1,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string with commas, multiple: colans","string with or without commas",string 2,CAND,388,70%,09/21/2013,
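
The output above is consistent with how the regular expression in post #6 matches: each match spans a whole quoted string but contains only one literal comma, and with /g the search resumes after the closing quote, so later commas in the same string are never revisited. Below is a sketch on made-up lines, assuming perl is installed; the function name first_comma_only is hypothetical. The second call suggests the pattern may also be unsafe when a quoted string contains no comma, because .+? can run past the closing quote.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner from post #6 (reads standard input).
first_comma_only() {
    perl -pe 's/\"(.+?),(.+?)\"/\"$1$2\"/g'
}

# Three commas inside the quotes, but only the first is removed.
printf '%s\n' '1,"a, b, c, d",2' | first_comma_only

# No comma inside the first quoted string: the match runs on into the next field
# and removes a separator comma instead.
printf '%s\n' '"abc",x,"d"' | first_comma_only
```

On these inputs the first call should print 1,"a b, c, d",2 and the second "abc"x,"d".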


Last edited by dhruuv369; 02-11-2014 at 01:38 PM.. Reason: No specific reason