Replace double quotes with a single quote within a double quoted string


 
# 8  
Old 05-02-2014
OK, we have established that these _odd_ quotation marks can be anywhere in any field inside the file.

1) Will there be more than three quotation marks in any one field?
2) Will there be ANY possibility of any other SINGLE quotes in any particular field?
3) Will every field be COMMA separated?
# 9  
Old 05-02-2014
This is based on the assumption that the field separator is a comma and that it is placed immediately (no spaces etc.) beside the "quoting" double quotes, so any double quote without a comma next to it should be replaced:
Code:
awk 'match($0,/[^,]\"[^,]/) {$0=substr($0,1,RSTART) "\047" substr($0,RSTART+2)}1' file
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"

If there are more double quotes to be replaced on a line, you need to repeat the command on that line.
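
A minimal sketch of that repetition, assuming the same rule as above (a double quote with no comma on either side gets replaced): the match is wrapped in a while loop so that every such quote on a line is handled in one run. The function name fix_inner_quotes is made up for illustration.

```shell
# Hypothetical wrapper: replace every double quote that has a non-comma
# character on both sides with a single quote (octal 047).
fix_inner_quotes() {
    awk '{
        # Each pass replaces one quote, so the loop ends when none are left.
        while (match($0, /[^,]"[^,]/))
            $0 = substr($0, 1, RSTART) "\047" substr($0, RSTART + 2)
        print
    }' "$@"
}
```

It reads the named files, or standard input if none are given, e.g. fix_inner_quotes file > file.new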
# 10  
Old 05-02-2014
Quote:
Originally Posted by pchang
I hope I can be clear on my explanation.

We are basically receiving a csv file from the vendor and any field they deemed as text, they will enclose with a double quote.

The problem arises when they also have/use a double quote as part of the data.
OK. The following may not be a perfect solution (coming from a less-than-rock-solid definition), but check how far you get with it.

Let us say that the quotes you want to preserve are the ones immediately preceding or following commas (which seem to be the field separators here). In addition there is a single double-quote at the beginning of the line and one at the end of the line. All the other double quotes should become single quotes.

This would work for your example, but there are conceivable cases where this ruleset could be tricked. This is why I suggest you double-check whether it works on your data or whether we need to make the ruleset more solid.

Solution: first, every double quote that must be kept is replaced by a placeholder: the ones at BOL and EOL become "@1" and "@2", and the three sequences quote-comma-quote, quote-comma and comma-quote become "@3", "@4" and "@5" (I use "@" as the marker, change it to something else if it occurs in your data). Each placeholder is distinct, so that adjacent ones (as produced by empty fields like "","") cannot be confused when they are converted back. Then I change the remaining double-quotes to single-quotes and finally turn the placeholders back into the original sequences.

This sounds complicated, but it makes the necessary regexps a lot easier to handle (and to understand).

Code:
sed "s/^\"/@1/;s/\"$/@2/;s/\",\"/@3/g;s/\",/@4/g;s/,\"/@5/g
     s/\"/'/g
     s/@1/\"/;s/@2/\"/;s/@3/\",\"/g;s/@4/\",/g;s/@5/,\"/g" /path/to/input
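
For the double-checking mentioned above, a small sketch, assuming the ruleset as stated (quotes are legitimate only next to a comma or at the start or end of a line): list every line that still contains a double quote with a non-comma character on both sides. Run it on the input to see which lines are affected, and on the output to see whether anything was missed. The name find_stray_quotes is only an illustration.

```shell
# Hypothetical helper: print (with line numbers) the lines that contain
# a double quote that is not adjacent to a comma and not at BOL/EOL.
find_stray_quotes() {
    grep -n '[^,]"[^,]' "$@"
}
```

grep exits with status 1 when no such line exists, so the function can also be used in an if statement.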

I hope this helps.

bakunin
# 11  
Old 05-02-2014
Longhand using _builtins_, OSX 10.7.5, default bash terminal...

This assumes that the VERY first field is not in inverted commas...
Code:
#!/bin/bash
# quote.sh
ifs_str="$IFS"
IFS=","
echo '0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"' > /tmp/text
text=$(cat < /tmp/text)
echo "$text"
quote_array=($text)
field=1
txt="${quote_array[0]}"
while [ $field -lt ${#quote_array[@]} ]
do
	string="${quote_array[$field]}"
	if [ "${string:0:1}" == '"' ] && [ ${#string} -ge 4 ]
	then
		string="${string:1:$((${#string}-2))}"
		string="${string/\"'/'}"
		txt=$txt,\"$string\"
	else
		txt=$txt,$string
	fi
	field=$((field+1))
done
echo "$txt" > /tmp/txt
echo ""
cat < /tmp/txt
IFS="$ifs_str"
exit 0

Results:-
Code:
Last login: Fri May  2 23:22:20 on ttys000
AMIGA:barrywalker~> ./quote1.sh
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"

0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5'
000000111","IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5'
000000111","IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
AMIGA:barrywalker~> _

EDIT:
IGNORE THIS; I noticed a bug and will cure tomorrow...

HTH

Last edited by wisecracker; 05-02-2014 at 08:11 PM.. Reason: See above...
# 12  
Old 05-02-2014
Do you always have two input lines per record?

Are there always 32 (comma separated) fields per record?

If neither of the above is true, how are we supposed to know whether a double-quote at the end of a line is the terminator for the last quoted field on a line (that should be left as is) or an embedded double-quote in the middle of a quoted field (that should be converted to a single-quote)?
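
If the answer to the second question turns out to be yes, the physical lines could be joined into records before any quote fixing. A rough sketch, under the assumptions that every record has a fixed number of comma separated fields (32 by default here) and that no field contains a comma; join_records is a made-up name:

```shell
# Hypothetical helper: glue physical lines together until the record holds
# at least the wanted number of comma separated fields, then print it.
join_records() {
    awk -v want="${1:-32}" '
        {
            rec = rec $0                      # append this physical line
            n = split(rec, parts, ",")        # count the fields seen so far
            if (n >= want) { print rec; rec = "" }
        }
        END { if (rec != "") print rec }      # flush an incomplete trailing record
    '
}
```

It reads standard input, e.g. join_records 32 < file. It cannot help when a line break falls inside the last field of a record, because the field count is already complete at that point.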
# 13  
Old 05-04-2014
Hi.

Like Don Cragun, I noticed that the (too short) sample had 2 lines. I assume they should be a single line, and so I joined them into a file data1.

My approach is to use code that understands CSV files. Here it is; following this listing are demonstrations on a short sample and on the supplied sample:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate CSV parsing, replacing, combining with perl module.
# See: perldoc Text/CSV

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
perl -MText::CSV -lne '
BEGIN {
	$csv = Text::CSV->new(
	{ allow_loose_quotes => 1,
	always_quote => 1,
	escape_char        => "\\" });
	$sq = chr(39);
}
chomp;
# Parse, search/replace, combine, write.
if ($csv->parse($_)) {
	@cols = $csv->fields();
	for ($i=0;$i<=$#cols;$i++) {
	  $cols[$i] =~ s/["]/$sq/g;
	}
	$csv->combine (@cols);
	print $csv->string();
} else {
	print " Error = ", Text::CSV->error_diag(), "\n";
	die " Parse error at line $.\n";
}
' $FILE

exit 0

producing on a short, readable sample:
Code:
$ ./s1 data2

Environment: LC_ALL = , LANG = en_US.UTF-8
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39

-----
 Input data file data2:
"ABCD2","EFGH2","XXXX","1"
"ABCD2",EFGH2,"XX"XX",2

-----
 Results:
"ABCD2","EFGH2","XXXX","1"
"ABCD2","EFGH2","XX'XX","2"

and on the OP sample:
Code:
$ ./s1 data1

Environment: LC_ALL = , LANG = en_US.UTF-8
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39

-----
 Input data file data1:
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"

-----
 Results:
"0000001111","IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR QCCA","0200","","","WD","CH","","4000320275","","124","124","","60.00","60.00","60.00","0.00","0.45","60.45","0.037500","APP","00","EXC","5"

Best wishes ... cheers, drl

Last edited by drl; 05-05-2014 at 01:53 PM.. Reason: Edit 1: supplied missing global ( "g" ) on the perl substitute statement.
# 14  
Old 05-04-2014
CygWin bash terminal, under Windows Vista...
(It might need a dos2unix conversion first.)
Longhand using _builtins_...
Code:
#!/bin/bash
# quote.sh
> /tmp/text
> /tmp/txt
ifs_str="$IFS"
IFS=","
echo '0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"' > /tmp/text
text=$(cat < /tmp/text)
echo "$text"
quote_array=($text)
n=0
m=0
field=1
string=""
newstring=""
txt="${quote_array[0]}"
while [ $field -lt ${#quote_array[@]} ]
do
	string="${quote_array[$field]}"
	newstring=""
	length=${#string}
	n=0
	m=0
	while [ $n -lt $length ]
	do
		if [ "${string:$n:1}" == '"' ]
		then
			m=$((m+1))
		fi
		n=$((n+1))
	done
	if [ $m -ge 3  ]
	then
		newstring=${string:1:$((${#string}-2))}
		newstring=${newstring/\"/\'}
		string=\"$newstring\"
	fi
	txt=$txt,$string
	field=$((field+1))
done
echo "$txt" > /tmp/txt
echo ""
cat < /tmp/txt
IFS="$ifs_str"
exit 0

Results using 3 copies of the original...
Code:
AMIGA:~> cd /tmp
AMIGA:/tmp> dos2unix quote.sh
dos2unix: converting file quote.sh to Unix format ...
AMIGA:/tmp> ./quote.sh
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D"OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"

0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
0000001111,"IBD","601725","6017257000681563","0430","163458","002820","002820000000","E0107815","1801 3E AVENUE         VAL-D'OR
 QCCA","0200","","","WD","CH","","4000320275","","124","124",,60.00,60.00,60.00,0.00,0.45,60.45,0.037500,"APP","00","EXC","5"
AMIGA:/tmp> _

EDIT:
5th May 2014, 15:25, UK time.
Now tested on OSX 10.7.5, default bash terminal and PCLinuxOS 2009, default bash terminal.
Also tested with extra newlines and quotes in random places.
Also tested with the line newstring=${newstring/\"/\'} replaced by newstring=${newstring//\"/\'} to handle multiple instances of " in a field, but not shown here...
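
For shells that lack the ${var//pattern/replacement} expansion, the per-field step (strip the outer quotes, convert the inner ones, put the outer quotes back) could be sketched portably as below. This is an illustration only; fix_field is a hypothetical name and it expects one already separated field as its argument.

```shell
# Hypothetical helper: fix one field, e.g. "VAL-D"OR QCCA".
fix_field() {
    field=$1
    case $field in
        \"*\")
            inner=${field#\"}      # drop the opening quote
            inner=${inner%\"}      # drop the closing quote
            # turn every remaining double quote into a single quote
            inner=$(printf '%s' "$inner" | tr '"' "'")
            printf '"%s"\n' "$inner"
            ;;
        *)
            # unquoted field: pass through unchanged
            printf '%s\n' "$field"
            ;;
    esac
}
```

For example, fix_field '"VAL-D"OR QCCA"' prints "VAL-D'OR QCCA", while an unquoted field such as 60.00 comes back as it went in.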

Last edited by wisecracker; 05-05-2014 at 11:26 AM.. Reason: See above...