Replacing all but the first and last double quote in a line with a single quote with awk


 
# 1  
Old 05-21-2015

From:
Code:
1,2,3,4,5,This is a test
6,7,8,9,0,"This, is a test"
1,9,2,8,3,"This is a ""test"""
4,7,3,1,8,""""

To:

Code:
1,2,3,4,5,This is a test
6,7,8,9,0,"This; is a test"
1,9,2,8,3,"This is a ''test''"
4,7,3,1,8,"''"

Is there an easy syntax I'm overlooking? There will always be an even number of quotes (0, 2, 4, 6, etc.). If present, the first quote will follow a comma, and the final quote will precede an LF.

Prior to this, I replaced any LF (but not CR+LF) with |, then replaced any CR+LF with LF, and replaced any , in the rightmost field with ; (by using a loop to look for more csv columns and OFS=;). I was not aware, however, that "" means " in the .csv standard.
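
For readers following along, the line-ending cleanup described above can be sketched in portable awk. This is only an illustration, not the poster's actual script: the function name normalize_line_endings is made up, and it assumes that every real record ends in CR+LF while a bare LF only occurs inside a quoted field.

```shell
# Hypothetical helper: join records broken by a bare LF, using "|" as the
# join character, and turn CR+LF record endings into plain LF.
normalize_line_endings() {
  awk '{
    if (sub(/\r$/, "")) {      # line ended in CR+LF: the record is complete
      print buf $0
      buf = ""
    } else {                   # bare LF: keep collecting, mark the break with |
      buf = buf $0 "|"
    }
  }
  END { if (buf != "") print buf }   # flush an unterminated last record
  '
}

# Small made-up sample: the first record contains an embedded LF.
printf '1,2,"a\nb"\r\n3,4,c\r\n' | normalize_line_endings
```

The embedded line break in the first record is replaced by |, so each record ends up on one LF-terminated line.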

Mike

Last edited by Michael Stora; 05-21-2015 at 04:14 PM..
# 2  
Old 05-21-2015
Try:
Code:
awk '{gsub(/"/,q,$NF); gsub("^" q "|" q "$","\"",$NF)}1' FS=, OFS=, q=\'  file

# 3  
Old 05-21-2015
Quote:
Originally Posted by Scrutinizer
Try:
Code:
awk '{gsub(/"/,q,$NF); gsub("^" q "|" q "$","\"",$NF)}1' FS=, OFS=, q=\'  file

I think I understand the code, but what does the 1 do? Also, why is the or (|) in quotes?

Unfortunately I often have single quotes at the end of a line, so this often fails. The following example is based on actual data:
Code:
echo "1,2,3,4,5,retested after moving sample 8'" | awk '{gsub(/"/,q,$NF); gsub("^" q "|" q "$","\"",$NF)}1' FS=, OFS=, q=\'
1,2,3,4,5,retested after moving sample 8"

I also think this would break with commas inside quotes (as I initially posed the question before editing). Fortunately I already replaced them with semicolons.

Mike

PS. I have kicked myself many times for not using .tsv instead of .csv at the beginning of this project . . .

Last edited by Michael Stora; 05-21-2015 at 07:01 PM..
# 4  
Old 05-22-2015
Another way to write this, which is perhaps a bit clearer:
Code:
awk '{gsub(/"/,q,$NF); gsub(r,"\"",$NF)}1' FS=, OFS=, q=\' r="^'|'$" file

This replaces all the double quotes with single quotes and then changes the first and the last single quote in the last field back to double quotes.
The 1 means: the condition is true and no action is specified, so awk performs the default action, which is {print $0}.
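
As for why the | sits in quotes in the first version: the regex there is assembled by string concatenation, so "^", q, "|", q and "$" are separate pieces glued into one dynamic regex. A small sketch of both points follows; the function names show_regex and print_records are made up for illustration.

```shell
# Build the same dynamic regex that the first version passes to gsub().
show_regex() {
  awk -v q="'" 'BEGIN { r = "^" q "|" q "$"; print r }'
}
show_regex

# A bare 1 is an always-true pattern with no action, so awk prints each record.
print_records() { awk '1'; }
printf 'a\nb\n' | print_records
```

The first command prints the regex that the r= variable spells out directly in the second version; the second command simply echoes its input.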

If you have single quotes in your input, then a third, intermediate character is needed that does not occur in your input. This example uses a newline character (which is equal to RS); it cannot be in the input, since awk reads line by line and strips the newline.

Code:
awk '{gsub(/"/,RS,$NF); gsub(r,"\"",$NF); gsub(RS,q,$NF)}1' FS=, OFS=, q=\' r="^\n|\n$" file
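
To see the placeholder steps one at a time, here is the same idea spread over several lines and fed sample lines from this thread. It is a sketch under assumptions: the function name requote_last_field is made up, the variables are passed with -v instead of as trailing operands so the function can read from a pipe, and it still assumes the quoted field contains no commas.

```shell
# Same three gsub() steps as the one-liner, wrapped in a hypothetical function.
requote_last_field() {
  awk -F, -v OFS=, -v q="'" -v r='^\n|\n$' '{
    gsub(/"/, RS, $NF)      # every double quote becomes a newline placeholder
    gsub(r, "\"", $NF)      # first and last placeholder back to double quotes
    gsub(RS, q, $NF)        # remaining placeholders become single quotes
  } 1'
}

printf '%s\n' '1,9,2,8,3,"This is a ""test"""' \
  "1,2,3,4,5,retested after moving sample 8'" | requote_last_field
```

The single quote at the end of the second line is left alone, because only placeholders are ever turned back into double quotes.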

With commas inside double quotes this becomes more complicated, since you would need to combine it with the earlier solution.
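
One possible way around the comma problem, offered only as a sketch and not as something from the posts above, is to skip field splitting and work on the whole line: find the first and the last double quote and convert only the quotes between them. The function name requote_inner is made up, and the approach assumes at most one quoted field per line (the last one).

```shell
# Hypothetical alternative: work on the whole line, so commas inside the
# quoted field never reach the field splitter.
requote_inner() {
  awk -v q="'" '{
    first = index($0, "\"")                 # position of the first double quote
    last = length($0)
    while (last > first && substr($0, last, 1) != "\"") last--   # last double quote
    if (first && last > first) {
      inner = substr($0, first + 1, last - first - 1)
      gsub(/"/, q, inner)                   # only the quotes in between change
      $0 = substr($0, 1, first) inner substr($0, last)
    }
    print
  }'
}

printf '%s\n' '6,7,8,9,0,"This, is a ""test"""' '4,7,3,1,8,""""' | requote_inner
```

Lines without double quotes pass through untouched, including lines that end in a single quote.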

--
Edit: only just noted that in your original example commas inside double quotes need to be converted to semicolons, but you already changed them, like you said.

Last edited by Scrutinizer; 05-22-2015 at 01:40 AM..
# 5  
Old 05-22-2015
I got around using an intermediate character (in the past I have used some of the old ASCII punch card/paper tape control characters 28-32) by brute force: remove the first and last quote, convert what is left, and re-wrap the field in quotes. There should be no need to reserve a character now.

Code:
awk -F, 'BEGIN { OFS=","; comm=22}                        #Convert any "," in comments to ";"
             { for(i=comm+1; i<=NF; i++) {$comm=$comm";"$i}
               if (NF>comm) NF=comm; print $0 }' |
    awk -F, 'BEGIN { OFS = ","
                     # Read in Known Fail List
                     getline < "'"$failListFile"'"; getline < "'"$failListFile"'"; getline < "'"$failListFile"'" # Header Rows
                     while (getline < "'"$failListFile"'") { split( $0, a, ","); i=a[1]a[2]a[3]; gsub ( " ", "", i ); failMessage[i]=a[6]
                     failStart[i]=a[4]? a[4] : "0000 01 01 00 00 00"
                     failEnd[i]=a[5]? a[5] : "9999 12 31 23 59 59" }
                     close("'"$failListFile"'")}
             !($7 == "" || $9 == "" || $10 == "" || $11 == "" || $12 == "") {
                split($7,a," "); split(a[1],d,"/"); split (a[2],t,":")
                month = sprintf("%02d",d[1]); day = sprintf("%02d",d[2])    #All 2 digits
                year = 2000 + d[3] % 100                                    #Force 4 digit year
                hour = sprintf("%02d",t[1]); min = sprintf("%02d",t[2])
                date = month"/"day"/"year; time = hour":"min
                $7 = date" "time
                if ( $19 == "Y" ) $19 = "V";        #Allowing for older Raw Data Files and Archives
                else if ( $19 == "N" ) $19 = "I";   #to use the older YNE vs VIE Valid column.
                else if ( $19 =="" ) $19 = "I"      #if valid column manually erased, treat as Invalid
           #     $22 = $22 ~ /^\".*\"$/ ? $22 : "\""$22"\"" # put quotes around comment if Excel did not already
                gsub(/^\"/,"",$22); gsub(/\"$/,"",$22); gsub (/\"\"/,"\x27\x27", $22); $22 = "\""$22"\"" #remove wrapping quotes, Change "" (.csv representation of ") to '', rewrap in quotes
                i=$1$6$8; gsub ( " ", "", i )
                if ( $18 == "FAIL" && i in failMessage ) { now = mktime(year" "month" "day" "hour" "min" 00")
                    if ( now >= mktime(failStart[i]) && now <= mktime(failEnd[i]) ) {
                        if ($22 == "\"\"") gsub ( "\"$", "Known Fail: "failMessage[i]"\"", $22 )
                        else gsub ( "\"$", "|Known Fail: "failMessage[i]"\"", $22 ) } }
                print $0
             }'
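
The first stage of that pipeline is easier to follow on a short line. The sketch below repeats the same loop with the comment column passed in as a parameter; the function name fold_extra_fields and the six-column sample are assumptions for illustration, and truncating a record by assigning to NF is widely supported but worth checking on your awk.

```shell
# Hypothetical stand-alone version of the comma-folding stage.
# $1 is the comment column (the script above uses 22).
fold_extra_fields() {
  awk -F, -v OFS=, -v comm="${1:-22}" '{
    # glue every field past the comment column back on, using ; as separator
    for (i = comm + 1; i <= NF; i++) $comm = $comm ";" $i
    if (NF > comm) NF = comm    # drop the now-duplicated trailing fields
    print
  }'
}

printf '%s\n' '6,7,8,9,0,"This, is a test"' | fold_extra_fields 6
```

With the comment in column 6, the comma inside the quoted text comes out as a semicolon, matching the "To:" example in the first post.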

It took me a long time to figure out that I could not escape ' characters in the awk script with \' or with a whole bunch of other things, with or without a variable declaration (although you can pass one in from bash with the awk -v option), so I used \x27.
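
For comparison, two other spellings that keep the script inside single quotes are sketched here. The function names are made up; the octal escape \047 is part of POSIX awk string syntax, while \x27 is an extension that not every awk accepts.

```shell
# Octal escape inside an awk string: \047 is the single quote character.
quote_octal() { awk 'BEGIN { print "it\047s" }'; }

# Single quote handed in from the shell as an awk variable.
quote_var() { awk -v q="'" 'BEGIN { print "it" q "s" }'; }

quote_octal
quote_var
```

Both calls print the same text with a real single quote in it; the -v form is usually the easier one to read when the quote is needed in several places.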

Excel column 22:
Code:
Fake data: comment with "quotes"
Fake data, comment, with, commas,,,

.CSV column 22:
Code:
"Fake data: comment with ""quotes"""
"Fake data, comment, with, commas,,,"

Output of script column 22:
Code:
"Fake data: comment with ''quotes''"
"Fake data, comment, with, commas,,,"

Mike

Last edited by Michael Stora; 05-22-2015 at 06:43 PM..
# 6  
Old 05-23-2015
You can do that with awk; it is not an awk thing, it is a shell thing. In the shell you cannot escape characters that are inside single quotes, and the actual awk script is enclosed in single quotes...

What you can do is have a file that contains the awk script and call it like this:

Code:
awk -f awk.script

Then you do not need to worry about escaping quotes...

For example:
Code:
$ awk 'BEGIN{print "This is a quote: '\''"}' 
This is a quote: '

$ cat awk.script
BEGIN {
  print "This is a quote: '"
} 

$ awk -f awk.script
This is a quote: '


Last edited by Scrutinizer; 05-23-2015 at 04:41 AM..