How to match fields surrounded by double quotes with commas?


 
# 1  
Old 05-20-2014

Hello to all,

I'm trying to match only fields surrounded by double quotes that have one or more commas inside.

The text looks like this:
Code:
"one, t2o",334,"tst,982-0",881,"kmk 9-l","kkd, 115-001, jj-3",5

The matches should be:
Code:
"one, t2o"
"tst,982-0"
"kkd, 115-001, jj-3"

I'm trying the regex below, but it matches all fields surrounded by double quotes, even those that don't have any comma inside.

Code:
".+?"

Thanks in advance for any help
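
A minimal sketch of why this happens, using Python's re module as a stand-in for the regex tester (the variable names are illustrative and not from the thread). The `.+?` is lazy, so it stops at the next double quote, but nothing in the pattern requires a comma, so every quoted field qualifies:

```python
import re

# Sample line from the post above.
text = '"one, t2o",334,"tst,982-0",881,"kmk 9-l","kkd, 115-001, jj-3",5'

# ".+?" only asks for one or more characters between two quotes.
# Nothing in it demands a comma, so every quoted field matches.
all_quoted = re.findall(r'".+?"', text)
print(all_quoted)
```

The field "kmk 9-l" is returned along with the three wanted fields, which is the behavior described above.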
# 2  
Old 05-20-2014
If your tool supports PCRE:
Code:
"[\w]+,[-, \w]*"

Otherwise
Code:
"[a-z0-9]+,[-, a-z0-9]*"

Maybe even this if the content varies from what's shown:
Code:
"[-, a-z0-9]+,[-, a-z0-9]+"

Essentially, I am trying to avoid picking up ",334," and ",881,", which are also commas between double quotes.

Last edited by Aia; 05-20-2014 at 09:23 PM..
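
The pitfall mentioned above can be reproduced with a short Python sketch. The "naive" pattern below is a hypothetical one made up for illustration (it is not from the thread); it allows any non-quote characters around the comma, so a closing quote can pair with the next opening quote. The second pattern is the non-PCRE one from this post, which avoids that by requiring letters or digits right after the opening quote:

```python
import re

text = '"one, t2o",334,"tst,982-0",881,"kmk 9-l","kkd, 115-001, jj-3",5'

# Hypothetical naive pattern: any non-quote text, a comma, any non-quote text.
# After "kmk 9-l" fails to match, its closing quote pairs with the opening
# quote of the next field, so the separator between them is reported and the
# real field that follows is lost.
naive = re.findall(r'"[^"]*,[^"]*"', text)

# Pattern from the post: letters/digits must follow the opening quote,
# so a quote followed directly by a comma cannot start a match.
restricted = re.findall(r'"[a-z0-9]+,[-, a-z0-9]*"', text)

print(naive)
print(restricted)
```

The restricted pattern works here only because the sample fields start with lowercase letters or digits; fields with other leading characters would need a wider character class.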
# 3  
Old 05-20-2014
Assuming your text is in a file named file, try:
Code:
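awk -F '"' '
# With the double quote as field separator, even-numbered fields are the quoted ones.
{	for(i = 2; i <= NF; i += 2)
		if(index($i, ","))	# keep only quoted fields that contain a comma
			printf("\"%s\"\n", $i)
}' file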

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

Last edited by Don Cragun; 05-20-2014 at 09:20 PM.. Reason: Fix auto spell-check fix error.
# 4  
Old 05-20-2014
Hello Aia and Don,
I actually only need the regex, and I'm trying one of yours:
Code:
"[-, a-z0-9]+,[- .a-z0-9]+"

It works for the sample I posted, but if I change the "-" in jj-3 to "%", the match fails.
I mean changing from this:
Code:
"one, t2o",334,"tst,982-0",881,"kmk 9-l","kkd,115-001,jj-3",5

to this
Code:
"one, t2o",334,"tst,982-0",881,"kmk 9","kkd,115-001,jj%3",5,"kkd,"

The matches should be the quoted fields that contain at least one comma.

I was trying to change your regex to match any field that contains anything (letters, numbers, spaces and any other symbols) and at least one comma, doing something like this:
Code:
"[. \w]+,[. \w]+"

But it is not working, even though the dot is supposed to match almost any character.

Thanks again for any help.
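
A likely reason the attempt above fails, sketched in Python's re module (an assumption: the tester and VBA engine treat bracket expressions the same way, which is standard behavior in the common regex flavors). Inside square brackets the dot is a literal ".", not "any character", so `[. \w]` accepts only dots, spaces and word characters. A negated class such as `[^"]` is one way to say "anything except a double quote":

```python
import re

# Inside a bracket expression the dot is a literal ".", not "any character".
letters_rejected = re.fullmatch(r'[.]+', 'abc') is None
dots_accepted = re.fullmatch(r'[.]+', '...') is not None

# So [. \w] means: dot, space, or word character. "-" and "%" are not in it.
attempt = re.compile(r'"[. \w]+,[. \w]+"')
plain_field = attempt.fullmatch('"one, t2o"') is not None
hyphen_field = attempt.fullmatch('"tst,982-0"') is not None
percent_field = attempt.fullmatch('"kkd,115-001,jj%3"') is not None

# A negated class accepts any character except the ones listed.
negated = re.compile(r'"[^",]*,[^"]*"')
percent_with_negated = negated.fullmatch('"kkd,115-001,jj%3"') is not None

print(plain_field, hyphen_field, percent_field, percent_with_negated)
```

The checks use fullmatch on isolated fields on purpose. Searching a whole line with the negated pattern can still pair a closing quote with the next opening quote, so it is not a complete solution by itself.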
# 5  
Old 05-20-2014
Code:
"[\w]+,[-, \w%]*"

Code:
"[-, a-z0-9]+,[-, a-z0-9%]*"

Any of those?

Last edited by Aia; 05-20-2014 at 10:19 PM.. Reason: Made correction to last regex
# 6  
Old 05-21-2014
I'm very confused.

There are lots of different kinds of regular expressions. What utility is going to process the RE that you want? Show us the code you're using where this RE will be plugged in!

What operating system and shell are you using?

Does the input you want to process always start with a quoted string?

Can we safely assume that double-quotes always appear in pairs (i.e., that your input will never contain an escaped double-quote in a double-quoted string)? If not, what is the escape mechanism?
# 7  
Old 05-21-2014
Hello Don/Aia,

Actually I want to use the Regex in a VBA application under Windows, but I'm testing the Regex in an online regex tester (Regex Tester).

The input is a CSV file with a comma as the field delimiter. Since there can be commas inside fields, double quotes are used as the text qualifier to avoid confusion between commas that separate fields and commas that are inside fields. So each field can contain any text (alphanumeric characters, spaces, symbols, etc.), and I want to match only the fields that have at least one comma. If that is too complicated, then I'd like to match only the fields that are surrounded by double quotes.

Each line of the CSV could look like:
any text,"any text","any, text","any,text",,,2,45,any text

Here "any text" can be a combination of alphanumeric characters, spaces, symbols, etc. Because of that, I was trying to match any character with [.]+,[.]+ but it doesn't work.

Double quotes always appear in pairs, since they surround the fields (some of them), and there is no escape character.

Hope that makes sense.

Regards
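
Given those constraints (quotes always paired, no escape character), one possible approach is sketched below in Python. It is not a pattern proposed in the thread: the pattern, the function name and the variable names are assumptions made for illustration. The idea is to let the regex consume every quoted field, so quotes stay correctly paired, and to capture only the fields that contain a comma:

```python
import re

# First alternative: a quoted field with no comma, consumed and discarded.
# Second alternative: any other quoted field (so it contains a comma), captured.
QUOTED = re.compile(r'"[^",]*"|("[^"]*")')

def quoted_fields_with_comma(line):
    """Return the quoted fields of one CSV line that contain a comma."""
    return [m.group(1) for m in QUOTED.finditer(line) if m.group(1) is not None]

line_from_post_4 = '"one, t2o",334,"tst,982-0",881,"kmk 9","kkd,115-001,jj%3",5,"kkd,"'
line_from_post_7 = 'any text,"any text","any, text","any,text",,,2,45,any text'

print(quoted_fields_with_comma(line_from_post_4))
print(quoted_fields_with_comma(line_from_post_7))
```

The pattern uses only character classes, alternation and one capture group, so it may carry over to the VBA regex engine with the wanted text read from the first submatch, but that has not been tested here. It would break if a quoted field could contain an escaped or doubled double quote.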