How to ignore quoted separators


 
# 1  
Old 04-23-2012

Hi,

I'm trying to parse a text file which uses commas as field separators. Fields are double quoted, and may themselves contain commas, like this:

Code:
"1","John Smith","London","123"
"2","Mary Robertson","Horsham, Sussex","456"

This causes problems for the following command

Code:
cut -d"," -f4

For the first line, it correctly extracts the fourth field ["123"], but for the second, it is fooled by the comma in Mary's address and thinks the fourth field is [ Sussex"].
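
The failure is easy to reproduce. A minimal sketch, where `sample` is a hypothetical helper that stands in for the input file and prints the two records shown above:

```shell
# Hypothetical helper: print the two sample records from the question.
sample() {
  printf '%s\n' \
    '"1","John Smith","London","123"' \
    '"2","Mary Robertson","Horsham, Sussex","456"'
}

# cut treats every comma as a separator, quoted or not,
# so the address in the second record is counted as two fields.
sample | cut -d, -f4
```

The first record prints its real fourth field; the second prints a fragment of the address instead of "456".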

Happy to use awk or any other standard utility available. Please help!

thanks


Moderator's Comments:
Mod Comment Please use code tags, thanks!

Last edited by zaxxon; 04-23-2012 at 12:14 PM.. Reason: code tags
# 2  
Old 04-23-2012
The problem is that a bare comma is not a reliable field separator when it can also appear inside a field.

One way around this is to use sed to replace each "," (quote, comma, quote) with "|" and make the pipe your separator, provided the pipe character does not occur in any field:
Code:
sed 's#","#"|"#g' input | cut -d "|" -f 4
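
The caveat about the pipe character can be checked before substituting. A rough sketch, assuming every field is double quoted; the function name `field_via_pipe` is made up for illustration and is not part of any tool:

```shell
# Hypothetical helper: print field number $1 of quoted CSV read from stdin,
# using "|" as a stand-in separator.
field_via_pipe() {
  input=$(cat)
  # Refuse to continue if the data already contains the stand-in separator,
  # because the substitution would then be ambiguous.
  if printf '%s\n' "$input" | grep -q '|'; then
    echo 'input contains "|"; choose another separator' >&2
    return 1
  fi
  printf '%s\n' "$input" |
    sed 's#","#"|"#g' |        # same substitution as above
    cut -d '|' -f "$1" |       # pick the requested field
    sed 's/^"//; s/"$//'       # drop the quotes left around the field
}
```

For example, `field_via_pipe 4 < input` should print 123 and 456 for the two sample records.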

Hope that helps
Regards
Peasant.
This User Gave Thanks to Peasant For This Post:
# 3  
Old 04-23-2012
Using awk you can set the separator to include the quotes:

Code:
$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{print $3}'
Horsham, Sussex

You'd have to strip the leading/trailing quote, though, when accessing the first/last field, respectively.
Code:
$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{print $1}'
"2
$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{gsub(/(^"|"$)/,"");print $1}'
2
$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{gsub(/(^"|"$)/,"");print $4}'
456
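
The same two steps can be wrapped so the field number is a parameter. This is only a sketch: `csv_field` is a made-up name, and it assumes every field is quoted and no field contains the sequence `","`:

```shell
# Hypothetical helper: print field number $1 of fully quoted CSV from stdin.
csv_field() {
  awk -F'","' -v n="$1" '{
    gsub(/^"|"$/, "")   # changing $0 makes awk split the fields again
    print $n
  }'
}

echo '"2","Mary Robertson","Horsham, Sussex","456"' | csv_field 3
# prints: Horsham, Sussex
```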

This User Gave Thanks to neutronscott For This Post:
# 4  
Old 04-23-2012
With Perl:

Code:
perl -MText::ParseWords -nle'
  print +(parse_line(",",0, $_))[3];
  ' infile

GNU awk 4+:

Code:
awk '{ print $4 }' FPAT='([^,]+)|("[^"]+")' infile
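
FPAT is a gawk extension, so the command above fails silently on other awks (FPAT is just an ordinary variable there). Roughly the same "describe the field, not the separator" idea can be sketched with match() in a loop. This is an approximation under the same assumptions as the FPAT pattern (no escaped quotes inside fields), and `csv_field_portable` is a made-up name:

```shell
# Hypothetical helper: print field number $1, quotes included,
# without relying on gawk's FPAT.
csv_field_portable() {
  awk -v n="$1" '{
    rest = $0
    for (i = 1; ; i++) {
      # Take a quoted string or a run of non-commas off the front,
      # the same two alternatives the FPAT pattern lists.
      len = match(rest, /^("[^"]*"|[^,]*)/) ? RLENGTH : 0
      if (i == n) { print substr(rest, 1, len); break }
      if (len >= length(rest)) break        # no more fields on this line
      rest = substr(rest, len + 2)          # skip the field and its comma
    }
  }'
}

echo '"2","Mary Robertson","Horsham, Sussex","456"' | csv_field_portable 3
# prints: "Horsham, Sussex"
```

Like the FPAT version, this keeps the surrounding quotes; lines with fewer than the requested number of fields print nothing.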

This User Gave Thanks to radoulov For This Post:
# 5  
Old 04-23-2012
Code:
awk -F\" '{print $8}' infile

Or in this case:
Code:
awk -F\" '{print $(NF-1)}' infile

Or, keeping the double quotes (this works only while the last field contains no comma):
Code:
awk -F, '{print $NF}' infile

This User Gave Thanks to Scrutinizer For This Post:
# 6  
Old 04-23-2012
Hi sven44,

One way with perl:
Code:
$ cat infile
"1","John Smith","London","123"
"2","Mary Robertson","Horsham, Sussex","456"
$ perl -MText::CSV_XS -e 'while ( $colref = Text::CSV_XS->new->getline( ARGV ) ) {  printf qq[%s\n], $colref->[-1] }' infile
123
456

This User Gave Thanks to birei For This Post:
# 7  
Old 04-23-2012
What about treating the quotes as the delimiter?

Code:
$ awk -F'"' '{print $8}' sample4.txt
123
456


$ awk -F'"' '{print $(NF-1)}' sample4.txt
123
456
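
Why $8 for the fourth field: with the quote as separator, awk sees an empty $1 before the first quote, then the field contents at the even positions and the commas between them at the odd ones, so CSV field N lands in awk field 2N. A small sketch of that arithmetic, assuming every field is quoted and none contains an embedded quote (`quoted_field` is a made-up name):

```shell
# Hypothetical helper: print CSV field number $1 using '"' as the separator.
quoted_field() {
  # Field N of the CSV is awk field 2*N when splitting on double quotes.
  awk -F'"' -v n="$1" '{ print $(2 * n) }'
}

echo '"2","Mary Robertson","Horsham, Sussex","456"' | quoted_field 3
# prints: Horsham, Sussex
```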

This User Gave Thanks to joeyg For This Post: