awk, comma as field separator and text inside double quotes as a field.

Shell Programming and Scripting


Tags
awk, csv parsing, field separator

#1  11-15-2010, kevintse

Hi all,
I need to extract the fields of a line that are separated by commas. Some of the fields are enclosed in double quotes, and each of those should be treated as a single field even if there are commas inside the quotes.
Sample input:
Quote:
aaa,"hell world, test text",bbb,ccc," test text"
For this line, 5 fields should be extracted:
Quote:
1. aaa
2. "hell world, test text"
3. bbb
4. ccc
5. " test text"
Is there an easy way to achieve this using awk?
#2  11-15-2010, radoulov
If Perl is acceptable:


Code:
perl -MText::ParseWords -nle'
  print ++$c, ". ", $_ 
    for parse_line(",", 1, $_);
  ' infile

Do you need to reset the counter on every row?

As far as CSV parsing with awk is concerned, see lorance.freeshell.org/csv/
#3  11-15-2010, kevintse
Hi radoulov,
Thank you so much. The Perl code is neat, but I have to stick with awk for the moment because I don't know much Perl. I just want to analyze a simple access log file produced by an HTTP server.

Thank you for the link, I'll take a look at it.
#4  11-15-2010, shamrock
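You can give this awk script a try:

Code:
awk -F, '{
  for (i = 1; i <= NF; i++) {
    if (s != "") {                                # inside a quoted field split by -F,
      if ($i ~ /"$/) { print s "," $i; s = "" }   # closing quote: print the rejoined field
      else s = s "," $i                           # still open: keep appending
    }
    else {
      if ($i ~ /^".*"$/) print $i                 # quoted field with no embedded comma
      else if ($i ~ /^"/) s = $i                  # opening quote only: start collecting
      else print $i                               # plain unquoted field
    }
  }
}' file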

#5  11-15-2010, Chubler_XL
I'd suggest sticking with the linked CSV parser; it deals with a lot of things that come up in CSV files, such as fields with embedded line breaks or quotes:


Code:
Test,csv,file,"Multi
line field", rest
Also,some,imbedded,"Quoted ""strings""",can exist
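
The doubled-quote case in the second sample line can be sketched in plain awk by walking each line one character at a time and tracking whether the scan is inside quotes. This is only a minimal sketch under stated assumptions: the function name `split_csv_line` is hypothetical, it strips the surrounding quotes (unlike the output requested in post #1), and it works line by line, so the multi-line field in the first sample is still not handled.

```shell
# split_csv_line: hypothetical helper, reads CSV lines on stdin and
# prints one field per output line. Assumes fields never span lines.
split_csv_line() {
  awk '{
    field = ""; inq = 0; n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (inq) {                                   # currently inside a quoted field
        if (c == "\"") {
          if (substr($0, i + 1, 1) == "\"") {      # "" inside quotes is a literal quote
            field = field "\""; i++
          } else inq = 0                           # a single quote closes the field
        } else field = field c                     # commas are kept while quoted
      }
      else if (c == "\"") inq = 1                  # opening quote
      else if (c == ",") { print field; field = "" }  # separator outside quotes
      else field = field c
    }
    print field                                    # last field on the line
  }'
}
```

Piping the second sample line through `split_csv_line` should print `Quoted "strings"` as the fourth field, with the doubled quotes collapsed to single ones.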

#6  11-15-2010, Scrutinizer
Try:

Code:
sed 's/,\("[^"]*"\)*/\n\1/g'


Code:
$ echo 'aaa,"hell world, test text",bbb,ccc," test text"' | sed 's/,\("[^"]*"\)*/\n\1/g'
aaa
"hell world, test text"
bbb
ccc
" test text"

#7  11-15-2010, Chubler_XL
Quote:
Originally Posted by Scrutinizer
Try:

Code:
sed 's/,\("[^"]*"\)*/\n\1/g'
Nope, it breaks when the quoted field is the first one on the line:

Code:
$ echo '"hello world, test text", aaa, bbb, ccc' | sed 's/,\("[^"]*"\)*/\n\1/g'

"hello world
 test text"
 aaa
 bbb
 ccc
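
One way to avoid depending on a preceding comma is to consume the line from the left, taking either a whole quoted field or everything up to the next comma. The following is a minimal awk sketch of that idea, not a full CSV parser: the function name `split_keep_quotes` is hypothetical, quotes are kept as in post #1, and it assumes well-formed input with no doubled quotes and no fields spanning lines.

```shell
# split_keep_quotes: hypothetical helper, reads lines on stdin and prints
# one field per output line, keeping the double quotes.
split_keep_quotes() {
  awk '{
    rest = $0
    while (1) {
      if (match(rest, /^ *"[^"]*"/)) len = RLENGTH   # quoted field, commas included
      else {
        len = index(rest, ",") - 1                   # unquoted: up to the next comma
        if (len < 0) len = length(rest)              # no comma left: take the remainder
      }
      print substr(rest, 1, len)
      if (len >= length(rest)) break                 # nothing follows this field
      rest = substr(rest, len + 2)                   # drop the field and its comma
    }
  }'
}
```

With the line from this post as input, the first field printed should be `"hello world, test text"` in one piece, followed by ` aaa`, ` bbb` and ` ccc` with their leading spaces preserved.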
