Shell Programming and Scripting

BSD, Linux, and UNIX shell scripting — Post awk, bash, csh, ksh, perl, php, python, sed, sh, shell scripts, and other shell scripting languages questions here.

awk, comma as field separator and text inside double quotes as a field.



    #1  
11-15-2010
kevintse
Registered User
 

Hi, all
I need to get the fields of a line that are separated by commas. Some of the fields are enclosed in double quotes, and each of those must be treated as a single field even if it contains commas.
Sample input:
Quote:
aaa,"hell world, test text",bbb,ccc," test text"
For this line, five fields should be extracted:
Quote:
1. aaa
2. "hell world, test text"
3. bbb
4. ccc
5. " test text"
Is there an easy way to achieve this using awk?
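A minimal sketch in plain POSIX awk, assuming quotes only ever wrap whole fields (no doubled "" inside a field and no field spanning two lines): scan each line one character at a time and treat a comma as a separator only while outside quotes. The function name `split_quoted` is hypothetical.

```shell
# split_quoted: hypothetical helper name. Reads lines on stdin and prints one
# field per line, keeping the double quotes around quoted fields.
# Assumes no doubled "" inside quoted fields and no fields spanning lines.
split_quoted() {
  awk '{
    field = ""; inq = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") inq = !inq       # toggle the "inside quotes" state
      if (c == "," && !inq) {         # a comma separates only outside quotes
        print field; field = ""
      } else {
        field = field c
      }
    }
    print field                       # last field on the line
  }'
}

# Usage on the sample line from the question:
printf '%s\n' 'aaa,"hell world, test text",bbb,ccc," test text"' | split_quoted
```

Empty fields (as in `a,,b`) come out as empty lines, so the field count per line is preserved.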
    #2  
11-15-2010
radoulov
Registered User
 
If Perl is acceptable:

Code:
perl -MText::ParseWords -nle'
  print ++$c, ". ", $_ 
    for parse_line(",", 1, $_);
  ' infile

Do you need to reset the counter on every row?

---------- Post updated at 05:11 PM ---------- Previous update was at 05:04 PM ----------

As far as CSV parsing with awk is concerned, see lorance.freeshell.org/csv/
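For completeness: GNU awk 4.0 and later (newer than this thread) has an `FPAT` variable that describes what a field looks like instead of what separates fields. A sketch using a pattern along the lines of the one in the gawk manual, assuming gawk is installed; `FPAT` is a gawk extension and does nothing special in other awks.

```shell
# gawk-only: FPAT describes what a field looks like (instead of FS describing
# the separator). A field is either a run of non-commas, or a double-quoted
# string whose contents may include commas and doubled "" quotes.
printf '%s\n' 'aaa,"hell world, test text",bbb,ccc," test text"' |
gawk 'BEGIN { FPAT = "([^,]*)|(\"([^\"]|\"\")+\")" }
      { for (i = 1; i <= NF; i++) print i ". " $i }'
```

The quotes stay part of the field value, which matches the expected output in the question.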
    #3  
11-15-2010
kevintse
Registered User
 
Quote:
Originally Posted by radoulov
As far as CSV parsing with awk is concerned see lorance.freeshell.org/csv/
Hi, radoulov
Thank you so much. The Perl code is neat, but I have to stick with awk for the moment because I don't know much Perl; I just want to analyze a simple access log file produced by an HTTP server.

Thank you for the link; I'll take a look at it.
    #4  
11-15-2010
shamrock (Forum Advisor)
Registered User
 
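You can give this awk script a try:
Code:
awk -F, '{
  for (i = 1; i <= NF; i++) {
    if (s != "") {
      # Inside a quoted field: -F, split it, so glue the pieces back with ","
      if ($i ~ /"$/) { print s "," $i; s = "" }   # closing piece
      else s = s "," $i
    }
    else {
      if ($i ~ /^".*"$/) print $i    # complete quoted field without commas
      else if ($i ~ /^"/) s = $i     # opening piece of a quoted field
      else print $i                  # plain unquoted field
    }
  }
}' file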
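A variant of the same glue-the-pieces idea, sketched under the same assumption that a quoted field begins with `"` right after a comma and ends with `"` right before the next one. It numbers the fields as in the question's expected output and restarts at 1 on each line; it also clears the pending piece at the start of every line, so an unbalanced quote cannot leak into the next record. The function name `number_fields` is hypothetical.

```shell
# number_fields: hypothetical helper name; numbers fields per input line.
number_fields() {
  awk -F, '{
    n = 0; s = ""                       # restart numbering and pending piece per line
    for (i = 1; i <= NF; i++) {
      if (s != "") {                    # inside a quoted field that contained commas
        s = s "," $i                    # restore the comma that -F, consumed
        if ($i ~ /"$/) { n++; print n ". " s; s = "" }
      } else if ($i ~ /^"/ && $i !~ /^".*"$/) {
        s = $i                          # opening piece of a quoted field
      } else {
        n++; print n ". " $i            # plain field or complete quoted field
      }
    }
    if (s != "") { n++; print n ". " s }  # unbalanced quote: emit what was collected
  }'
}

printf '%s\n' 'aaa,"hell world, test text",bbb,ccc," test text"' | number_fields
```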

    #5  
11-15-2010
Chubler_XL (Forum Staff)
Moderator
 
I'd suggest sticking with the linked CSV parser; it deals with a lot of things that come up in real CSV files, such as fields with embedded line breaks (CRs) or embedded quotes:

Code:
Test,csv,file,"Multi
line field", rest
Also,some,imbedded,"Quoted ""strings""",can exist
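A minimal sketch of how both cases above could be handled in plain awk, assuming the usual CSV conventions (as in RFC 4180): a record continues onto the next physical line while a quoted field is still open, and "" inside a quoted field stands for one literal quote. Unlike the earlier scripts, it strips the enclosing quotes. The helper name `csv_fields` and the " | " output separator are hypothetical choices for display.

```shell
# csv_fields: hypothetical helper name. Prints each CSV record on one output
# line with fields joined by " | ", enclosing quotes removed and "" turned into ".
csv_fields() {
  awk '
  {
    # Join physical lines until the quote count is even, i.e. no quoted
    # field is still open (this is how an embedded line break is handled).
    rec = (pending == "") ? $0 : pending "\n" $0
    tmp = rec
    if (gsub(/"/, "", tmp) % 2) { pending = rec; next }
    pending = ""

    out = ""; field = ""; inq = 0
    for (i = 1; i <= length(rec); i++) {
      c = substr(rec, i, 1)
      if (inq) {
        # "" inside quotes -> one literal "
        if (c == "\"" && substr(rec, i + 1, 1) == "\"") { field = field "\""; i++ }
        else if (c == "\"") inq = 0           # closing quote
        else field = field c                  # commas and newlines kept as-is
      }
      else if (c == "\"") inq = 1             # opening quote
      else if (c == ",") { out = out field " | "; field = "" }
      else field = field c
    }
    print out field
  }
  END { if (pending != "") print "unterminated quote: " pending }'
}

printf '%s\n' 'Test,csv,file,"Multi' 'line field", rest' \
  'Also,some,imbedded,"Quoted ""strings""",can exist' | csv_fields
```

The multi-line field keeps its line break in the output, so the first record prints across two lines.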

    #6  
11-15-2010
Scrutinizer (Forum Staff)
Moderator
 
Try:
Code:
sed 's/,\("[^"]*"\)*/\n\1/g'

Code:
$ echo 'aaa,"hell world, test text",bbb,ccc," test text"' | sed 's/,\("[^"]*"\)*/\n\1/g'
aaa
"hell world, test text"
bbb
ccc
" test text"

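`\n` in the replacement part of `s` is a GNU sed extension; POSIX sed (for example the BSD sed on macOS) expects a backslash followed by an actual newline instead. A sketch of the same one-liner in that portable form, with the same limitations as the original:

```shell
# Same substitution as above, but the newline in the replacement is written
# as a backslash followed by a real newline (POSIX form; GNU sed accepts it too).
echo 'aaa,"hell world, test text",bbb,ccc," test text"' | sed 's/,\("[^"]*"\)*/\
\1/g'
```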
    #7  
11-15-2010
Chubler_XL (Forum Staff)
Moderator
 
Quote:
Originally Posted by Scrutinizer
Try:
Code:
sed 's/,\("[^"]*"\)*/\n\1/g'

Nope, it breaks when the line starts with a quoted field:
Code:
$ echo '"hello world, test text", aaa, bbb, ccc' | sed 's/,\("[^"]*"\)*/\n\1/g'

"hello world
 test text"
 aaa
 bbb
 ccc
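A sketch of one possible patch for that particular case: prepend a comma so a quoted first field is also preceded by a separator, then remove the empty line this creates. It still assumes no space between a comma and an opening quote (`a, "b,c"` would still break) and no doubled quotes inside fields.

```shell
# Step 1 (s/^/,/): give the first field a leading comma, like every other field.
# Step 2: the original substitution, with the newline written as backslash +
#         real newline so it also works outside GNU sed.
# Step 3 (s/^\n//): drop the empty first line that steps 1 and 2 create.
echo '"hello world, test text", aaa, bbb, ccc' | sed 's/^/,/
s/,\("[^"]*"\)*/\
\1/g
s/^\n//'
```

Unquoted fields keep their leading spaces (" aaa"), since those spaces are part of the field in the input.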

All times are GMT -4.

Unix & Linux Forums Content Copyright © 1993-2018. All Rights Reserved.