Sponsored Content
Top Forums Shell Programming and Scripting AWK removing away needed double quotes. Post 302314100 by pinnacle on Thursday 7th of May 2009 11:13:25 AM
Old 05-07-2009
AWK removing away needed double quotes.

The below code is to convert csv file to pipe delimited.
It replaces comma with pipe if it is not in double quotes; If comma is in double quotes it doesnot replace the comma with a pipe.
The code works fine except it eat away the double quotes in the output file.

Quote:
Input file:
filed1,field2,field3
ABC,"hello,world",time
ABC,"hello,america",time1
XYZ,"god,bless",time2
INJ,are,youthere

Quote:
Output i am getting:
filed1|field2|field3
ABC|hello,world|time
ABC|hello,america|time1
XYZ|god,bless|time2
INJ|are|youthere
Quote:
Output needed:
filed1|field2|field3
ABC|"hello,world"|time
ABC|"hello,america"|time1
XYZ|"god,bless"|time2
INJ|are|youthere

Code:
BEGIN { FS=SUBSEP; OFS="|" }
{
  result = setcsv($0, ",")
  print
}
# setcsv(str, sep) - parse CSV (MS specification) input
# str, the string to be parsed. (Most likely $0.)
# sep, the separator between the values.
#
# After a call to setcsv the parsed fields are found in $1 to $NF.
# setcsv returns 1 on sucess and 0 on failure.
#
# By Peter Strvmberg aka PEZ.
# Based on setcsv by Adrian Davis. Modified to handle a separator
# of choice and embedded newlines. The basic approach is to take the
# burden off of the regular expression matching by replacing ambigious
# characters with characters unlikely to be found in the input. For
# this the characters "\035".
#
# Note 1. Prior to calling setcsv you must set FS to a character which
#         can never be found in the input. (Consider SUBSEP.)
# Note 2. If setcsv can't find the closing double quote for the string
#         in str it will consume the next line of input by calling
#         getline and call itself until it finds the closing double
#         qoute or no more input is available (considered a failiure).
# Note 3. Only the "" representation of a literal quote is supported.
# Note 4. setcsv will probably missbehave if sep used as a regular
#         expression can match anything else than a call to index()
#         would match.
#
function setcsv(str, sep, i) {
  gsub(/""/, "\035", str)
  gsub(sep, FS, str)
  while (match(str, /"[^"]*"/)) {
    middle = substr(str, RSTART+1, RLENGTH-2)
    gsub(FS, sep, middle)
    str = sprintf("%.*s%s%s", RSTART-1, str, middle,
      substr(str, RSTART+RLENGTH))
  }
  if (index(str, "\"")) {
    return ((getline) > 0) ? setcsv(str (RT != "" ? RT : RS) $0, sep) : !setcsv(str "\"", sep)
  } else {
    gsub(/\035/, "\"", str)
    $0 = str
    for (i = 1; i <= NF; i++)
      if (match($i, /^"+$/))
        $i = substr($i, 2)
    $1 = $1 ""
    return 1
  }
}

Appreciate help.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed removing comma inside double quotes

I have a csv file with lines like the followings 123456,"ABC CO., LTD","XXX" 789012,"DEF LIMITED", "XXX" before I bcp this file to database, the comma in "CO.," need to be removed first. My script is cat <filename> | sed 's/"CO.,"/"CO."/g' but it doesn't work. Can anyone here able to... (2 Replies)
Discussion started by: joanneho
2 Replies

2. UNIX for Dummies Questions & Answers

Removing double quotes in a file

Hi All, I have a tab delimited file where each of the strings have double quotes. The problem is that I have records which are in the following format: "TEXAS" ""HOUSTON"" "123" "" "2625-39-39" ""MAINE"" "" "456" "I" "3737-39-82" I would have to output... (3 Replies)
Discussion started by: kingofprussia
3 Replies

3. Shell Programming and Scripting

removing extra double quotes between pipe dilimeter

I have a flat file sample like this - "COURSE"|"ddddd " " dddd"|"sssddd sdsdsdsdx" dddddddd ffffff "aaaaa" dddddddd ffffff sdsdsd"|"xxxxxxx"| "COURSE"|"ffff " " bbbb"|"lllll"| The delimiter is pipe character (|) and the text are enclosed in double quotes... (5 Replies)
Discussion started by: vishalzone
5 Replies

4. Shell Programming and Scripting

Using double quotes in awk

Hi I read somewhere that when using double quotes in awk; variables gets expanded else it doesn't. So I tried to use the double quotes inside an awk statement as below: from_instance_trans=`awk "/INPUT =\"$frm_inst\"/,/<\/TRANSFORMATION>/" $xml_object | grep -w "<TRANSFIELD" | awk... (9 Replies)
Discussion started by: dips_ag
9 Replies

5. Shell Programming and Scripting

Replace double double quotes using AWK/SED

Hi, I have data as "01/22/97-"aaaaaaaaaaaaaaaaa""aaa""aabbbbbbbbcccccc""zbcd""dddddddddeeeeeeeeefffffff" I want to remove only the Consequitive double quotes and not the one which occurs single. My O/P must be ... (2 Replies)
Discussion started by: Bhuvaneswari
2 Replies

6. UNIX for Dummies Questions & Answers

Help populating double quotes using awk

Want to populate double quotes for each filed using awk: Input: cat file.txt => "1-23-test_test1-test2" Required output : "1-23-test_test1-test2"|"#GT_properties_xyz" Was trying the below command on solaris 9 machine : awk -F"|" '{print $1"|""#GT_properties_xyz"}' file.txt ... (8 Replies)
Discussion started by: rajachandhok
8 Replies

7. Shell Programming and Scripting

Replace Double quotes within double quotes in a column with space while loading a CSV file

Hi All, I'm unable to load the data using sql loader where there are double quotes within the double quotes As these are optionally enclosed by double quotes. Sample Data : "221100",138.00,"D","0019/1477","44012075","49938","49938/15043000","Television - 22" Refurbished - Airwave","Supply... (6 Replies)
Discussion started by: mlavanya
6 Replies

8. Shell Programming and Scripting

Removing consecutive double quotes

Hi I have a .csv file and when opened in notepad looks like this gggg,nnnn,"last,first","llll""",nnn So, Here I would like the ouput as below gggg,nnnn,"last,first","llll",nnn i.e replace all two double quotes into one. How could I do that? This file is being processed by another... (5 Replies)
Discussion started by: dnat
5 Replies

9. Shell Programming and Scripting

awk to parse comma separated field and removing comma in between number and double quotes

Hi Experts, Please support I have below data in file in comma seperated, but 4th column is containing comma in between numbers, bcz of which when i tried to parse the file the column 6th value(5049641141) is being removed from the file and value(222.82) in column 5 becoming value of column6. ... (3 Replies)
Discussion started by: as7951
3 Replies

10. Shell Programming and Scripting

awk -File contain double quotes

Hi ALL, file data like : test.csv a,b,"c,d" my awk version is 4.0.2 ,if i am using the below code is working fine. awk -vFPAT='(*)|("+")' -vOFS="," '{print $3}' test.csv if the awk version is 3.1.7 is not working . Could you please help me on this one. output should be : "c,d" (6 Replies)
Discussion started by: bmk123
6 Replies
STRTOK(3)						   BSD Library Functions Manual 						 STRTOK(3)

NAME
strtok, strtok_r -- string tokens LIBRARY
Standard C Library (libc, -lc) SYNOPSIS
#include <string.h> char * strtok(char *str, const char *sep); char * strtok_r(char *str, const char *sep, char **last); DESCRIPTION
This interface is obsoleted by strsep(3). The strtok() function is used to isolate sequential tokens in a null-terminated string, str. These tokens are separated in the string by at least one of the characters in sep. The first time that strtok() is called, str should be specified; subsequent calls, wishing to obtain further tokens from the same string, should pass a null pointer instead. The separator string, sep, must be supplied each time, and may change between calls. The implementation will behave as if no library function calls strtok(). The strtok_r() function is a reentrant version of strtok(). The context pointer last must be provided on each call. The strtok_r() function may also be used to nest two parsing loops within one another, as long as separate context pointers are used. The strtok() and strtok_r() functions return a pointer to the beginning of each subsequent token in the string, after replacing the token itself with a NUL character. When no more tokens remain, a null pointer is returned. EXAMPLES
The following uses strtok_r() to parse two strings using separate contexts: char test[80], blah[80]; char *sep = "\/:;=-"; char *word, *phrase, *brkt, *brkb; strcpy(test, "This;is.a:test:of=the/string\tokenizer-function."); for (word = strtok_r(test, sep, &brkt); word; word = strtok_r(NULL, sep, &brkt)) { strcpy(blah, "blah:blat:blab:blag"); for (phrase = strtok_r(blah, sep, &brkb); phrase; phrase = strtok_r(NULL, sep, &brkb)) { printf("So far we're at %s:%s ", word, phrase); } } SEE ALSO
memchr(3), strchr(3), strcspn(3), strpbrk(3), strrchr(3), strsep(3), strspn(3), strstr(3), wcstok(3) STANDARDS
The strtok() function conforms to ISO/IEC 9899:1990 (``ISO C90''). AUTHORS
Wes Peters <wes@softweyr.com>, Softweyr LLC Based on the FreeBSD 3.0 implementation. BUGS
The System V strtok(), if handed a string containing only delimiter characters, will not alter the next starting point, so that a call to strtok() with a different (or empty) delimiter string may return a non-NULL value. Since this implementation always alters the next starting point, such a sequence of calls would always return NULL. BSD
November 27, 1998 BSD
All times are GMT -4. The time now is 05:16 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy