awk to parse comma separated field and removing comma in between number and double quotes


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk to parse comma separated field and removing comma in between number and double quotes
# 1  
Old 04-13-2018
awk to parse comma separated field and removing comma in between number and double quotes

Hi Experts,

Please support
I have below data in file in comma seperated, but 4th column is containing comma in between numbers, bcz of which when i tried to parse the file the column 6th value(5049641141) is being removed from the file and value(222.82) in column 5 becoming value of column6.

Below is command used

Code:
awk -F"," -v OFS="|" '{gsub(/\,/,"",$4);gsub(/\"/,"");gsub(/^[[:space:]]+|[[:space:]]+$/,"",$4);gsub(/^[[:space:]]+|[[:space:]]+$/,"",$5);print $1, $2, $3, $4, $5, $6}' input.csv > output.csv

Input file
Code:
Bill.Doc.,Item,Exch.Rate,    Net value, Tax amount,Sales Doc.
6400392211,10,1," 1,271.19 ", 228.82 ,5049641141
6400392212,10,1, 635.59 , 114.40 ,5049641143
6400392213,10,1, 635.59 , 114.40 ,5049641145
6400392214,10,1," 1,271.19 ", 228.82 ,5049641147

output file(from command mentioned above)
Code:
Bill.Doc. |Item|Exch.Rate|Net value|Tax amount|Sales Doc.
6400392211|10|1|1|271.19|228.82
6400392212|10|1|635.59|114.40|5049641143
6400392213|10|1|635.59|114.40|5049641145
6400392214|10|1|1|271.19|228.82
6400392215|10|1|0.01|-|5049641149
6400392216|10|1|4.23|0.76|5049641151

Expected ouptut
Code:
Bill.Doc. |Item|Exch.Rate|Net value|Tax amount|Sales Doc.
6400392211|10|1|1271.19|228.82|5049641141
6400392212|10|1|635.59|114.40|5049641143
6400392213|10|1|635.59|114.40|5049641145
6400392214|10|1|1271.19|228.82|5049641147
6400392215|10|1|0.01|-|5049641149
6400392216|10|1|4.23|0.76|5049641151

# 2  
Old 04-13-2018
There are several methods to do what you need, many of them have been published in these fora.

Try
Code:
awk -F, -vOFS="|" '
match ($0, /"[^"]*"/)   {T1 = T2 = substr ($0, RSTART, RLENGTH)
                         gsub (/[ ",]/, _, T2)
                         sub (T1, T2)
                        }
                        {$1 = $1
                        }
1
' file
Bill.Doc.|Item|Exch.Rate|    Net value| Tax amount|Sales Doc.
6400392211|10|1|1271.19| 228.82 |5049641141
6400392212|10|1| 635.59 | 114.40 |5049641143
6400392213|10|1| 635.59 | 114.40 |5049641145
6400392214|10|1|1271.19| 228.82 |5049641147

# 3  
Old 04-13-2018
Or,
Code:
awk -F\" '
                {for (i=2; i<= NF; i+=2) gsub (",", _, $i)
                 gsub (" *\|*,\|* *", "|")
                }
1
' file

This User Gave Thanks to RudiC For This Post:
# 4  
Old 04-14-2018
Hi.

There are a number of software collections and applications that know how to deal with CSV (sometimes TSV, and more generally DSV, Delimiter Separated Files) Many allow the delimiter to be specified in case it is not a comma or a TAB.

Here is one, csv2tsv, that simply replaces the input separator with the desired output separator, each of which can be specified to be other that the defaults, comma -> TAB. This snippet:
Code:
# Utility functions: print-as-echo, print-line-with-visual-space.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }

pl " Input data file $FILE:"
head $FILE

pl " Expected output:"
cat $E

pl " Results:"
csv2tsv -c "," -t '|' $FILE |
sed '2,$s/[ ,]//g'

producing:
Code:
-----
 Input data file data1:
Bill.Doc.,Item,Exch.Rate,    Net value, Tax amount,Sales Doc.
6400392211,10,1," 1,271.19 ", 228.82 ,5049641141
6400392212,10,1, 635.59 , 114.40 ,5049641143
6400392213,10,1, 635.59 , 114.40 ,5049641145
6400392214,10,1," 1,271.19 ", 228.82 ,5049641147

-----
 Expected output:
Bill.Doc. |Item|Exch.Rate|Net value|Tax amount|Sales Doc.
6400392211|10|1|1271.19|228.82|5049641141
6400392212|10|1|635.59|114.40|5049641143
6400392213|10|1|635.59|114.40|5049641145
6400392214|10|1|1271.19|228.82|5049641147

-----
 Results:
Bill.Doc.|Item|Exch.Rate|    Net value| Tax amount|Sales Doc.
6400392211|10|1|1271.19|228.82|5049641141
6400392212|10|1|635.59|114.40|5049641143
6400392213|10|1|635.59|114.40|5049641145
6400392214|10|1|1271.19|228.82|5049641147

The sed just removes the extraneous commas and spaces from the numeric data. You can modify the header as desired -- I could not see how the spacing there was supposed to be handled.

Here are some details about the code:
Code:
csv2tsv Convert Comma-separated file to tab-separated file (tsv). (doc)
Path    : ~/executable/csv2tsv
Version : v1.1.14
Type    : ELF 64-bit LSB executable, x86-64, version 1 (GNU ...)
Home    : https://github.com/eBay/tsv-utils-dlang (doc)

And here are some alternate items for dealing with CSV / TSV/ DSV files:
Code:
CSV, comma separated value file (comma typical, it can be almost anything, DSV)

        0) cut -d, / awk -F, / perl -a -F,

        1) csvfix
           https://neilb.bitbucket.io/csvfix/ (checked  2017.03)

        2) csvtool
           https://github.com/Chris00/ocaml-csv (checked 2017.03)

        3) Text::CSV -- perl module, DIY

        4) crush, Google collection
           http://crush-tools.googlecode.com/files/crush-tools-2013-04.tar.gz

        5) csvkit, suite of utilities for converting to and working with CSV
           https://github.com/wireservice/csvkit

        6) agate, Python DIY
           https://github.com/wireservice/agate

        7) xlrd, extract data from Microsoft Excel (tm) spreadsheet files
           Includes runxlrd, a small utility to extract data
           Debian repository, and https://github.com/python-excel/xlrd
           Mentioned as one alternative in:
           http://www.python-excel.org/

        8) tsv, CLI tools for large, tabular data files
           filtering, statistics, sampling, joins, etc.
           https://github.com/eBay/tsv-utils-dlang

Here are a few other refrences to look over:

Comma-separated values - Wikipedia
CSV application support - Wikipedia
RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files

Best wishes ... cheers, drl
This User Gave Thanks to drl For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk print - fields separated with comma's need to ignore inbetween double quotes

I am trying to re-format a .csv file using awk. I have 6 fields in the .csv file. Some of the fields are enclosed in double quotes and contain comma's inside the quotes. awk is breaking this into multiple fields. Sample lines from the .csv file: Device Name,Personnel,Date,Solution... (1 Reply)
Discussion started by: jxrst
1 Replies

2. Shell Programming and Scripting

Need Help - comma inside double quote in comma separated csv,

Hello there, I have a comma separated csv , and all the text field is wrapped by double quote. Issue is some text field contain comma as well inside double quote. so it is difficult to process. Input in the csv file is , 1,234,"abc,12,gh","GH234TY",34 I need output like below,... (8 Replies)
Discussion started by: Uttam Maji
8 Replies

3. Shell Programming and Scripting

Perl script to parse output and print it comma separated

I need to arrange output of SQL query into a comma separated format and I'm struggling with processing the output... The output is something like this: <Attribute1 name><x amount of white spaces><Atribute value> <Attribute2 name><x amount of white spaces><Atribute value> <Attribute3... (2 Replies)
Discussion started by: Juha
2 Replies

4. Shell Programming and Scripting

awk, comma as field separator and text inside double quotes as a field.

Hi, all I need to get fields in a line that are separated by commas, some of the fields are enclosed with double quotes, and they are supposed to be treated as a single field even if there are commas inside the quotes. sample input: for this line, 5 fields are supposed to be extracted, they... (8 Replies)
Discussion started by: kevintse
8 Replies

5. Shell Programming and Scripting

Help parse comma separated list

I have a list of files with the same name, but they have a different date stamp in the name. I can find the first file, but I need to find the second file. I am using this information to create a variable I use later. Here is a example of how I find the first file. "ls -mr... (11 Replies)
Discussion started by: NoMadBanker
11 Replies

6. Shell Programming and Scripting

How to format file into comma separated field

Guys, Need you help, i have a a file content that look like this. Nokia 3330 <spaces><spaces><more spaces>+76451883874 Nokia 3610 +87467361615 so on and so forth, - there are so many spaces in between. - e.g.... (5 Replies)
Discussion started by: shtobias
5 Replies

7. Shell Programming and Scripting

Its PERL + Comma separated seventh field

Hi Friends, I'm working on a perl script, which seems to be simpler. But I'm very new to PERL scripting. I have a comma separated data file, from which I need to extract only the seventh field data out of available twenty fields to an array using perl. Any help would be much appreciated. ... (17 Replies)
Discussion started by: ganapati
17 Replies

8. Shell Programming and Scripting

Parse apart strings of comma separated data with varying number of fields

I have a situation where I am reading a text file line-by-line. Those lines of data contain comma separated fields of data. However, each line can vary in the number of fields it can contain. What I need to do is parse apart each line and write each field of data found (left to right) into a file.... (7 Replies)
Discussion started by: 2reperry
7 Replies

9. Shell Programming and Scripting

sed removing comma inside double quotes

I have a csv file with lines like the followings 123456,"ABC CO., LTD","XXX" 789012,"DEF LIMITED", "XXX" before I bcp this file to database, the comma in "CO.," need to be removed first. My script is cat <filename> | sed 's/"CO.,"/"CO."/g' but it doesn't work. Can anyone here able to... (2 Replies)
Discussion started by: joanneho
2 Replies

10. Shell Programming and Scripting

Unix shell script to parse the contents of comma-separated file

Dear All, I have a comma-separated file. 1. The first line of the file(header) should have 4 commas(5 fields). 2. The last line of the file should have 1 comma(2 fields). Pls help me in checking this condition in a shell script. And the number of lines between the first line and last... (11 Replies)
Discussion started by: KrishnaSaran
11 Replies
Login or Register to Ask a Question