How to extract fields from a CSV i.e comma separated where some of the fields having comma as value?
can anyone help me!!!! How to I parse the CSV file
file name : abc.csv (csv file) The above file containing data like
I have a requirement like i have to extract different field and assign them into different variables.
My Code:
outputit gave the output as single quote)
' ' ' ' ' ' Required Output Should be likemy 5th field having value "comma" )
, jkk , rgr , rgr rgr
Moderator's Comments:
Please use CODE tags as required by forum rules!
Last edited by RudiC; 02-19-2019 at 08:54 AM..
Reason: Added CODE tags.
You're problem is quite ubiquitous and has been solved / commented on in these fora more than several times. Try the search function, or one of the links given at the lower left of this page under "More UNIX and Linux Forum Topics You Might Find Helpful".
The code with which you addressed your problem is not the most efficient one. Did you consider just reading the variables? Like
can anyone help me!!!! How to I parse the CSV file
file name : abc.csv (csv file) The above file containing data like
I have a requirement like i have to extract different field and assign them into different variables.
My Code:
output: (it gave the output as single quote)
' ' ' ' ' ' Required Output Should be like: (my 5th field having value "comma" )
, jkk , rgr , rgr rgr
First: even if your script line is working, it will be slow as a slug. The reason is that you call external (that is: external to the shell) programs (here: awk) for every line over and over again. Calling a program costs a considerable amount of time. You will not notice that if you call it once but if your file has, say, 1000 lines and each lines has 10 fields, you will call awk 10.000 times. That takes a considerable amount of time.
What you want to do is called "parsing" and if you want to assign the parsed values to (shell) variables in the end you can as well do the parsing itself in the shell too. Let us start with something simple: a loop that reads a string one character at a time. I suppose your input file does not contain anything fancy (like escaped characters, unprintable sequences, etc.) but only printable characters. In this case it can be done in shell alone quite simply:
Now let this run against any (short!) file with a few lines and see how it works. It reads one line at a time, prints it (this is just to control its workings) completely, then starts to remove one character at a time until the end of the line. The removed character is also printed to show the mechanism.
Now we need to determine if we are inside a quotation or outside. This can be done simply by introducing a logical flag which we switch every time we find a quote. We start with the flag in "off", switch it to "on" when we find a quote, switch it to "off" again when we find another quote, etc.. Here we go:
Now we are prepared to deal with the fields themselves: we read a character a time and add that to a growing string which represents our next field. When we encounter a comma we react in one of two ways: when the quotation-flag is switched off, we found a legal separator and we output the field and start over with the next one. If the quotation-field is switched on we just the comma as a normal character to the fields value and carry on.
Here it is:
Try this with a small sample file and see if it works for you. Try to understand its inner workings.
You can use this as a basic structure and refine it to your needs. Where now the echo-statements just display the field values you can insert code to actually deal with the values instead. Things i have left out for clarity are: there are two ways to quote, single and double quotes. How should the script deal with that, i.e.:
My script only takes single-quotes into account but if you allow single- and double-quotes it gets a bit more complicated when one quote is inside a string quoted with the other type. Usually the rule is that quote characters lose their special meanings inside a quoted string. The example above would be two quoted strings, one with a single quote character in it, one with a double-quote character in it.
Also i left out some context checking: suppose this input line:
The last opened quotation is not closed again. Should this lead to an error? My script ignores it simply, but you might want to raise some error because of this. How you react to such a line is your decision and you need to decide based on your purpose.
Another thing is if you have escaped quote characters like this:
It is possible to parse these (hint: introduce an "escape-flag", which you set whenever you encounter a "\") but again it makes the parser more complicated.
I hope this helps.
bakunin
Last edited by bakunin; 02-19-2019 at 06:38 PM..
Reason: added forgotten quotes
Hi All,
Hope you are doing Great!!!.
Today i have came up with a problem to say exactly it was for performance improvement.
I have written code in perl as a solution for this to cut in specific range, but it is taking time to run for files thousands of lines so i am expecting
a sed... (9 Replies)
Hi Experts,
Please support
I have below data in file in comma seperated, but 4th column is containing comma in between numbers, bcz of which when i tried to parse the file the column 6th value(5049641141) is being removed from the file and value(222.82) in column 5 becoming value of column6.
... (3 Replies)
Hi All,
Seeking for your assistance on how to combine two fields if there's a comma between them.
What i did was, extract the records from database.
file1.csv(extracted from db)
82|Supplies Station, Inc.|112012|KARISSA APPAREL, INC. - 112012|NON TRADE F/A AND... (2 Replies)
I am trying to re-format a .csv file using awk. I have 6 fields in the .csv file. Some of the fields are enclosed in double quotes and contain comma's inside the quotes. awk is breaking this into multiple fields.
Sample lines from the .csv file:
Device Name,Personnel,Date,Solution... (1 Reply)
Hello there,
I have a comma separated csv , and all the text field is wrapped by double quote. Issue is some text field contain comma as well inside double quote. so it is difficult to process.
Input in the csv file is ,
1,234,"abc,12,gh","GH234TY",34
I need output like below,... (8 Replies)
Hi,
I am newbie in unix.
Could someone tell me how do I redirect my grep output to a csv/excel ?
I have used below command but the outputs are appearing in one column Not in different column.
grep -e cmd -e cmd1 test.txt | cut -f 5 | sort | uniq -c> op.csv
I do not understand how do I... (14 Replies)
Hello
I have a csv file which I need to insert addtional commas into. The csv is of the format
field1,field2,field3,field4,...etc...,field13,field14
I need to add extra commas in each record so that the final output looks like
... (1 Reply)
Hello All
Can anyone please guide me how to solve the issue
In the below code I am getting concat of two value in result variable with comma separated
result=`sqlplus -s / <<EOF
set pages 0 feed off;
set feedback off;
spool abc.txt
select... (4 Replies)
I have a situation where I am reading a text file line-by-line. Those lines of data contain comma separated fields of data. However, each line can vary in the number of fields it can contain. What I need to do is parse apart each line and write each field of data found (left to right) into a file.... (7 Replies)
Hello all,
I am a unix dummy. I am trying to remove spaces between fields. I have the file in the following format
12332432, 2345 , asdfsdf ,100216 , 9999999
12332431, 2341 , asdfsd2 ,100213 , 9999999
&... (2 Replies)