You're problem is quite ubiquitous and has been solved / commented on in these fora more than several times. Try the search function, or one of the links given at the lower left of this page under "More UNIX and Linux Forum Topics You Might Find Helpful".
The code with which you addressed your problem is not the most efficient one. Did you consider just reading the variables? Like
Code:
{ read dummy
while IFS=, read a b c d e f dummy
do echo $e
echo $f
.
.
.
done
} < $file
output: (it gave the output as single quote)
' ' ' ' ' ' Required Output Should be like: (my 5th field having value "comma" )
, jkk , rgr , rgr rgr
First: even if your script line is working, it will be slow as a slug. The reason is that you call external (that is: external to the shell) programs (here: awk) for every line over and over again. Calling a program costs a considerable amount of time. You will not notice that if you call it once but if your file has, say, 1000 lines and each lines has 10 fields, you will call awk 10.000 times. That takes a considerable amount of time.
What you want to do is called "parsing" and if you want to assign the parsed values to (shell) variables in the end you can as well do the parsing itself in the shell too. Let us start with something simple: a loop that reads a string one character at a time. I suppose your input file does not contain anything fancy (like escaped characters, unprintable sequences, etc.) but only printable characters. In this case it can be done in shell alone quite simply:
Code:
#! /bin/ksh
chLine=""
chChar=""
while read chLine ; do # go through the whole file
echo "Whole line: $chLine" # test output
while [ -n "$chLine" ] ; do # go through the line
chChar="${chLine%${chLine#?}}" # chop off the first character
chLine="${chLine#?}" # and remove that from the line
echo "$chChar" # test output
done
done
Now let this run against any (short!) file with a few lines and see how it works. It reads one line at a time, prints it (this is just to control its workings) completely, then starts to remove one character at a time until the end of the line. The removed character is also printed to show the mechanism.
Now we need to determine if we are inside a quotation or outside. This can be done simply by introducing a logical flag which we switch every time we find a quote. We start with the flag in "off", switch it to "on" when we find a quote, switch it to "off" again when we find another quote, etc.. Here we go:
Code:
#! /bin/ksh
chLine=""
chChar=""
lInsideQuote=0
while read chLine ; do
echo "Whole line: $chLine"
lInsideQuote=0 # switch off for new line
while [ -n "$chLine" ] ; do
chChar="${chLine%${chLine#?}}"
chLine="${chLine#?}"
case "$chChar" in
\') # if we found a single quote
if (( lInsideQuote )) ; then # flip the status of the flag
lInsideQuote=0
else
lInsideQuote=1
fi
;;
*) # ignore all other characters
;;
esac
if (( lInsideQuote )) ; then # test output shows quote-status
echo "$chChar" (inside quote)"
else
echo "$chChar" (outside quote)"
fi
done
done
Now we are prepared to deal with the fields themselves: we read a character a time and add that to a growing string which represents our next field. When we encounter a comma we react in one of two ways: when the quotation-flag is switched off, we found a legal separator and we output the field and start over with the next one. If the quotation-field is switched on we just the comma as a normal character to the fields value and carry on.
Here it is:
Code:
#! /bin/ksh
chLine=""
chChar=""
chField=""
lInsideQuote=0
while read chLine ; do
echo "Whole line: $chLine"
lInsideQuote=0
chField="" # reset field buffer for next line
while [ -n "$chLine" ] ; do
chChar="${chLine%${chLine#?}}"
chLine="${chLine#?}"
case "$chChar" in
"'")
if (( lInsideQuote )) ; then
lInsideQuote=0
else
lInsideQuote=1
fi
;;
",") # if we found a comma
if (( lInsideQuote )) ; then # inside a quoted string
chField="${chField}," # we just add the comma
else
# outside a quoted string
echo "Field: $chField" # field is finished, display it
chField="" # empty buffer for next field
fi
;;
*)
chField="${chField}${chChar}" # all other chars are just added
;;
esac
done
echo "last Field: $chField" # display the last field in the line
done
Try this with a small sample file and see if it works for you. Try to understand its inner workings.
You can use this as a basic structure and refine it to your needs. Where now the echo-statements just display the field values you can insert code to actually deal with the values instead. Things i have left out for clarity are: there are two ways to quote, single and double quotes. How should the script deal with that, i.e.:
Code:
"double ' quoted", 'single " quoted'
My script only takes single-quotes into account but if you allow single- and double-quotes it gets a bit more complicated when one quote is inside a string quoted with the other type. Usually the rule is that quote characters lose their special meanings inside a quoted string. The example above would be two quoted strings, one with a single quote character in it, one with a double-quote character in it.
Also i left out some context checking: suppose this input line:
Code:
'field1','field2
The last opened quotation is not closed again. Should this lead to an error? My script ignores it simply, but you might want to raise some error because of this. How you react to such a line is your decision and you need to decide based on your purpose.
Another thing is if you have escaped quote characters like this:
Code:
"a double quote character \" inside a double-quoted string"
It is possible to parse these (hint: introduce an "escape-flag", which you set whenever you encounter a "\") but again it makes the parser more complicated.
I hope this helps.
bakunin
Last edited by bakunin; 02-19-2019 at 06:38 PM..
Reason: added forgotten quotes
In a recent shell, e.g. bash, all can be done with a "here string" and "parameter expansion: Pattern substitution", like
Code:
$ T=$'\001' # set a token improbable to appear in file
$ while read LINE
do IFS=, read a b c d e f REST <<< ${LINE//\',\'/$T} # read from $LINE but with ',' replaced by token
echo ${a//$T/,}:${b//$T/,}:${c//$T/,}:${d//$T/,}:${e//$T/,}:${f//$T/,} # print variables with token replaced by comma
done < file
abv:sfs::hju:,:jkk wff
Hi All,
Hope you are doing Great!!!.
Today i have came up with a problem to say exactly it was for performance improvement.
I have written code in perl as a solution for this to cut in specific range, but it is taking time to run for files thousands of lines so i am expecting
a sed... (9 Replies)
Hi Experts,
Please support
I have below data in file in comma seperated, but 4th column is containing comma in between numbers, bcz of which when i tried to parse the file the column 6th value(5049641141) is being removed from the file and value(222.82) in column 5 becoming value of column6.
... (3 Replies)
Hi All,
Seeking for your assistance on how to combine two fields if there's a comma between them.
What i did was, extract the records from database.
file1.csv(extracted from db)
82|Supplies Station, Inc.|112012|KARISSA APPAREL, INC. - 112012|NON TRADE F/A AND... (2 Replies)
I am trying to re-format a .csv file using awk. I have 6 fields in the .csv file. Some of the fields are enclosed in double quotes and contain comma's inside the quotes. awk is breaking this into multiple fields.
Sample lines from the .csv file:
Device Name,Personnel,Date,Solution... (1 Reply)
Hello there,
I have a comma separated csv , and all the text field is wrapped by double quote. Issue is some text field contain comma as well inside double quote. so it is difficult to process.
Input in the csv file is ,
1,234,"abc,12,gh","GH234TY",34
I need output like below,... (8 Replies)
Hi,
I am newbie in unix.
Could someone tell me how do I redirect my grep output to a csv/excel ?
I have used below command but the outputs are appearing in one column Not in different column.
grep -e cmd -e cmd1 test.txt | cut -f 5 | sort | uniq -c> op.csv
I do not understand how do I... (14 Replies)
Hello
I have a csv file which I need to insert addtional commas into. The csv is of the format
field1,field2,field3,field4,...etc...,field13,field14
I need to add extra commas in each record so that the final output looks like
... (1 Reply)
Hello All
Can anyone please guide me how to solve the issue
In the below code I am getting concat of two value in result variable with comma separated
result=`sqlplus -s / <<EOF
set pages 0 feed off;
set feedback off;
spool abc.txt
select... (4 Replies)
I have a situation where I am reading a text file line-by-line. Those lines of data contain comma separated fields of data. However, each line can vary in the number of fields it can contain. What I need to do is parse apart each line and write each field of data found (left to right) into a file.... (7 Replies)
Hello all,
I am a unix dummy. I am trying to remove spaces between fields. I have the file in the following format
12332432, 2345 , asdfsdf ,100216 , 9999999
12332431, 2341 , asdfsd2 ,100213 , 9999999
&... (2 Replies)