Processing data that contains space and quote delimiters


 
# 1  
Old 06-25-2010
I need to write a Bash script to process a data file that is in this format:

1 A B C D E
2 F G "H H" I J

As you can see, the data is delimited by a space, but there are also some fields that contain spaces and are surrounded by double-quotes. An example of that is "H H".

I wrote this test script to display the 4th parameter:

Code:
#!/bin/bash
while read line
do
        echo "line=$line"
        param4=$(echo $line | cut -d" " -f4)
        echo "param4=$param4"
done

Here's what it displays:

line=1 A B C D E
param4=C
line=2 F G "H H" I J
param4="H

For the second line of data, I wanted the fourth parameter to be "H H" (without the quotes) instead of one double quote and one H. It is using the other H and the trailing double quote as parameter 5. That is not what I wanted.

How can I process this data?
# 2  
Old 06-26-2010
- Are the quotes always enclosing the same values?
- Could there be more than two quotes in one line?
- Do you want all parameters tested or only some?
# 3  
Old 06-26-2010
Quote:
Are the quotes always enclosing the same values?
No.

Quote:
Could there be more than two quotes in one line?
Yes.

Quote:
Do you want all parameters tested or only some?
All of them.

Sample data might look like this:

Code:
A B C D E F
"A A" B C D E F
A "B B" C D E F
A B "Hi there" D E F
A B C "Lots of words" E F
A B C D "E E E E E E E E" F
A B C D E "F F"

Thanks for your help.

Last edited by Scott; 06-27-2010 at 06:35 AM.. Reason: Added one more code tag
# 4  
Old 06-26-2010
First convert your data into a comma delimited file (csv):
Code:
$ cat sample.dat
A B C D E F
"A A" B C D E F
A "B B" C D E F
A B "Hi there" D E F
A B C "Lots of words" E F
A B C D "E E E E E E E E" F
A B C D E "F F"
$ perl -ne 'my @x=split(/ /); for (0..$#x) {
     if ($x[$_] !~ /"/ && $a != 1) {print $x[$_] . ","; }
  elsif ($x[$_] =~ /^"/) { $x[$_] =~ s/"//g; print $x[$_] . " "; $a=1; }
  elsif ($x[$_] !~ /"$/ && $a == 1) { print $x[$_] . " "; }
   else { $x[$_] =~ s/"//g; print $x[$_] . ","; $a=0 } } ' sample.dat > sample.tmp
$ 
$ sed 's/^,//' sample.tmp | sed '$d' > sample.csv
$ rm sample.tmp
$
$ cat sample.csv
A,B,C,D,E,F
A A,B,C,D,E,F
A,B B,C,D,E,F
A,B,Hi there,D,E,F
A,B,C,Lots of words,E,F
A,B,C,D,E E E E E E E E,F
A,B,C,D,E,F F
$

Now you could use
Code:
param4=$(echo "$line" | awk -F, '{print $4}')

to assign the value of the fourth column to the param4 variable.
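
If the data is already comma-delimited, the shell's own read can split it without calling awk for every line. This is only a sketch: print_field4 is a made-up name, and it assumes no field contains a comma, since the conversion above does not escape commas.

```shell
#!/bin/bash
# Sketch: print the fourth comma-separated field of every line on stdin.
# print_field4 is a hypothetical helper name, not from the thread.
# Assumption: no field contains a comma.
print_field4() {
    local f1 f2 f3 f4 rest
    # IFS=, applies to this read only; -r keeps backslashes literal.
    while IFS=, read -r f1 f2 f3 f4 rest; do
        printf '%s\n' "$f4"
    done
}

print_field4 <<'EOF'
A,B,C,D,E,F
A,B,C,Lots of words,E,F
EOF
```

Because IFS holds only the comma, spaces inside a field such as "Lots of words" are kept intact.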
# 5  
Old 06-26-2010
Code:
while read line
do
  eval set -- "$line"
  echo "$4"
done < infile

or:
Code:
print4() 
{ 
  echo "$4"
}

while read line
do
  eval print4 "$line"
done < infile

or (ksh93/bash):
Code:
while read line
do
  eval A=($line)
  echo "${A[3]}"
done < infile

output:
Code:
C
H H


Last edited by Scrutinizer; 06-28-2010 at 04:29 PM.. Reason: Optimized ksh93/bash array assignment
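
One caveat with the eval variants: eval runs whatever the data contains, so a line holding something like $(some command) would be executed. If the input is not fully trusted, a small splitter in plain bash avoids that. The following is a rough sketch, not a full parser: split_quoted and FIELDS are made-up names, and it assumes fields are separated by single spaces or wrapped in double quotes, with no escaped quotes inside a field.

```shell
#!/bin/bash
# split_quoted LINE: fill the global array FIELDS with the fields of LINE.
# Hypothetical helper for illustration. Assumptions: fields are separated
# by spaces, a field may be wrapped in double quotes to hold spaces, and
# no escape sequences are supported.
split_quoted() {
    local line=$1 field= in_quotes=0 has_field=0 ch i
    FIELDS=()
    for ((i = 0; i < ${#line}; i++)); do
        ch=${line:i:1}
        if [[ $ch == '"' ]]; then
            # Toggle quoted state; the quote itself is not stored.
            in_quotes=$((1 - in_quotes))
            has_field=1
        elif [[ $ch == ' ' && $in_quotes -eq 0 ]]; then
            # An unquoted space ends the current field, if there is one.
            if ((has_field)); then
                FIELDS+=("$field")
                field= has_field=0
            fi
        else
            field+=$ch
            has_field=1
        fi
    done
    # Store the last field of the line.
    if ((has_field)); then FIELDS+=("$field"); fi
}

split_quoted 'A B C "Lots of words" E F'
printf '%s\n' "${FIELDS[3]}"
```

Nothing in the line is ever executed here, so text such as $(...) simply ends up as literal characters in a field.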
# 6  
Old 06-28-2010
Thank you! This code is short and elegant and works nicely:
Code:
while read line
do
  eval set -- "$line"
  echo "$4"
done < infile

How does it work? Are you setting a variable named -- or something?
# 7  
Old 06-28-2010
Hi RickS,

set without options assigns its arguments to the positional parameters $1, $2, etc. For instance:
Code:
# set a b 
# echo $1
a
# echo $2
b

-- signifies the end of options. Anything that comes after it will not be interpreted as an option to the command "set". This ensures that if a value starts with a "-" sign, it does not unintentionally set an option.
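
A small illustration of why the -- matters, as a sketch with a made-up function name (show_first). Without the --, a first value of -x would be taken as the option that switches on command tracing instead of becoming $1.

```shell
#!/bin/bash
# Sketch: show_first is a hypothetical helper name.
show_first() {
    local value=$1
    # "--" ends option parsing, so a value like -x is stored as $1
    # instead of being treated as the xtrace option.
    set -- "$value" second
    printf '%s\n' "$1"
}

show_first -x
```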

The eval command plays a crucial role here. The shell first expands "$line", and eval then parses the result again as a new command, so that with e.g. the second line of the input file the command reads:
Code:
set -- 2 F G "H H" I J

So then $1 becomes 2, $2 becomes F, $3 becomes G, and $4 is set to H H (the quotes themselves are removed), etc.
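
To see the difference eval makes, here is a side-by-side sketch using the second line of the original data. The function names count_plain and count_eval are made up for the comparison; each prints the number of parameters and the value of $4.

```shell
#!/bin/bash
line='2 F G "H H" I J'

# Without eval: the expanded value is only split on whitespace.
# The quote characters stay literal, so "H and H" are separate words.
count_plain() {
    set -- $line
    printf '%s:%s\n' "$#" "$4"
}

# With eval: the expanded text is parsed again as shell code,
# so the double quotes group H H into a single word.
count_eval() {
    eval set -- "$line"
    printf '%s:%s\n' "$#" "$4"
}

count_plain   # prints 7:"H
count_eval    # prints 6:H H
```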

Last edited by Scrutinizer; 06-28-2010 at 03:20 PM.. Reason: Added "it does not"