We receive a CSV file from a vendor, and they enclose every field they deem to be text in double quotes.
The problem arises when the data itself also contains double quotes.
OK. The following may not be a perfect solution (it rests on a less-than-rock-solid definition of the problem), but check how far you get with it.
Let us say that the quotes you want to preserve are the ones immediately preceding or following commas (which seem to be the field separators here), plus the double quote at the very beginning of the line and the one at the very end. All other double quotes should become single quotes.
This works for your example, but it is conceivable that this ruleset could be tricked. That is why I suggest you double-check whether it works on your data, or whether we need to make the ruleset more robust.
Solution: first, every "," sequence is replaced by the placeholder @@, every ", by @@@ and every ," by @@@@ (I use @; change it to something else if it occurs in your data). The double quotes at the beginning and end of the line become @@ as well. Then I change the remaining double quotes to single quotes and finally turn the placeholders back into the quote-and-comma sequences they stood for.
This sounds complicated, but it makes the necessary regexps a lot easier to handle (and to understand).
Code:
sed "s/^\"/@@/;s/\"$/@@/;s/\",\"/@@/g;s/\",/@@@/g;s/,\"/@@@@/g
s/\"/'/g
s/^@@/\"/;s/@@$/\"/;s/@@@@/,\"/g;s/@@@/\",/g;s/@@/\",\"/g" /path/to/input
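For readability, the same script can be split into one -e expression per substitution, with each step commented. This is a sketch under the same assumptions as above (structural quotes sit next to a comma or at the ends of the line, and @@ never occurs in the data); the function name fix_inner_quotes and the sample line are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper wrapping the sed script above; extra arguments
# (e.g. /path/to/input) are passed straight to sed.
fix_inner_quotes() {
    # Step 1: hide the quotes that belong to the CSV structure:
    #   leading or trailing "  -> @@      ","  -> @@
    #   ",                     -> @@@     ,"   -> @@@@
    # Step 2: every double quote still left is data; make it a single quote.
    # Step 3: restore the anchored placeholders at the line ends, then the
    #   others longest first, so that @@@@ is not read as two @@ placeholders.
    sed -e 's/^"/@@/' -e 's/"$/@@/' \
        -e 's/","/@@/g' -e 's/",/@@@/g' -e 's/,"/@@@@/g' \
        -e "s/\"/'/g" \
        -e 's/^@@/"/' -e 's/@@$/"/' \
        -e 's/@@@@/,"/g' -e 's/@@@/",/g' -e 's/@@/","/g' \
        "$@"
}

line='"abc","say "hi" now",42,"x"'
result=$(printf '%s\n' "$line" | fix_inner_quotes)
printf '%s\n' "$result"   # "abc","say 'hi' now",42,"x"
```

It behaves the same on a line like the SQL*Loader sample further down, turning 22" into 22'. Note that an empty unquoted field between two quoted fields ("a",,"b") runs three placeholders together and is not restored correctly, so check your data for that case.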
I'm trying the following command but it's not working:
sed 's/find/\'replace\'/g' myFile
but it just drops to a new line:
# sed 's/find/re\'place/g' myFile
>
I have no idea how to put a single quote in my replace string. Your early help would be appreciated. Thanks (2 Replies)
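Inside single quotes the shell treats every character, including the backslash, literally, so \' escapes nothing: the quote after the backslash ends the string, and the shell then waits for the closing quote (the > prompt). A small sketch of three common workarounds, using the find/replace words from the question on a line generated with printf rather than read from myFile:

```sh
#!/bin/sh
# Three ways to put a literal single quote into a sed replacement.
# The input comes from printf instead of myFile so the sketch is self-contained.

# 1. Double-quote the whole sed script: ' is then an ordinary character.
a=$(printf 'find\n' | sed "s/find/'replace'/g")

# 2. Stay with single quotes: close the quote, add an escaped \', reopen.
b=$(printf 'find\n' | sed 's/find/'\''replace'\''/g')

# 3. Keep the quote in a variable and expand it inside double quotes.
q="'"
c=$(printf 'find\n' | sed "s/find/${q}replace${q}/g")

printf '%s\n' "$a" "$b" "$c"   # each line reads 'replace'
```

Option 1 is the simplest as long as the sed script itself contains no $, backquote or backslash that the shell would interpret inside double quotes.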
I'm not very familiar with the ssh command. When I tried to set a variable and then echo its value on a remote machine via ssh, I found a problem. For example,
$ ITSME=itsme
$ ssh xxx.xxxx.xxx.xxx "ITSME=itsyou; echo $ITSME"
itsme
$ ssh xxx.xxxx.xxx.xxx 'ITSME=itsyou; echo $ITSME'
itsyou
$... (3 Replies)
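With double quotes the local shell expands $ITSME before ssh even starts, so the remote side receives ITSME=itsyou; echo itsme. With single quotes the text reaches the remote shell unexpanded. The sketch below reproduces this locally, using sh -c as a stand-in for the shell that ssh starts on the remote machine (an assumption made only so the example runs without a remote host).

```sh
#!/bin/sh
# "sh -c" plays the part of the remote shell here, so no ssh host is needed.
ITSME=itsme

double=$(sh -c "ITSME=itsyou; echo $ITSME")    # local shell expands $ITSME first: itsme
single=$(sh -c 'ITSME=itsyou; echo $ITSME')    # inner shell sees $ITSME: itsyou
escaped=$(sh -c "ITSME=itsyou; echo \$ITSME")  # \$ passes a literal $ through: itsyou

printf '%s\n' "$double" "$single" "$escaped"
```

The escaped form is handy when you need double quotes anyway, for example to mix a locally expanded value into the remote command.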
Hi,
I've been trying to write a regex to use in egrep (in a shell script) that'll fetch the names of all the files that match a particular pattern. I expect to match the following line in a file:
Name = "abc"
The regex I'm using to match the same is:
egrep -l '(^) *= *" ** *"$' /PATH_TO_SEARCH... (6 Replies)
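The (^) part of the regex above looks damaged, so this is only a guess at the intent: assuming you want the files that contain a line that is exactly Name = "value" (any value, optional spaces around =), grep -E (the POSIX form of egrep) with -l does it. The temporary directory and the files a.txt, b.txt and c.txt are made up for the demonstration.

```sh
#!/bin/sh
# Sketch: list files that contain a line of the form  Name = "abc"
# (any value between the quotes, optional spaces around "=").
dir=$(mktemp -d)
printf 'Name = "abc"\n'  > "$dir/a.txt"
printf 'Name="xyz"\n'    > "$dir/b.txt"
printf 'Other = "abc"\n' > "$dir/c.txt"

# -l prints only the names of the matching files.
matches=$(cd "$dir" && grep -lE '^Name *= *"[^"]*"$' *.txt)
printf '%s\n' "$matches"
rm -r "$dir"
```

[^"]* keeps the match from running across a second quoted value on the same line.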
Hi, I want to replace each single quote with two single quotes in a Perl string.
If the string is <It's Simpson's book> It should become <It''s Simpson''s book> (3 Replies)
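In Perl itself this is the substitution $str =~ s/'/''/g; the same replacement from the shell with sed looks like the sketch below. The sed script is double-quoted so that the single quotes need no escaping.

```sh
#!/bin/sh
# Double every single quote in a string.
str="It's Simpson's book"
result=$(printf '%s\n' "$str" | sed "s/'/''/g")
printf '%s\n' "$result"   # It''s Simpson''s book
```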
Hi,
I have data as
"01/22/97-"aaaaaaaaaaaaaaaaa""aaa""aabbbbbbbbcccccc""zbcd""dddddddddeeeeeeeeefffffff"
I want to remove only the consecutive double quotes, not the ones that occur singly.
My O/P must be ... (2 Replies)
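Assuming "consecutive" means any run of two or more double quotes and that such runs should disappear completely (the expected output is cut off above, so this is a guess), a single sed substitution is enough. The function name drop_quote_runs is made up for the sketch.

```sh
#!/bin/sh
# Hypothetical helper: delete every run of two or more double quotes,
# leaving isolated double quotes in place. """* in a basic regular
# expression means: two quotes followed by zero or more further quotes.
drop_quote_runs() {
    sed 's/"""*//g' "$@"
}

result=$(printf '%s\n' 'a""b"c"""d' | drop_quote_runs)
printf '%s\n' "$result"   # ab"cd
```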
Hi,
Trying to change the prompt. I have the following code.
export PS1='
<${USER}@`hostname -s`>$ '
The hostname is not displayed
<abc@`hostname -s`>$ uname -a
AIX xyz 1 6 00F736154C00
<adcwl4h@`hostname -s`>$
If I use double quotes, then the hostname is printed properly but... (3 Replies)
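The output above shows ${USER} being expanded at prompt time while the backquoted command is printed literally, which suggests a shell (such as the ksh88-based /bin/ksh on AIX) that performs parameter expansion on PS1 but not command substitution. Assuming that, one workaround is to run the command once when PS1 is assigned and escape only the part that should stay dynamic. In this sketch uname -n is swapped in for hostname -s because it is POSIX; the ${host%%.*} expansion strips any domain part the same way hostname -s does.

```sh
#!/bin/sh
# Evaluate the host name once, when PS1 is assigned (double quotes),
# but keep ${USER} literal (\$) so the shell expands it at each prompt.
host=$(uname -n)
host=${host%%.*}          # drop any domain part, like hostname -s
PS1="<\${USER}@${host}>\$ "
export PS1
printf '%s\n' "$PS1"
```

The host name is then fixed at login, which is usually fine since it does not change during a session.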
Hi All,
I have been trying to replace a string using the sed command
The string value contains backslashes and double quotes. I am not an expert at writing Unix scripts and I try not to ask questions, but I have almost given up. I hope you can give me some suggestions.
I want to replace a place string... (6 Replies)
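Inside a single-quoted sed script the shell leaves backslashes and double quotes alone, so only sed's own special characters in the replacement need escaping: the backslash itself, the delimiter / and &. A minimal sketch, assuming / as the delimiter and a placeholder that contains no regex metacharacters; escape_sed_replacement, @VALUE@ and the sample value are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: escape a literal string for use as the replacement
# part of sed's s/// command (with / as the delimiter). Backslashes are
# doubled first, then / and & get a backslash in front.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/\//\\\//g' -e 's/&/\\\&/g'
}

# Placeholder and value are invented for illustration.
placeholder='@VALUE@'
value='C:\temp\"quoted"/a&b'

repl=$(escape_sed_replacement "$value")
result=$(printf 'path=%s\n' "$placeholder" | sed "s/$placeholder/$repl/g")
printf '%s\n' "$result"   # path=C:\temp\"quoted"/a&b
```

printf '%s' is used instead of echo because some echo implementations interpret backslashes themselves. If the search string can contain regex metacharacters, it needs its own escaping as well.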
Hi All,
I'm unable to load data with SQL*Loader when a field contains double quotes inside its enclosing double quotes (the fields are optionally enclosed by double quotes).
Sample data:
"221100",138.00,"D","0019/1477","44012075","49938","49938/15043000","Television - 22" Refurbished - Airwave","Supply... (6 Replies)
From:
1,2,3,4,5,This is a test
6,7,8,9,0,"This, is a test"
1,9,2,8,3,"This is a ""test"""
4,7,3,1,8,""""
To:
1,2,3,4,5,This is a test
6,7,8,9,0,"This; is a test"
1,9,2,8,3,"This is a ''test''"
4,7,3,1,8,"''"
Is there an easy syntax I'm overlooking? There will always be an odd number... (5 Replies)
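Doing this with sed alone gets awkward, because a comma or a quote means different things inside and outside a quoted field. One way to handle it is a small awk state machine that walks each line character by character. This is a sketch that assumes standard CSV quoting ("" as an escaped quote) and no field spanning several lines; convert_quoted_fields is a made-up name.

```sh
#!/bin/sh
# Sketch: inside double-quoted CSV fields, replace , by ; and an escaped
# quote pair ("") by two single quotes; text outside quotes is left alone.
convert_quoted_fields() {
    awk -v sq="'" '
    {
        out = ""; inq = 0; n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                if (inq && substr($0, i + 1, 1) == "\"") {
                    out = out sq sq   # escaped quote inside a field
                    i++               # skip the second quote of the pair
                } else {
                    inq = !inq        # opening or closing quote
                    out = out c
                }
            } else if (c == "," && inq) {
                out = out ";"         # comma inside a quoted field
            } else {
                out = out c
            }
        }
        print out
    }' "$@"
}

input='1,2,3,4,5,This is a test
6,7,8,9,0,"This, is a test"
1,9,2,8,3,"This is a ""test"""
4,7,3,1,8,""""'
result=$(printf '%s\n' "$input" | convert_quoted_fields)
printf '%s\n' "$result"
```

For the four sample lines this prints exactly the "To:" lines above. The single quote is passed in with -v sq="'" so the awk program can stay inside shell single quotes.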
Hi,
I have input data in the format below, with n columns, in multiple flat files. Any double quote (") in the string data has been replaced by two double quotes ("") in all the columns.
Also, the string data in each column of my input flat files can contain carriage returns/newlines too.... (14 Replies)
Text::ParseWords(3pm)          Perl Programmers Reference Guide          Text::ParseWords(3pm)
NAME
Text::ParseWords - parse text into an array of tokens or array of arrays
SYNOPSIS
use Text::ParseWords;
@lists = &nested_quotewords($delim, $keep, @lines);
@words = &quotewords($delim, $keep, @lines);
@words = &shellwords(@lines);
@words = &parse_line($delim, $keep, $line);
@words = &old_shellwords(@lines); # DEPRECATED!
DESCRIPTION
The &nested_quotewords() and &quotewords() functions accept a delimiter (which can be a regular expression) and a list of lines and then
break those lines up into a list of words, ignoring delimiters that appear inside quotes. &quotewords() returns all of the tokens in a
single long list, while &nested_quotewords() returns a list of token lists corresponding to the elements of @lines. &parse_line() does
tokenizing on a single string. The &*quotewords() functions simply call &parse_line(), so if you're only splitting one line you can call
&parse_line() directly and save a function call.
The $keep argument is a boolean flag. If true, then the tokens are split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens. If $keep is false then the &*quotewords() functions remove all quotes and backslashes that are
not themselves backslash-escaped or inside of single quotes (i.e., &quotewords() tries to interpret these characters just like the Bourne
shell). NB: these semantics are significantly different from the original version of this module shipped with Perl 5.000 through 5.004.
As an additional feature, $keep may be the keyword "delimiters" which causes the functions to preserve the delimiters in each string as
tokens in the token lists, in addition to preserving quote and backslash characters.
&shellwords() is written as a special case of &quotewords(), and it does token parsing with whitespace as a delimiter-- similar to most
Unix shells.
EXAMPLES
The sample program:
use Text::ParseWords;
@words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
$i = 0;
foreach (@words) {
    print "$i: <$_>\n";
    $i++;
}
produces:
0: <this>
1: <is>
2: <a test>
3: <of quotewords>
4: <"for>
5: <you>
demonstrating:
0 a simple word
1 multiple spaces are skipped because of our $delim
2 use of quotes to include a space in a word
3 use of a backslash to include a space in a word
4 use of a backslash to remove the special meaning of a double-quote
5 another simple word (note the lack of effect of the backslashed double-quote)
Replacing "&quotewords('\s+', 0, q{this is...})" with "&shellwords(q{this is...})" is a simpler way to accomplish the same thing.
AUTHORS
Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original author unknown). Much of the code for &parse_line() (including the
primary regexp) from Joerk Behrends <jbehrends@multimediaproduzenten.de>.
Examples section and other documentation provided by John Heidemann <johnh@ISI.EDU>
Bug reports, patches, and nagging provided by lots of folks-- thanks everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
for assuring me that a &nested_quotewords() would be useful, and to Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of-- you had to be there).
perl v5.8.0 2002-06-01 Text::ParseWords(3pm)