12-08-2010
CSV: Replacing multiple occurrences inside a pattern
Greetings all,
I am coming to you for your knowledge and some help with an issue I currently cannot get past. I have searched the boards but did not find anything close to the problem I am struggling with.
I am trying to clean a CSV file so that SQL*Loader can load it. My current problem is replacing multiple occurrences of a pattern inside another pattern.
Imagine a CSV file with:
- End of line: \n
- Field separator: ;
- Field enclosure (surrounder): "
But in which:
- Enclosing quotes are not escaped: any field containing ;, \n or " is simply surrounded by double quotes, and the other double quotes in the data are not escaped.
- The data can contain \r, \r\n, \n, ; and " characters (enjoy...)
To make things simple I have:
- Replaced every \n, \r\n and \r with || (to remove any notion of lines while cleaning the file)
- Replaced every double quote with a doubled double quote (" => ""), then used sed to turn ""; and ;"" back into "; and ;" (I agree this ignores the case of "; or ;" appearing in the data, but never mind), so that all double quotes in the data are now escaped.
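A minimal sketch of the two preparation steps above, written as a POSIX shell function wrapping awk. The function name flatten_csv is hypothetical. It assumes the input ends with a newline, and, like the steps as described, it only turns ""; and ;"" back into "; and ;", so a quote sitting next to a record boundary (for example around the first field of a line) stays doubled.

```sh
# Hypothetical helper: flattens a CSV stream into one string following the
# steps above. Assumes the input ends with a newline.
flatten_csv() {
  awk '
    {
      sub(/\r$/, "")        # CRLF: drop the CR, the LF is handled below
      gsub(/\r/, "||")      # any remaining lone CR becomes ||
      gsub(/"/, "\"\"")     # double every double quote
      out = out $0 "||"     # the LF that ended this line becomes ||
    }
    END {
      # Turn the quotes next to a field separator back into single quotes.
      gsub(/"";/, "\";", out)
      gsub(/;""/, ";\"", out)
      printf "%s", out
    }'
}
```

For example, flatten_csv < export.csv > export.flat (file names hypothetical) produces a single line with no trailing newline.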
BUT I cannot find a way to replace the \n, \r and \r\n that were in the data with \n. To do this I need to replace every occurrence of || (originally \n, \r or \r\n) inside ;"(.*)"; with \n. Ideally I would find every occurrence of ;"(.*)"; in my file (treated as one single line, since I removed the line breaks) and, within the (.*), replace every || with \n.
I have tried sed, but I cannot manage to convert only the || located between the semicolon/double-quote delimiters. Any ideas?
I hope I have been clear enough, though I suspect I may not have been. Feel free to ask for further details if needed.
Regards,
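One possible awk sketch of the replacement asked about above, assuming the file has already been flattened to one line and its data quotes doubled. It looks for the leftmost-longest match of ;"...";, where the field body is any run of non-quote characters or escaped "" pairs, replaces || with a real newline inside that match only, and leaves the closing ; in the remaining text so that the next field can start with it. The function name fix_embedded_breaks is hypothetical, and, like the ;"(.*)"; pattern in the post, it does not touch a quoted field at the very start or end of a record (preceded or followed by || rather than ;).

```sh
# Hypothetical helper: replaces || with a newline, but only inside
# ;"..."; fields of a flattened, quote-escaped CSV string.
fix_embedded_breaks() {
  awk '
    {
      s = $0; out = ""
      # Field body: non-quote characters or escaped "" pairs.
      while (match(s, /;"([^"]|"")*";/)) {
        field = substr(s, RSTART, RLENGTH - 1)  # keep the closing ; in s
        gsub(/[|][|]/, "\n", field)             # || -> newline, this field only
        out = out substr(s, 1, RSTART - 1) field
        s = substr(s, RSTART + RLENGTH - 1)
      }
      printf "%s", out s
    }'
}
```

Keeping the closing ; in the unprocessed text is what lets two adjacent quoted fields such as ;"a||b";"c||d"; both be matched; a pattern that consumed both semicolons would skip every second field.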
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hi,
I am new to using sed. I have a file containing lines like the following:
INFORM----Test.pc:168:10/11/05 12:34:26 > some text goes here..
TRACE-----Test.pc:197:10/11/05 12:34:26 > some text goes here..
My requirement is to replace 10/11/05 12:34:26 with a string <RUNDATE> (including <... (4 Replies)
Discussion started by: Hema_M
4 Replies
2. Shell Programming and Scripting
Hello,
The following sed command gives an error:
sed: -e expression #1, char 13: unknown option to `s'
The sed command is:
echo "//-----" | sed "s/\/\/---*/$parChk/g"
where parChk="//---ee-"
How can I print the variable's value from a sed command?
And is it possible to replace a... (2 Replies)
Discussion started by: frozensmilz
2 Replies
3. Shell Programming and Scripting
I need to count the number of occurrences of a pattern, say 'key', between each occurrence of a different pattern, say 'lu'.
Here's a portion of the text I'm trying to parse:
lu S1234L_149_m1_vg.6, part-att 1, vdp-att 1 p-reserver IID 0xdb
registrations:
key 4156 4353 0000 0000
... (3 Replies)
Discussion started by: slipstream
3 Replies
4. Shell Programming and Scripting
Hi ,
I want to create a shell script that will determine the complexity of an SQL script. Suppose a.sql holds the following two queries:
select * from table1 a ,table2 b
where a.column1=b.column1;
select * from table1 a ,(select * from table1 c ,table2 d where c.column1=d.column1) e
where... (3 Replies)
Discussion started by: anupamhalder
3 Replies
5. Shell Programming and Scripting
Hi All,
Is it possible to count the number of occurrences of a pattern in a single record using awk?
For example, a line like this:
abrsjdfhafa
I want to count the number of occurrences of a character, but still use the default RS; I don't want to set RS to a single character. (1 Reply)
Discussion started by: ghoda2_10
1 Reply
6. Shell Programming and Scripting
Hi. I have input like this:
<tr>
<td class="logo1" rowspan="2"><a href="index.html"><img
src="images/logo.png" /></a></td>
<td class="pad1" rowspan="2">__</td>
<td class="userBox"><img src="images/person.png"/> <a href="http://good.mybook.com/login.jsp">Sign In</a></td>
<td... (5 Replies)
Discussion started by: zorrox
5 Replies
7. Shell Programming and Scripting
I have a little challenge; please help me out. I have a file where values are declared, and I have to replace each name with its value where it is used. For example, I have values for abc and ccc; now I have to substitute the values of abc and ccc in their place.
Input File:
go to &abc=ddd;
if... (16 Replies)
Discussion started by: saaisiva
16 Replies
8. Shell Programming and Scripting
Hi All,
I want to print all the occurrences for a particular pattern from a file. The catch is that the pattern search is partial and if any word in the file contains the pattern, that complete word has to be printed. If there are multiple words matching the pattern on a specific line, then all... (2 Replies)
Discussion started by: decci_7
2 Replies
9. UNIX for Advanced & Expert Users
I have a line that I need to parse through and extract a pattern that occurs multiple times in it.
Example line:
getInfoCall: info received please proceed, getInfoCall: info received please proceed, getInfoCall: info received please proceed, getInfoCall: info received please proceed,... (4 Replies)
Discussion started by: Vidhyaprakash
4 Replies
10. Shell Programming and Scripting
The lines that I am trying to format look like this:
Device ID: j01-01, IP address: 10.10.10.36, IP address: 10.10.10.35, IP address: 10.10.102.201, Platform: 8040, Capabilities: Host ,
Interface: GigabitEthernet9/45, Port ID (outgoing port): e0k,
Here is what I have so far but it... (4 Replies)
Discussion started by: dis0wned
4 Replies
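Minimal POSIX shell sketches for four of the teasers above (the first, second, third and fifth discussions). They are illustrations under stated assumptions, not the answers given in those threads: all function names are hypothetical, count_keys_per_lu assumes the name to report is the second field of the lu line with its trailing comma removed (the output format is made up), and count_char treats its second argument as an awk regular expression.

```sh
# Discussion 1: replace the date/time stamp with the literal <RUNDATE>.
mask_rundate() {
  sed 's|[0-9][0-9]/[0-9][0-9]/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]|<RUNDATE>|'
}

# Discussion 2: the value of parChk contains "/", which ends the s/// command
# early ("unknown option to `s'"). A delimiter that appears in neither the
# pattern nor the value avoids that; & and \ in the value stay special.
replace_dashes() {
  parChk=$1
  echo "//-----" | sed "s|//---*|$parChk|g"
}

# Discussion 3: count the "key" lines that follow each "lu" line.
count_keys_per_lu() {
  awk '
    $1 == "lu"  { if (name != "") print name, n
                  name = $2; sub(/,$/, "", name); n = 0; next }
    $1 == "key" { n++ }
    END         { if (name != "") print name, n }'
}

# Discussion 5: gsub() returns the number of substitutions it made, so
# replacing each match with itself ("&") counts it without changing RS.
count_char() {
  printf '%s\n' "$1" | awk -v c="$2" '{ print gsub(c, "&") }'
}
```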
LEARN ABOUT DEBIAN
wildmat
WILDMAT(3) Library Functions Manual WILDMAT(3)
NAME
wildmat - perform shell-style wildcard matching
SYNOPSIS
int
wildmat(text, pattern)
char *text;
char *pattern;
DESCRIPTION
Wildmat is part of libinn(3). Wildmat compares the text against the pattern and returns non-zero if the pattern matches the text. The
pattern is interpreted according to rules similar to shell filename wildcards, and not as a full regular expression such as those handled
by the grep(1) family of programs or the regex(3) or regexp(3) set of routines.
The pattern is interpreted as follows:
\x Turns off the special meaning of x and matches it directly; this is used mostly before a question mark or asterisk, and is not special inside square brackets.
? Matches any single character.
* Matches any sequence of zero or more characters.
[x...y]
Matches any single character specified by the set x...y. A minus sign may be used to indicate a range of characters. That is,
[0-5abc] is a shorthand for [012345abc]. More than one range may appear inside a character set; [0-9a-zA-Z._] matches almost all of
the legal characters for a host name. The close bracket, ], may be used if it is the first character in the set. The minus sign,
-, may be used if it is either the first or last character in the set.
[^x...y]
This matches any character not in the set x...y, which is interpreted as described above. For example, [^]-] matches any character
other than a close bracket or minus sign.
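The rules above are described as similar to shell filename wildcards, so a POSIX sh case statement can serve as a rough analogue for a whole-string match. This is a hypothetical helper (wildmat_sh is a made-up name), not libinn's wildmat: one visible difference is that POSIX sh writes a negated set as [!x...y], where wildmat uses [^x...y].

```sh
# Rough sh analogue of wildmat(text, pattern): returns 0 (success) when the
# pattern matches the whole text, 1 otherwise. $2 is deliberately unquoted
# in the case pattern so that its *, ? and [...] act as wildcards.
wildmat_sh() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}
```

Where the C function returns non-zero on a match, this shell version signals a match through its exit status, so it can be used directly in if or && tests.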
HISTORY
Written by Rich $alz <rsalz@uunet.uu.net> in 1986, and posted to Usenet several times since then, most notably in comp.sources.misc in
March, 1991.
Lars Mathiesen <thorinn@diku.dk> enhanced the multi-asterisk failure mode in early 1991.
Rich and Lars increased the efficiency of star patterns and reposted it to comp.sources.misc in April, 1991.
Robert Elz <kre@munnari.oz.au> added minus sign and close bracket handling in June, 1991.
This is revision 1.10, dated 1992/04/03.
SEE ALSO
grep(1), regex(3), regexp(3).