Simple sed one-liner for fixing unencoded ampersands

# 1  
07-22-2008

Hi,
I receive XML files that consistently contain badly encoded content. Some ampersands are not encoded correctly, which causes my XML parser to halt.
I wrote a sed one-liner to fix any stand-alone "&":

sed -e 's/&[^amp;|^apos;|^quot;|^lt;|^gt;]/&/gi' input.xml

Test file for input:
<xml>
<source> &quot; One &quot; </source>
<name>test &amp; test</name>
<last>test2&amp;test2</last>
<address>test3 &apos; test3</address>
<area> test5 &lt; test5</area>
<post> test6 &gt; </post>
<test> test7 &</test>
</xml>

My problem is that the character after the "&" is removed as well, destroying the XML tag.
Output:

<xml>
<source> &quot; One &quot; </source>
<name>test &amp; test</name>
<last>test2&amp;test2</last>
<address>test3 &apos; test3</address>
<area> test5 &lt; test5</area>
<post> test6 &gt; </post>
<test> test7 &amp;/test>
</xml>

I tried the script both on Unix and on Windows 2000 (with unixutil).
Any ideas?

/Tobbe
# 2  
07-22-2008
Code:
sed -e 's/&[^amp;|^apos;|^quot;|^lt;|^gt;]/\&amp;/gi' input.xml

Try escaping the "&" in the replacement; unescaped, it stands for the whole matched text.
# 3  
07-22-2008
Same result with the "&" escaped:

sed -e 's/\&[^amp;|^apos;|^quot;|^lt;|^gt;]/\&amp;/gi' input.xml

<xml>
<source> &quot; One &quot; </source>
<name>test &amp; test</name>
<last>test2&amp;test2</last>
<address>test3 &apos; test3</address>
<area> test5 &lt; test5</area>
<post> test6 &gt; </post>
<test> test7&amp;/test>
</xml>
# 4  
07-22-2008
Quote:
Originally Posted by tobbe
sed -e 's/\&[^amp;|^apos;|^quot;|^lt;|^gt;]/\&amp;/gi' input.xml

<xml>
<source> &quot; One &quot; </source>
<name>test &amp; test</name>
<last>test2&amp;test2</last>
<address>test3 &apos; test3</address>
<area> test5 &lt; test5</area>
<post> test6 &gt; </post>
<test> test7&amp;/test>
</xml>
Your match consumes the character after the "&" (after all, that's what all that "not" business is!), so the substitution replaces it too. Wrap that part of the expression in parentheses (remember to escape them!), and then include a backreference in the substitution.
# 5  
07-22-2008
OK, thanks for the advice.
In Perl the contents of the parentheses are available as $1, $2, etc.
What is the syntax in sed?

/T
# 6  
07-22-2008
Quote:
Originally Posted by tobbe
OK, thanks for the advice.
In Perl the contents of the parentheses are available as $1, $2, etc.
What is the syntax in sed?

/T
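In sed's default (basic) regular expression syntax, groups are written with escaped parentheses and referenced in the replacement as \1, \2, etc.: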
Code:
echo 'Hi mom!' | sed 's/ mom\(.\)/\1  How are you?/'

produces:
Code:
Hi!  How are you?
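
As a rough parallel to Perl's $1 and $2, the sketch below uses two groups and swaps them. The function names are invented for the example. The second variant assumes a sed that accepts -E (extended regular expressions, available in GNU and BSD sed), where the parentheses are written without backslashes.

```shell
#!/bin/sh
# Hypothetical helpers: swap the text on either side of "=".

# Basic regular expressions: groups are \( ... \), referenced as \1 and \2.
swap_pair() {
  sed 's/\(.*\)=\(.*\)/\2=\1/'
}

# Extended regular expressions (assumes sed supports -E): plain ( ... ).
swap_pair_ere() {
  sed -E 's/(.*)=(.*)/\2=\1/'
}

printf '%s\n' 'key=value' | swap_pair       # value=key
printf '%s\n' 'key=value' | swap_pair_ere   # value=key
```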

# 7  
07-23-2008
Thanks!
That works OK:

echo "AB &CD&amp;EF" | sed -e 's/\&\([^\amp;]\)/\&amp;\1/'
AB &amp;CD&amp;EF