02-10-2012
Remove Unicode/special chars from XML
Hi,
We are receiving an XML file on Unix that has some special characters, such as '^', between the tags, e.g.:
<Tag> 1e^O7f%<2304e.$d8f57e8^Bf-&e.^Zh7/327e^O7 </Tag>
We need to remove all special characters like the ^ ones, and also any '&', '<' or '>' sent between the start and end tags, i.e. in the tag text.
The upstream system is sending some Unicode characters that get converted to caret symbols on Unix (in addition to the &, < and >). This causes my XML parser to abort or to drop rows that contain such data.
Please provide a perl command to remove them (we need to remove any '&', '<' and '>' present in the tag text).
Thanks
DSR
10 More Discussions You Might Find Interesting
1. UNIX for Advanced & Expert Users
Hi,
One of our applications produces log files. But if we open a log file in vi, less or view mode, it shows all the special characters in it. 'cat' displays it correctly, but only the last page is visible. If I run 'cat <file_name> | more', the special characters show up again.
... (1 Reply)
Discussion started by: divakarp
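If the 'special characters' are ANSI terminal escape sequences (for example colour codes), that would explain why 'cat' looks right: the terminal interprets them, while vi and less show them raw. That is an assumption, since the post is cut off. Under it, 'less -R' displays the colours instead of the codes, and a sketch like the following (hypothetical helper 'strip_ansi') removes the common ESC [ parameters letter form:

```shell
# strip_ansi: hypothetical helper name. Reads a log on stdin and writes it
# without ANSI CSI sequences of the form ESC [ digits/semicolons letter.
strip_ansi() {
  esc=$(printf '\033')   # the ESC byte that starts each sequence
  # [0-9;]* covers parameter lists such as "1;31"; the final letter is the
  # command (m for colours, K for erase-line, and so on). LC_ALL=C keeps
  # sed byte-oriented if the log also contains non-UTF-8 bytes.
  LC_ALL=C sed "s/${esc}\[[0-9;]*[A-Za-z]//g"
}
```

Usage: 'strip_ansi < app.log > app_plain.log'. Sequences with other forms (for example ESC [ ? 25 l) are not covered by this pattern.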
2. Shell Programming and Scripting
Hi,
I need some advice on handling non-printable characters above ASCII value 126.
Case 1 :
For some fields in the text, I need to retain them as-is and load them into a database. I understand this also depends on the database code page,
but I just want to know how to ensure they do not change while loading... (1 Reply)
Discussion started by: braindrain
3. Shell Programming and Scripting
Here is my simple script to show processes and their owners, except mine:
ps `-ef |grep xterm |grep -v aucar` | while read a1 a2 a3 a4 a5 a6 a7 a8
do
echo KILL..\($a1\).. $a2 |more
done
How can I pass the values from the command "ps -ef |grep xterm|grep -v aucar" to ?
Because the above command... (2 Replies)
Discussion started by: xramm
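In the script above, the backticks make the shell try to run '-ef |grep xterm |grep -v aucar' as a separate command (which fails, since -ef is not a command) and hand its output to ps as arguments. Piping ps straight into the loop is enough. A sketch of that, with the filter wrapped in a hypothetical function so it can be fed 'ps -ef' output:

```shell
# list_xterm_owners: hypothetical helper name. Feed it "ps -ef" output.
list_xterm_owners() {
  # [x]term matches "xterm" but not this grep's own command line,
  # which contains the literal text "[x]term".
  grep '[x]term' | grep -v aucar |
  while read -r user pid rest; do
    # ps -ef columns are UID PID PPID ...; only the first two are used.
    printf 'KILL..(%s).. %s\n' "$user" "$pid"
  done
}
```

Usage: 'ps -ef | list_xterm_owners', or 'ps -ef | list_xterm_owners | more' to page the whole list rather than piping each echo to more.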
4. UNIX for Dummies Questions & Answers
Hi,
How do I remove the lines where special characters or Unicode characters appear?
The following query does work but I wonder if there is a better way.
cat test.txt | egrep -v '\)|#|,|&|-|\(|\\|\/|\.'
The following lines show that my query is incomplete.
Warning: The word "*Khan" is... (1 Reply)
Discussion started by: shantanuo
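Rather than listing every character to exclude, a whitelist is often simpler: keep only lines made entirely of allowed characters. A minimal sketch, assuming "allowed" means ASCII letters, digits and whitespace (adjust the bracket expression as needed); 'keep_plain_lines' is a hypothetical name, and grep reads the file directly, so the 'cat' is not needed:

```shell
# keep_plain_lines: hypothetical helper name. Prints only lines consisting
# entirely of ASCII letters, digits and whitespace; any punctuation or
# non-ASCII byte (e.g. the UTF-8 bytes of an accented letter) drops the line.
keep_plain_lines() {
  # In the C locale [:alnum:] is ASCII-only and every byte is one character.
  LC_ALL=C grep '^[[:alnum:][:space:]]*$' "$@"
}
```

Usage: 'keep_plain_lines test.txt'.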
5. Shell Programming and Scripting
I'm trying to check a repository in to svn, but the import is failing because some files waaaay down deep in a graphics-library folder use Unicode characters in their file names, which ls masks but which show up when the output is piped to more:
# ls -l 1914*
-rwxrwxr-x 1... (2 Replies)
Discussion started by: mshallop
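To track those files down before the import, a sketch like the following lists every path whose name contains a byte outside printable ASCII (space through '~'), which covers both UTF-8 sequences and control characters. 'list_odd_names' is a hypothetical helper.

```shell
# list_odd_names: hypothetical helper name. Recursively lists paths under
# DIR whose last component contains a byte outside printable ASCII.
list_odd_names() {
  # LC_ALL=C makes the bracket range byte-based, so UTF-8 sequences and
  # control bytes both fall outside " -~".
  LC_ALL=C find "$1" -name '*[! -~]*' -print
}
```

Usage: 'list_odd_names path/to/checkout'.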
6. Shell Programming and Scripting
Hi,
I have a master file (file.txt) with good and bad records (records with Unicode characters). I have a file with only the bad records (bad.txt).
I want the records in file.txt that are not present in bad.txt, i.e. only the good records.
I tried comm -23 file.txt bad.txt
It is giving... (14 Replies)
Discussion started by: ashwin3086
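comm expects both inputs to be sorted, so one likely reason it misbehaves here is unsorted record files. A sketch that avoids sorting altogether and keeps file.txt in its original order (hypothetical helper 'good_records'):

```shell
# good_records: hypothetical helper name. Prints the lines of FILE that do
# not appear, as whole lines, in BAD, preserving FILE's order.
good_records() {
  # -F fixed strings, -x whole-line match, -f read patterns from BAD,
  # -v invert the match. LC_ALL=C compares bytes, so records with odd
  # characters are matched exactly.
  LC_ALL=C grep -vxF -f "$2" "$1"
}
```

Usage: 'good_records file.txt bad.txt > good.txt'. If sorted output is acceptable, 'comm -23' on sorted copies of both files also works.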
7. Shell Programming and Scripting
Hi, I'm having trouble using awk to print all characters between 2 patterns. I tried more than one solution found on this forum, but with no success.
My mistakes are probably due to the special characters "" and "]" in the search patterns.
Well, I have a log file like this:
logfile.txt
... (3 Replies)
Discussion started by: ginolatino
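The log format is cut off above, so this is only a sketch under the assumption that both markers are on the same line. awk's index() matches a literal string, so markers such as '[' and ']' need no regex escaping at all; 'between' is a hypothetical helper:

```shell
# between: hypothetical helper name. For each input line, prints the text
# between the first occurrence of START ($1) and the next END ($2) after it.
between() {
  awk -v s="$1" -v e="$2" '{
    i = index($0, s)              # literal search, no regex metacharacters
    if (i == 0) next              # START not on this line
    rest = substr($0, i + length(s))
    j = index(rest, e)
    if (j > 0) print substr(rest, 1, j - 1)
  }'
}
```

Usage: 'between "[" "]" < logfile.txt'. Note that awk -v processes backslash escapes, so a marker containing a backslash must have it doubled.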
8. Shell Programming and Scripting
I have a file with multiple lines. From each line I want to get all strings that start with '+' and end with '/'. Then I want the strings to be separated by ' + '.
Example input:
+$A$/NOUN+At/NSUFF_FEM_PL+K/CASE_INDEF_ACC
Sample output:
$A$ + At + K (20 Replies)
Discussion started by: Viernes
9. Shell Programming and Scripting
Hey Guys,
I'm swamped writing code for the forums:
Could someone write a script or command line to safely delete files with special chars in filenames from a directory:
Example:
-rw-r--r-- 1 root root 148 Apr 30 23:00 ?xA??
-rw-r--r-- 1 root root 148... (8 Replies)
Discussion started by: Neo
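One common safe approach is to delete by inode number rather than by name, so the odd name never has to be typed or quoted. 'ls -li' shows the inode number in the first column; the sketch below (hypothetical helper 'rm_by_inode') then removes the regular file with that inode under the given directory. -xdev keeps find on one file system, because inode numbers are only unique within a file system; any hard links to the same inode in that tree are removed too.

```shell
# rm_by_inode: hypothetical helper name. Removes the regular file(s) under
# DIR ($1) whose inode number is INODE ($2), e.g. taken from "ls -li".
rm_by_inode() {
  dir=$1
  inode=$2
  # -inum selects by inode, so the name is never passed through the shell;
  # "--" stops rm from reading a name that starts with "-" as an option.
  find "$dir" -xdev -type f -inum "$inode" -exec rm -- {} +
}
```

Usage: 'ls -li', then 'rm_by_inode . 1234567' with the number shown for the '?xA??' entry (1234567 is a placeholder). Changing 'rm --' to 'rm -i --' asks for confirmation first.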
10. UNIX for Beginners Questions & Answers
Hi Team,
I have a file a1.txt with data as follows.
dfjakjf...asdfkasj</EnableQuotedIDs><SQL><SelectStatement modified='1' type='string'><!
The delimiter string: <SelectStatement modified='1' type='string'><!
dlm="<SelectStatement modified='1' type='string'><!
The above command is... (7 Replies)
Discussion started by: kmanivan82
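Because the delimiter contains '<', single quotes and '!', one approach is to keep it in a variable and let the shell's own parameter expansion do a literal match. A sketch with hypothetical helpers 'text_before' and 'text_after'; the '!' is single-quoted so an interactive bash does not treat it as history expansion. Note also that the dlm= line as quoted above has no closing double quote, which on its own would leave the shell waiting for more input.

```shell
# Hypothetical helpers: text before / after the first occurrence of a
# literal delimiter ($2) in a string ($1). Quoting "$2" inside ${...}
# makes every character of it literal rather than glob syntax.
text_before() { printf '%s\n' "${1%%"$2"*}"; }
text_after()  { printf '%s\n' "${1#*"$2"}"; }

# Double quotes around the part containing single quotes, then '!' in
# single quotes so interactive bash does not attempt history expansion.
dlm="<SelectStatement modified='1' type='string'><"'!'
```

Usage: 'content=$(cat a1.txt)' followed by 'text_before "$content" "$dlm"'. If the delimiter is not present, both helpers return the whole string unchanged.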
LEARN ABOUT SUNOS
xml::parser::style::stream
Parser::Style::Stream(3) User Contributed Perl Documentation Parser::Style::Stream(3)
NAME
XML::Parser::Style::Stream - Stream style for XML::Parser
SYNOPSIS
use XML::Parser;
my $p = XML::Parser->new(Style => 'Stream', Pkg => 'MySubs');
$p->parsefile('foo.xml');
{
  package MySubs;
  sub StartTag {
    my ($e, $name) = @_;
    # do something with start tags
  }
  sub EndTag {
    my ($e, $name) = @_;
    # do something with end tags
  }
  sub Text {
    my ($e) = @_;
    # do something with text nodes; the text itself is in $_
  }
}
DESCRIPTION
This style uses the Pkg option to find subs in a given package to call for each event. If none of the subs that this style looks for is
there, then the effect of parsing with this style is to print a canonical copy of the document without comments or declarations. All the
subs receive as their 1st parameter the Expat instance for the document they're parsing.
It looks for the following routines:
* StartDocument
Called at the start of the parse.
* StartTag
Called for every start tag with a second parameter of the element type. The $_ variable will contain a copy of the tag and the %_ variable will contain attribute values supplied for that element.
* EndTag
Called for every end tag with a second parameter of the element type. The $_ variable will contain a copy of the end tag.
* Text
Called just before start or end tags with accumulated non-markup text in the $_ variable.
* PI
Called for processing instructions. The $_ variable will contain a copy of the PI and the target and data are sent as 2nd and 3rd
parameters respectively.
* EndDocument
Called at conclusion of the parse.
perl v5.8.4 2003-08-18 Parser::Style::Stream(3)