01-22-2008
Parse XML file encoded in ISO-8859-1
Dear Friends,
I have an XML file that's encoded in ISO-8859-1. I have some European characters coming in from 2 fields (Name, Comments) in the XML file. Can anyone suggest if there are any functions in Unix to read those characters? Using shell programming, can I parse this xml file?
Please suggest. I have not done this before.
Regards,
Madhavi.
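One possible starting point, offered as a sketch rather than a definitive answer: convert the file to UTF-8 with `iconv`, then pull out a field with `sed`. This assumes `iconv` is installed and that each `<Name>...</Name>` element sits on a single line; the file layout, the function names, and `input.xml` are all assumptions, since the actual file was not posted. For nested or multi-line XML, a real XML parser is the safer choice.

```shell
#!/bin/sh
# Sketch only: the element name "Name" comes from the post; the
# one-element-per-line layout is an assumption about the file.

# Convert ISO-8859-1 input on stdin to UTF-8 on stdout.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Print the text content of each <TAG>...</TAG> pair found on one line.
extract_field() {
    tag=$1
    sed -n "s:.*<$tag>\(.*\)</$tag>.*:\1:p"
}

# Usage (input.xml is a hypothetical file name):
#   latin1_to_utf8 < input.xml | extract_field Name
```

Converting first means the rest of the pipeline only ever sees one encoding, whatever the terminal locale is.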
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
I need to know how to do this. I have managed to parse some nodes, but I was unable to get the child node correctly. If you have code, please send it. It would be very useful for me. (0 Replies)
Discussion started by: girigopal
0 Replies
2. Shell Programming and Scripting
Hi,
I need to parse the following XML data enclosed in <a> </a> XML tag using shell script.
<X>
.....
</X>
<a>
<b>
<c>data1</c>
<c>data2</c>
</b>
<d>
<c>data3</c>
</d>
</a>
<XX>
...
</XX> (5 Replies)
Discussion started by: viki
5 Replies
3. Shell Programming and Scripting
How can I parse a file containing XML?
I am sure that it's best to use Perl, but my Perl is not very good. Can someone help?
Below are example contents of the file containing the XML. I basically want to parse the file and have each field contained in a variable.
ie. I want to store the account... (14 Replies)
Discussion started by: frustrated1
14 Replies
4. Emergency UNIX and Linux Support
Hi,
I have the following file
Example.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<html><set label="09/07/29" value="1241.90"/>
</html>
Can anyone help me parse this XML file?
I want to retrieve the attribute values of the set tag.
Example I want to... (3 Replies)
Discussion started by: Raji_gadam
3 Replies
5. Shell Programming and Scripting
Hello all,
Given the following extract from an XML file with multiple <JOB> .... </JOB> entries
<JOB
APPLICATION="APP"
APR="0"
AUG="0"
AUTHOR="AUT"
AUTOARCH="0"
CMDLINE="/tmp/test1 %%var"
CONFIRM="1"
CREATION_DATE="20100430"
CREATION_TIME="130739"
... (2 Replies)
Discussion started by: cabrao
2 Replies
6. Programming
How do I get the field info for tags ID, NAME, DESCRIPTION? Below is my current code, but I can't get beyond the first_child of the file.
use strict;
use warnings;
use XML::Simple;
use... (1 Reply)
Discussion started by: leemalloy
1 Reply
7. UNIX for Dummies Questions & Answers
HI Guys,
Input .XML
<xn:MeContext id="L0307">
<xn:ManagedElement id="1">
<xn:VsDataContainer id="1">
<xn:attributes>
<xn:vsDataType>vsDataENodeBFunction</xn:vsDataType>
... (3 Replies)
Discussion started by: pareshkp
3 Replies
8. Shell Programming and Scripting
HI Guys
I have Below XML File :
<xn:SubNetwork id="XYZ">
<xn:SubNetwork id="C01">
<xn:MeContext id="CO1">
<xn:ManagedElement id="1">
<un:RncFunction id="1">
<un:UtranCell id="NY431">
... (2 Replies)
Discussion started by: pareshkp
2 Replies
9. Shell Programming and Scripting
I am trying to create a shell script that will parse an xml file (file attached).
awk '/Id v=/ { print }' Test.xml | sed 's!<Id v=\"\(.*\)\"/>!\1!' > output.txt
An output.txt file is created but it is empty. It should contain the value 222159 in it. Thanks. (7 Replies)
Discussion started by: cmccabe
7 Replies
10. Shell Programming and Scripting
Hey guys,
I have a little problem,
Let's say I create this script :
#!/bin/sh
nfo_file="/home/admin/info.nfo"
echo "▒▒█ Hello █▒▒" > $nfo_file
It seems to be okay :
cat /home/admin/info.nfo
▒▒█ Hello █▒▒
file -bi /home/admin/info.nfo
text/plain; charset=utf-8
But when I open it in a... (7 Replies)
Discussion started by: antoinelomb
7 Replies
XML::UM(3) User Contributed Perl Documentation XML::UM(3)
NAME
XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding
SYNOPSIS
use XML::UM;
# Set directory with .xml files that comes with XML::Encoding distribution
# Always include the trailing slash!
$XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';
# Create the encoding routine
my $encode = XML::UM::get_encode (
Encoding => 'ISO-8859-2',
EncodeUnmapped => \&XML::UM::encode_unmapped_dec);
# Convert a string from UTF-8 to the specified Encoding
my $encoded_str = $encode->($utf8_str);
# Remove circular references for garbage collection
XML::UM::dispose_encoding ('ISO-8859-2');
DESCRIPTION
This module provides methods to convert UTF-8 strings to any XML encoding that XML::Encoding supports. It creates mapping routines from the
.xml files that can be found in the maps/ directory in the XML::Encoding distribution. Note that the XML::Encoding distribution does
install the .enc files in your perl directory, but not the .xml files they were created from. That's why you have to specify $ENCDIR as in
the SYNOPSIS.
This implementation uses the XML::Encoding class to parse the .xml file and creates a hash that maps UTF-8 characters (each consisting of
up to 4 bytes) to their equivalent byte sequence in the specified encoding. Note that large mappings may consume a lot of memory!
Future implementations may parse the .enc files directly, or do the conversions entirely in XS (i.e. C code.)
get_encode (Encoding => STRING, EncodeUnmapped => SUB)
The central entry point to this module is the XML::UM::get_encode() method. It forwards the call to the global $XML::UM::FACTORY, which is
defined as an instance of XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper factory.
The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it for subsequent use) that reads in the .xml encoding
file and creates a hash that maps UTF-8 characters to encoded characters.
Finally, the get_encode() method of XML::UM::SlowMapper is called, which generates an anonymous subroutine that uses the hash to convert
multi-character UTF-8 blocks to the proper encoding.
dispose_encoding ($encoding_name)
Call this to free the memory used by the SlowMapper for a specific encoding. Note that in order to free the big conversion hash, the user
should no longer have references to the subroutines generated by get_encode().
The parameters to the get_encode() method (defined as name/value pairs) are:
o Encoding
The name of the desired encoding, e.g. 'ISO-8859-2'
o EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)
Defines how Unicode characters not found in the mapping file (of the specified encoding) are printed. By default, they are converted
to decimal entity references, like '&#123;'
Use \&XML::UM::encode_unmapped_hex for hexadecimal constants, like '&#xAB;'
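To make the decimal and hexadecimal fallback formats concrete, here is a small shell sketch. It is not part of XML::UM; the helper names `entity_dec` and `entity_hex` are hypothetical and only illustrate what a numeric character reference looks like for a given Unicode code point.

```shell
# Illustration only: these helpers are hypothetical and not part of XML::UM.
# Each takes a Unicode code point as a decimal number.

# Decimal character reference, e.g. 123 -> &#123;
entity_dec() {
    printf '&#%d;\n' "$1"
}

# Hexadecimal character reference, e.g. 171 -> &#xAB;
entity_hex() {
    printf '&#x%X;\n' "$1"
}

entity_dec 123   # prints &#123;
entity_hex 171   # prints &#xAB;
```

Both forms name the same kind of thing (a code point), so an XML parser reading the output resolves either one back to the original character.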
CAVEATS
I'm not exactly sure about which Unicode characters in the range (0 .. 127) should be mapped to themselves. See comments in XML/UM.pm near
%DEFAULT_ASCII_MAPPINGS.
The encodings that expat supports by default are currently not supported, (e.g. UTF-16, ISO-8859-1), because there are no .enc files
available for these encodings. This module needs some more work. If you have the time, please help!
AUTHOR
Send bug reports, hints, tips, suggestions to Enno Derksen at <enno@att.com>.
perl v5.8.0 2000-02-17 XML::UM(3)