I tried xml2, but it only converts XML to a flat file format, and I don't know what else to use for parsing XML in bash. I've written a couple of basic parsers for similar tasks, but I've found their error handling is poor. I think it would work if I could get xmlstarlet to read these files in an extended ASCII encoding, but I don't know how to do that.
strings didn't seem to help either.
---------- Post updated at 05:59 PM ---------- Previous update was at 05:47 PM ----------
It seems I can't get libxml2 to handle extended ASCII without a lot of pain. I'm wondering if there's a way to convert the data when I read the file in from a list, which I do by:
I also know my looping probably isn't the most elegant, but it works, except for the encoding. Is there some command that can convert the string before xmlstarlet reads it?
By the way, I'm using Debian Squeeze, which ships xmlstarlet 1.0.2-1.
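One possible approach to the conversion question above is to re-encode each file with iconv before the parser sees it. The following is only a sketch: it assumes the "extended ASCII" files are really ISO-8859-1 (change the -f argument if they are CP1252 or something else), the helper name to_utf8 and the demo file are made up for illustration, and a file whose XML declaration names a different encoding would need that declaration adjusted as well.

```shell
#!/bin/bash
# Hypothetical helper: re-encode one file to UTF-8 on stdout.
# Assumption: the input is ISO-8859-1; adjust -f for other code pages.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 < "$1"
}

# Demo input: an XML fragment containing the Latin-1 byte 0xE9 (e-acute).
tmpfile=$(mktemp)
printf '<name>caf\351</name>\n' > "$tmpfile"

# Capture the converted text; the byte 0xE9 becomes the UTF-8 pair 0xC3 0xA9.
converted=$(to_utf8 "$tmpfile")
printf '%s\n' "$converted"
rm -f "$tmpfile"

# In the real loop the converted stream could be piped straight on, e.g.
#   to_utf8 "$file" | xmlstarlet sel -t -v '//name'
# (assuming xmlstarlet reads stdin when no input file is given).
```

Converting on the fly like this leaves the original files untouched, so a wrong guess about the source encoding is easy to correct by rerunning with a different -f value.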
XML::UM(3) User Contributed Perl Documentation XML::UM(3)
NAME
XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding
SYNOPSIS
use XML::UM;
# Set directory with .xml files that comes with XML::Encoding distribution
# Always include the trailing slash!
$XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';
# Create the encoding routine
my $encode = XML::UM::get_encode (
    Encoding => 'ISO-8859-2',
    EncodeUnmapped => \&XML::UM::encode_unmapped_dec);
# Convert a string from UTF-8 to the specified Encoding
my $encoded_str = $encode->($utf8_str);
# Remove circular references for garbage collection
XML::UM::dispose_encoding ('ISO-8859-2');
DESCRIPTION
This module provides methods to convert UTF-8 strings to any XML encoding that XML::Encoding supports. It creates mapping routines from the
.xml files that can be found in the maps/ directory in the XML::Encoding distribution. Note that the XML::Encoding distribution does
install the .enc files in your perl directory, but not the .xml files they were created from. That's why you have to specify $ENCDIR as in
the SYNOPSIS.
This implementation uses the XML::Encoding class to parse the .xml file and creates a hash that maps UTF-8 characters (each consisting of
up to 4 bytes) to their equivalent byte sequence in the specified encoding. Note that large mappings may consume a lot of memory!
Future implementations may parse the .enc files directly, or do the conversions entirely in XS (i.e. C code.)
get_encode (Encoding => STRING, EncodeUnmapped => SUB)
The central entry point to this module is the XML::UM::get_encode() method. It forwards the call to the global $XML::UM::FACTORY, which is
defined as an instance of XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper factory.
The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it for subsequent use) that reads in the .xml
encoding file and creates a hash that maps UTF-8 characters to encoded characters.
Finally, the get_encode() method of XML::UM::SlowMapper is called, which generates an anonymous subroutine that uses the hash to convert
multi-character UTF-8 blocks to the proper encoding.
dispose_encoding ($encoding_name)
Call this to free the memory used by the SlowMapper for a specific encoding. Note that in order to free the big conversion hash, the user
should no longer have references to the subroutines generated by get_encode().
The parameters to the get_encode() method (defined as name/value pairs) are:
o Encoding
The name of the desired encoding, e.g. 'ISO-8859-2'
o EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)
Defines how Unicode characters not found in the mapping file (of the specified encoding) are printed. By default, they are converted
to decimal entity references, like '&#123;'.
Use \&XML::UM::encode_unmapped_hex for hexadecimal constants, like '&#xAB;'.
CAVEATS
I'm not exactly sure about which Unicode characters in the range (0 .. 127) should be mapped to themselves. See comments in XML/UM.pm near
%DEFAULT_ASCII_MAPPINGS.
The encodings that expat supports by default (e.g. UTF-16, ISO-8859-1) are currently not supported, because there are no .enc files
available for these encodings. This module needs some more work. If you have the time, please help!
AUTHOR
Send bug reports, hints, tips, suggestions to Enno Derksen at <enno@att.com>.
perl v5.8.0 2000-02-17 XML::UM(3)