Sponsored Content
Top Forums Shell Programming and Scripting xmlstarlet parse non en_US characters Post 302476102 by unclecameron on Tuesday 30th of November 2010 08:59:55 PM
Old 11-30-2010
I tried xml2 parsing, which only converts xml to a flat file format, otherwise I don't know what else to use for bash xml parsing, I've written a couple basic parsers for similar tasks, but they have bad error handling I've found. I think maybe if I could get xmlstarlet to read in extended ascii encoding for these files it would work, but I don't know how to do that.

Strings didn't seem to help either

---------- Post updated at 05:59 PM ---------- Previous update was at 05:47 PM ----------

It seems without much pain I can't get libxml2 to encode ascii extended, I'm wondering if there's a way to convert it when I read the file in from a list, which I do by:
Code:
cat "${@:-somelist.txt}" |
while read i
do
        strings $i | xmlstarlet sel -t -m "//sec1/sec2/sec3/sec4/sec5" -v "@VALUE" -n > somefile
   value1=`sed -n "1p" somefile`
...

I also know my looping probably isn't the most elegant, but it works, well, except the encoding. Is there some command I can convert the string before it gets read by xmlstarlet or something?

btw, I'm using Debian Squeeze, which uses xmlstarlet 1.0.2-1
 

10 More Discussions You Might Find Interesting

1. AIX

en_us.utf-8

please someone provide me the link for downloading en_us.utf-8 .....i have an issue with locale for which i need this :( (1 Reply)
Discussion started by: shubhendu.pyne
1 Replies

2. Solaris

en_US.ISO8859-1 Table

Hy together, I doesn't find a table of en_US.IS08859-1. Have someone a link or same else? Thanks Urs (1 Reply)
Discussion started by: MuellerUrs
1 Replies

3. Solaris

Add language en_US Solaris 10

Hello, I have a Sun Solaris 10 installs by default in French. I do not have CDs of the OS installation. I have a program use the language en_US. At connection language chosen is C (en_USxxxx not available) I open a console $ LANG C if LANG = en_US I get "could not set correctly local" ... (2 Replies)
Discussion started by: XRay
2 Replies

4. Shell Programming and Scripting

xmlstarlet template parse small xml file

I have a file like: <?xml version="1.0" encoding="UTF-8" standalone="no"?> <geonames> <geoname> <toponymName>Palos Verdes</toponymName> <name>Palos Verdes</name> <lat>42.1628912</lat> <lng>-123.6481235</lng> <geonameId>5718340</geonameId> <countryCode>US</countryCode>... (4 Replies)
Discussion started by: unclecameron
4 Replies

5. Shell Programming and Scripting

xmlstarlet parse field from file

I have a xmlfile like this: <?xml version="1.0" encoding="utf-8"?> <contentlocation xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="http://wherein.yahooapis.com/v1/schema" xml:lang="en"> <processingTime>0.001538</processingTime> ... (1 Reply)
Discussion started by: unclecameron
1 Replies

6. Solaris

setting locale en_US.UTF-8

hi, I am using SOLARIS sparc 64 bit, during installation of Oracle i receive an error required OS locale en_US.UTF-8 does not exist on the installation computer. To avoid this issue, please ensure that the locale en_US.UTF-8 exists on the installation computer prior to installing Oracle. when... (4 Replies)
Discussion started by: zeeshan047
4 Replies

7. Shell Programming and Scripting

Parse two patterns and print next few characters following the pattern

Hi all, I have many large files with data like following in each line: 1 822381 rs116091741 C T . PASS ASP;G5;G5A;GMAF=0.014308426073132;KGPilot123;RSPOS=822381;SAO=0; I want output like this: rs116091741 0.014308426073132 I tried some of the commands... (5 Replies)
Discussion started by: pirates.genome
5 Replies

8. Shell Programming and Scripting

Ksh: Read line parse characters into variable and remove the line if the date is older than 50 days

I have a test file with the following format, It contains the username_date when the user was locked from the database. $ cat lockedusers.txt TEST1_21062016 TEST2_02122015 TEST3_01032016 TEST4_01042016 I'm writing a ksh script and faced with this difficult scenario for my... (11 Replies)
Discussion started by: humble_learner
11 Replies

9. Shell Programming and Scripting

Use xmlstarlet inside an if loop

I have a XML file of little huge size. I have to build a logic to get the count of the tag <capacity>. And have an if loop such that all the <capacity> blocks are captured one after the other. sample input file - sample1.xml <subcolumns><capacity><name>45.90</name> <index>0</index>... (1 Reply)
Discussion started by: ramprabhum
1 Replies

10. UNIX for Beginners Questions & Answers

How to insert subnode in xml file using xmlstarlet or any other bash command?

I have multiple xml files where i want to update a subnode if the subnode project points to different project or insert a subnode if it doesn't exist using a xmlstarlet or any other command that can be used in a bash script. I have been able to update the subnode project if it doesn't point to... (1 Reply)
Discussion started by: Sekhar419
1 Replies
XML::UM(3)						User Contributed Perl Documentation						XML::UM(3)

NAME
XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding SYNOPSIS
use XML::UM; # Set directory with .xml files that comes with XML::Encoding distribution # Always include the trailing slash! $XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/'; # Create the encoding routine my $encode = XML::UM::get_encode ( Encoding => 'ISO-8859-2', EncodeUnmapped => &XML::UM::encode_unmapped_dec); # Convert a string from UTF-8 to the specified Encoding my $encoded_str = $encode->($utf8_str); # Remove circular references for garbage collection XML::UM::dispose_encoding ('ISO-8859-2'); DESCRIPTION
This module provides methods to convert UTF-8 strings to any XML encoding that XML::Encoding supports. It creates mapping routines from the .xml files that can be found in the maps/ directory in the XML::Encoding distribution. Note that the XML::Encoding distribution does install the .enc files in your perl directory, but not the.xml files they were created from. That's why you have to specify $ENCDIR as in the SYNOPSIS. This implementation uses the XML::Encoding class to parse the .xml file and creates a hash that maps UTF-8 characters (each consisting of up to 4 bytes) to their equivalent byte sequence in the specified encoding. Note that large mappings may consume a lot of memory! Future implementations may parse the .enc files directly, or do the conversions entirely in XS (i.e. C code.) get_encode (Encoding => STRING, EncodeUnmapped => SUB) The central entry point to this module is the XML::UM::get_encode() method. It forwards the call to the global $XML::UM::FACTORY, which is defined as an instance of XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper factory. The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it for subsequent use) that reads in the .xml encod- ing file and creates a hash that maps UTF-8 characters to encoded characters. The get_encode() method of XML::UM::SlowMapper is called, finally, which generates an anonimous subroutine that uses the hash to convert multi-character UTF-8 blocks to the proper encoding. dispose_encoding ($encoding_name) Call this to free the memory used by the SlowMapper for a specific encoding. Note that in order to free the big conversion hash, the user should no longer have references to the subroutines generated by get_encode(). The parameters to the get_encode() method (defined as name/value pairs) are: o Encoding The name of the desired encoding, e.g. 'ISO-8859-2' o EncodeUnmapped (Default: &XML::UM::encode_unmapped_dec) Defines how Unicode characters not found in the mapping file (of the specified encoding) are printed. By default, they are converted to decimal entity references, like '&#123;' Use &XML::UM::encode_unmapped_hex for hexadecimal constants, like '&#xAB;' CAVEATS
I'm not exactly sure about which Unicode characters in the range (0 .. 127) should be mapped to themselves. See comments in XML/UM.pm near %DEFAULT_ASCII_MAPPINGS. The encodings that expat supports by default are currently not supported, (e.g. UTF-16, ISO-8859-1), because there are no .enc files avail- able for these encodings. This module needs some more work. If you have the time, please help! AUTHOR
Send bug reports, hints, tips, suggestions to Enno Derksen at <enno@att.com>. perl v5.8.0 2000-02-17 XML::UM(3)
All times are GMT -4. The time now is 02:04 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy