Sponsored Content
Top Forums Shell Programming and Scripting xmlstarlet parse non en_US characters Post 302476090 by unclecameron on Tuesday 30th of November 2010 07:54:44 PM
Old 11-30-2010
xmlstarlet parse non en_US characters

I'm parsing around 600K xml files, with roughly 1500 lines of text in each, some of the lines include Chinese, Russian, whatever language, with a bash script that uses
Code:
 cat $i | xmlstarlet sel -t -m "//section1/section2/section3/section4/section5" -v "@VALUE" -n > somefile

which works, but I get parse errors like
Code:
-:2350: parser error : invalid character in attribute value
     <NODE NAME="something" VALUE="^Tï¿<96>ᄂ←ᄍ￰ï¿<8f>￲/dPï¾Â*ï¿<84>ï¿<88>mu" />
                                  ^
-:2350: parser error : attributes construct error
     <NODE NAME="something" VALUE="^Tï¿<96>ᄂ←ᄍ￰ï¿<8f>￲/dPï¾Â*ï¿<84>ï¿<88>mu" />
                                  ^
-:2350: parser error : Couldn't find end of Start Tag NODE line 2350
     <NODE NAME="something" VALUE="^Tï¿<96>ᄂ←ᄍ￰ï¿<8f>￲/dPï¾Â*ï¿<84>ï¿<88>mu" />
                                  ^
-:2350: parser error : PCDATA invalid Char value 20
     <NODE NAME="something" VALUE="^Tï¿<96>ᄂ←ᄍ￰ï¿<8f>￲/dPï¾Â*ï¿<84>ï¿<88>mu" />

I have installed all locales. Is there a way to bulk change all the encoding to UTF-8 or something on all the files, or install something, or am I going about it the wrong way?
 

10 More Discussions You Might Find Interesting

1. AIX

en_us.utf-8

please someone provide me the link for downloading en_us.utf-8 .....i have an issue with locale for which i need this :( (1 Reply)
Discussion started by: shubhendu.pyne
1 Replies

2. Solaris

en_US.ISO8859-1 Table

Hy together, I doesn't find a table of en_US.IS08859-1. Have someone a link or same else? Thanks Urs (1 Reply)
Discussion started by: MuellerUrs
1 Replies

3. Solaris

Add language en_US Solaris 10

Hello, I have a Sun Solaris 10 installs by default in French. I do not have CDs of the OS installation. I have a program use the language en_US. At connection language chosen is C (en_USxxxx not available) I open a console $ LANG C if LANG = en_US I get "could not set correctly local" ... (2 Replies)
Discussion started by: XRay
2 Replies

4. Shell Programming and Scripting

xmlstarlet template parse small xml file

I have a file like: <?xml version="1.0" encoding="UTF-8" standalone="no"?> <geonames> <geoname> <toponymName>Palos Verdes</toponymName> <name>Palos Verdes</name> <lat>42.1628912</lat> <lng>-123.6481235</lng> <geonameId>5718340</geonameId> <countryCode>US</countryCode>... (4 Replies)
Discussion started by: unclecameron
4 Replies

5. Shell Programming and Scripting

xmlstarlet parse field from file

I have a xmlfile like this: <?xml version="1.0" encoding="utf-8"?> <contentlocation xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="http://wherein.yahooapis.com/v1/schema" xml:lang="en"> <processingTime>0.001538</processingTime> ... (1 Reply)
Discussion started by: unclecameron
1 Replies

6. Solaris

setting locale en_US.UTF-8

hi, I am using SOLARIS sparc 64 bit, during installation of Oracle i receive an error required OS locale en_US.UTF-8 does not exist on the installation computer. To avoid this issue, please ensure that the locale en_US.UTF-8 exists on the installation computer prior to installing Oracle. when... (4 Replies)
Discussion started by: zeeshan047
4 Replies

7. Shell Programming and Scripting

Parse two patterns and print next few characters following the pattern

Hi all, I have many large files with data like following in each line: 1 822381 rs116091741 C T . PASS ASP;G5;G5A;GMAF=0.014308426073132;KGPilot123;RSPOS=822381;SAO=0; I want output like this: rs116091741 0.014308426073132 I tried some of the commands... (5 Replies)
Discussion started by: pirates.genome
5 Replies

8. Shell Programming and Scripting

Ksh: Read line parse characters into variable and remove the line if the date is older than 50 days

I have a test file with the following format, It contains the username_date when the user was locked from the database. $ cat lockedusers.txt TEST1_21062016 TEST2_02122015 TEST3_01032016 TEST4_01042016 I'm writing a ksh script and faced with this difficult scenario for my... (11 Replies)
Discussion started by: humble_learner
11 Replies

9. Shell Programming and Scripting

Use xmlstarlet inside an if loop

I have a XML file of little huge size. I have to build a logic to get the count of the tag <capacity>. And have an if loop such that all the <capacity> blocks are captured one after the other. sample input file - sample1.xml <subcolumns><capacity><name>45.90</name> <index>0</index>... (1 Reply)
Discussion started by: ramprabhum
1 Replies

10. UNIX for Beginners Questions & Answers

How to insert subnode in xml file using xmlstarlet or any other bash command?

I have multiple xml files where i want to update a subnode if the subnode project points to different project or insert a subnode if it doesn't exist using a xmlstarlet or any other command that can be used in a bash script. I have been able to update the subnode project if it doesn't point to... (1 Reply)
Discussion started by: Sekhar419
1 Replies
XML_SET_OBJECT(3)							 1							 XML_SET_OBJECT(3)

xml_set_object - Use XML Parser within an object

SYNOPSIS
bool xml_set_object (resource $parser, object &$object) DESCRIPTION
This function allows to use $parser inside $object. All callback functions could be set with xml_set_element_handler(3) etc and assumed to be methods of $object. PARAMETERS
o $parser - A reference to the XML parser to use inside the object. o $object - The object where to use the XML parser. RETURN VALUES
Returns TRUE on success or FALSE on failure. EXAMPLES
Example #1 xml_set_object(3) example <?php class xml { var $parser; function xml() { $this->parser = xml_parser_create(); xml_set_object($this->parser, $this); xml_set_element_handler($this->parser, "tag_open", "tag_close"); xml_set_character_data_handler($this->parser, "cdata"); } function parse($data) { xml_parse($this->parser, $data); } function tag_open($parser, $tag, $attributes) { var_dump($parser, $tag, $attributes); } function cdata($parser, $cdata) { var_dump($parser, $cdata); } function tag_close($parser, $tag) { var_dump($parser, $tag); } } // end of class xml $xml_parser = new xml(); $xml_parser->parse("<A ID='hallo'>PHP</A>"); ?> PHP Documentation Group XML_SET_OBJECT(3)
All times are GMT -4. The time now is 10:17 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy