xmlstarlet parse non en_US characters


 
# 8  
12-01-2010
Quote:
Am I correct in assuming that UTF-8 won't work for extended ASCII characters like Cyrillic, Chinese, etc.?
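UTF-8 can represent Chinese and Cyrillic characters; it can encode every Unicode character. Strictly speaking, those are not "extended ASCII" characters either: in UTF-8, ASCII characters take one byte, Cyrillic letters take two, and most Chinese characters take three.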
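To see this for yourself, here is a small shell sketch. The function name `utf8_to_codepoint` is only an illustration, and it assumes an `iconv` that supports UTF-16BE (glibc and GNU libiconv both do). It decodes UTF-8 bytes and prints the result in hex; the input bytes are written as octal escapes so the result does not depend on your terminal's encoding.

Code:
```shell
#!/bin/sh
# utf8_to_codepoint (hypothetical helper): read UTF-8 bytes on stdin and
# print them re-encoded as UTF-16BE hex. For characters in the Basic
# Multilingual Plane, those two bytes are the Unicode code point itself.
utf8_to_codepoint() {
    iconv -f UTF-8 -t UTF-16BE | od -A n -t x1 | tr -d ' \n'
    echo
}

printf 'A' | utf8_to_codepoint              # 1 UTF-8 byte  -> 0041
printf '\320\226' | utf8_to_codepoint       # 2 UTF-8 bytes -> 0416 (Cyrillic Ж)
printf '\344\270\255' | utf8_to_codepoint   # 3 UTF-8 bytes -> 4e2d (Chinese 中)
```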

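I suspect that you have what is known as a mixed-encoding XML document, where some parts of the file are in UTF-8 and others are in a different encoding. These are usually problematic to parse, because an XML parser decodes the whole document with the single encoding it declares (UTF-8 by default), so bytes written in any other encoding show up as errors. Can you provide a link to an example of one of your XML documents?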
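One quick way to check whether a file is valid UTF-8 throughout is to run it through `iconv` from UTF-8 to UTF-8: it stops at the first byte sequence that is not valid UTF-8 and exits with a non-zero status. The `check_utf8` function below is a hypothetical helper wrapping that idea, a sketch assuming a glibc-style or GNU libiconv `iconv`.

Code:
```shell
#!/bin/sh
# check_utf8 FILE (hypothetical helper): report whether FILE is valid UTF-8.
check_utf8() {
    [ -r "$1" ] || { echo "$1: cannot read" >&2; return 2; }
    # Convert UTF-8 to UTF-8 and discard the result: only the exit status
    # matters. iconv's own message on stderr usually says where the first
    # invalid byte sequence is.
    if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null; then
        echo "$1: valid UTF-8"
    else
        echo "$1: contains bytes that are not valid UTF-8"
        return 1
    fi
}
```

Usage: `check_utf8 mydoc.xml`. If the whole file turns out to be in one other encoding, `iconv` can convert it, for example `iconv -f WINDOWS-1251 -t UTF-8 mydoc.xml > mydoc-utf8.xml` if it were Windows-1251 Cyrillic (that encoding is only an example). A genuinely mixed file has no single correct conversion, so the offending parts need fixing individually.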
# 9  
12-02-2010
There are around 11K lines in each file, and really I'm only interested in one section of the XML document. xmlstarlet errors out elsewhere in the file, so it would help if I could get it to ignore the rest, maybe by starting to parse when it reaches this section:
Code:
<SECTION1>
  <SECTION2 ID="1000103">
    <SECTION3>
      <SECTION4 NAME="desc1" TEXT="blah_blah">
        <SECTION5 NAME="desc2" VALUE="blah_blah" TEXT="blah_blah">
          <SECTION5 NAME="desc3" VALUE="blah_blah" TEXT="blah_blah" />
        </SECTION5>
      </SECTION4>
    </SECTION3>
  </SECTION2>
</SECTION1>

I'm looking for the values of desc2, desc3 and the other entries in this section. I guess I could write an awk or sed script that matches only this section and then pipe the result to xmlstarlet, but I'm not that good at awk yet.
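If you do want to try awk, here is a minimal, line-oriented sketch. It assumes, as in the sample above, that each `<SECTION5 ...>` start tag fits on one line and that its attributes use double quotes; `section_values` is a hypothetical name. It is not an XML parser: it does not decode entities such as `&amp;`, and it will miss tags that are split across lines.

Code:
```shell
#!/bin/sh
# section_values FILE (hypothetical helper): print NAME=VALUE for every
# <SECTION5 ...> start tag found between <SECTION1> and </SECTION1>.
section_values() {
    awk '
        /<SECTION1>/ { inside = 1 }
        inside && /<SECTION5 / {
            name = ""; value = ""
            # Skip the 6 chars of NAME=" and the closing quote.
            if (match($0, /NAME="[^"]*"/))  name  = substr($0, RSTART + 6, RLENGTH - 7)
            # Skip the 7 chars of VALUE=" and the closing quote.
            if (match($0, /VALUE="[^"]*"/)) value = substr($0, RSTART + 7, RLENGTH - 8)
            print name "=" value
        }
        /<\/SECTION1>/ { inside = 0 }
    ' "$1"
}
```

For the sample above, `section_values mydoc.xml` should print `desc2=blah_blah` and `desc3=blah_blah`, one per line.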
# 10  
12-02-2010
To extract your section with sed:

Code:
sed -n '/<SECTION1>/,/<\/SECTION1>/p' mydoc.xml
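If `<SECTION1>` occurs more than once, that range prints every occurrence. A variant that stops at the first closing tag, which also avoids reading the rest of an 11K-line file, could look like this. It is a sketch: `extract_first_section` is a hypothetical name, while the `{...}` grouping and the `q` command are standard sed.

Code:
```shell
#!/bin/sh
# extract_first_section FILE (hypothetical helper): print the first
# <SECTION1>...</SECTION1> block, then quit without reading further.
extract_first_section() {
    # Inside the range: print each line, and quit once the closing tag
    # has been printed. -n stops sed from printing anything else.
    sed -n '/<SECTION1>/,/<\/SECTION1>/{p;/<\/SECTION1>/q;}' "$1"
}
```

The fragment could then be handed to xmlstarlet, for example `extract_first_section mydoc.xml | xmlstarlet sel -t -m '//SECTION5' -v '@NAME' -o '=' -v '@VALUE' -n` (untested here; it assumes `xmlstarlet sel` reads standard input when no file is given). One caveat: the fragment no longer carries the original `<?xml ... encoding=...?>` declaration, so the parser will assume UTF-8, and bytes in any other encoding inside the section will still cause errors.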

This User Gave Thanks to Chubler_XL For This Post:
# 11  
12-02-2010
Thanks Chubler_XL, I'm working on modifying that. There are several sections I need to descend into to get to the right data, so I'm trying SECTION1/SECTION2/SECTION3/SECTION4. I haven't got it working yet, but I'll post if I do. I was trying something like:
Code:
sed -n '/<SECTION1>/,/<\/SECTION1>/p' mydoc.xml | sed -n '/<SECTION2>/,/<\/SECTION2>/p' | sed -n '/<SECTION3>/,/<\/SECTION3>/p'

but it's not working. I'll dig into it.
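Assuming the real tags look like the sample in #9, the patterns may be the problem rather than the piping: `/<SECTION2>/` never matches `<SECTION2 ID="1000103">`, because the tag name is followed by a space and an attribute, not by `>`. Matching `<SECTION2` followed by either a space or `>` covers both forms. With that change, each sed in the chain only sees what the previous one printed, so the selection narrows step by step. `section3_of` is a hypothetical name, and the file name is passed as an argument.

Code:
```shell
#!/bin/sh
# section3_of FILE (hypothetical helper): narrow down to the SECTION3 block
# one level at a time. "[ >]" lets the SECTION2 start tag carry attributes.
section3_of() {
    sed -n '/<SECTION1>/,/<\/SECTION1>/p' "$1" |
        sed -n '/<SECTION2[ >]/,/<\/SECTION2>/p' |
        sed -n '/<SECTION3>/,/<\/SECTION3>/p'
}
```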
# 12  
12-02-2010
Put all the sections in one sed; otherwise the lines are already gone by the time your second (and subsequent) seds get to them:

Code:
sed -n '/<SECTION1>/,/<\/SECTION1>/p;/<SECTION2>/,/<\/SECTION2>/p;/<SECTION3>/,/<\/SECTION3>/p' mydoc.xml

# 13  
12-02-2010
For some reason that outputs everything in the whole file. Do I need some kind of nested loop?
# 14  
12-02-2010
Did you forget the -n on sed?

Is the first line of the file a <SECTIONn> marker? If not, did that line get printed?
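On the "nested loop" question in #13: in the one-sed version from #12, the three ranges separated by `;` are independent, so a line is printed once for every range it falls in, and lines that sit in more than one range come out more than once. sed can nest ranges with `{ ... }` instead, so that the inner range is only tested on lines the outer range has already selected. The sketch below assumes tags like the sample in #9 (hence `<SECTION2[ >]` to allow for the `ID` attribute); `nested_section3` is a hypothetical name.

Code:
```shell
#!/bin/sh
# nested_section3 FILE (hypothetical helper): print only the SECTION3
# block(s) that sit inside a SECTION2 inside a SECTION1.
nested_section3() {
    # Each inner range is evaluated only on lines the enclosing range
    # has selected; -n suppresses sed's automatic printing of every line.
    sed -n '/<SECTION1>/,/<\/SECTION1>/{
        /<SECTION2[ >]/,/<\/SECTION2>/{
            /<SECTION3>/,/<\/SECTION3>/p
        }
    }' "$1"
}
```

Also worth knowing: if a range's start pattern matches but its end pattern never does, sed keeps going to the end of the file, which is one possible reason for seeing far more output than expected.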
