Extract a value from an xml file


 
# 8  
Old 12-23-2016
Quote:
...xmllint...
c00l

That works too:

Code:
xmllint file.xml --xpath '//PORTED_NUM/text()'

It was already present here without a separate install, as it is part of the basic libxml2-utils package. It also seems that xmllint conforms more completely to the XPath specification than the other two tools.

Last edited by stomp; 12-23-2016 at 07:14 PM..
# 9  
Old 12-23-2016
Hi,

Code:
xmllint file.xml --xpath '//PORTED_NUM/text()'

Simpler.

However, when I run it, I get:
Quote:
Unknown option --xpath
Code:
xmllint --version
xmllint: using libxml version 20706

# 10  
Old 12-23-2016
Code:
$ cat /etc/issue.net 
Debian GNU/Linux 8

xmllint --version
xmllint: using libxml version 20901
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML \
   Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr \ 
   Schemas Schematron Modules Debug Zlib Lzma

It seems to be a matter not of the version but of the features chosen at compile time:
Code:
$ cat /etc/issue.net 
Debian GNU/Linux 3.1

$ xmllint --version
xmllint: using libxml version 20616
   compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas

Hmm. Wrong. --xpath just does not work with the old version.
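
Side note for comparing the version banners above: the number xmllint prints is the libxml2 release packed as major*10000 + minor*100 + micro. A minimal sketch that unpacks it; the function name libxml_dotted_version is made up for illustration, and the thread only establishes that 20706 rejects --xpath while 20901 accepts it, so no exact cutoff is assumed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of xmllint): convert the packed number
# printed by "xmllint --version" into dotted form.
libxml_dotted_version() {
    packed=$1
    major=$((packed / 10000))       # 20706 -> 2
    minor=$((packed / 100 % 100))   # 20706 -> 7
    micro=$((packed % 100))         # 20706 -> 6
    printf '%s.%s.%s\n' "$major" "$minor" "$micro"
}

# The three version numbers that appear in this thread.
for v in 20616 20706 20901; do
    printf '%s -> %s\n' "$v" "$(libxml_dotted_version "$v")"
done
```

To feed it the banner from a live system, something like `xmllint --version 2>&1 | sed -n 's/.*libxml version \([0-9]*\).*/\1/p'` should isolate the number, assuming the banner has the same wording as shown in the posts above.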

However, this one works even on the ancient Debian:

Code:
 xmllint --shell file.xml <<<'cat //PORTED_NUM'

Resulting in this:
Code:
/ >  -------
<PORTED_NUM>990-799-1234</PORTED_NUM>
/ >

Now strip off the unnecessary junk around the value:
Code:
xmllint --shell file.xml <<<'cat //PORTED_NUM' | grep -oE '[0-9-]{8,}'

...with the following being a bit more general (using a positive lookbehind and lookahead to require > before the pattern and < after it)...
Code:
xmllint --shell file.xml <<<'cat //PORTED_NUM' | perl -ne '/(?<=>)(.*)(?=<)/ and print($1)'
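
If perl is not at hand, plain sed can do a similar job by naming the tag explicitly instead of using lookarounds. This is only a sketch: tag_text is a made-up helper name, and it assumes the element sits on one line, carries no attributes, and occurs at most once per line.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the text between
# <TAG> and </TAG> for every stdin line that contains both.
# $1 is the tag name; lines without the element print nothing.
tag_text() {
    sed -n "s/.*<$1>\(.*\)<\/$1>.*/\1/p"
}

# Sample input copied from the "xmllint --shell" output shown earlier.
printf '%s\n' '/ >  -------' '<PORTED_NUM>990-799-1234</PORTED_NUM>' '/ >' |
    tag_text PORTED_NUM
```

With a real file, the pipeline would be the same xmllint --shell call as above with `| tag_text PORTED_NUM` in place of the perl command.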


Last edited by stomp; 12-24-2016 at 07:11 AM..
# 11  
Old 12-24-2016
Hi.

Gathering these suggestions together with the call to the older version of xmllint (posted by stomp) and adding xml2 to process the modified XML (posted by greet_sed):
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate string extraction from XML file.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C xml_grep xmlstarlet xmllint xml2

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE, $(wc -l <$FILE) lines:"
cat $FILE

pl " Expected output:"
cat $E

pl " Results, xml_grep:"
xml_grep //PORTED_NUM --text_only $FILE

pl " Results, xmlstarlet:"
xmlstarlet sel -t -v //PORTED_NUM $FILE
pe

pl " Results, xmllint:"
xmllint $FILE --xpath '//PORTED_NUM/text()'
pe

pl " Results, xmllint:"
xmllint --shell $FILE <<<'cat //PORTED_NUM' |
perl -ne '/(?<=>)(.*)(?=<)/ and print($1)'
pe

pl " Results, xml2:"
xml2 < "$FILE" |
awk -F= '/PORTED_NUM/ { print $2 }'

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.6 (jessie) 
bash GNU bash 4.3.30
xml_grep /usr/bin/xml_grep version 0.9
xmlstarlet - ( /usr/bin/xmlstarlet, 2014-09-14 )
xmllint: using libxml version 20901
xml2 - ( /usr/bin/xml2, 2012-04-16 )

-----
 Input data file data1, 1 lines:
<?xml version="1.0" encoding="UTF-8"?><PORT_RESPONSE><HEADER><ORIGINATOR>XMG</ORIGINATOR><DESTINATION>ENSEMBLE</DESTINATION><MESSAGE_ID>NXT107349698</MESSAGE_ID><MSGTYPE>PRI</MSGTYPE><TIMESTAMP>12232016061452</TIMESTAMP></HEADER><ADMIN><WICIS_REL_NO>5.0.0</WICIS_REL_NO><NNSP>9664</NNSP><OLSP>6529</OLSP><ONSP>6529</ONSP><REQ_NO>6664016358514349</REQ_NO><VER_ID_REQ>00</VER_ID_REQ><VER_ID_RESP>00</VER_ID_RESP><RT>C</RT><RESP_NO>652901635838480144</RESP_NO><CD_TSENT>122220160614</CD_TSENT><REP>Port Center</REP><TEL_NO_REP>000-207-8009</TEL_NO_REP><CHC></CHC><DD_T>122320160909</DD_T><NPQTY>00001</NPQTY></ADMIN><LINE_DATA><PORTED_NUM>990-799-1234</PORTED_NUM></LINE_DATA></PORT_RESPONSE> 

-----
 Expected output:
990-799-1234

-----
 Results, xml_grep:
990-799-1234

-----
 Results, xmlstarlet:
990-799-1234

-----
 Results, xmllint:
990-799-1234

-----
 Results, xmllint:
990-799-1234

-----
 Results, xml2:
990-799-1234

Observations:
1) The code for the early xmllint and for xml2 needs additional work (perl, awk, grep, etc.) to isolate the string of interest.

2) A few of the commands (the ones followed by a pe) seem to omit the trailing newline; not an error, just something to note.
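
On observation 2, one way to get uniform output without knowing in advance whether a tool ends its output with a newline is to let command substitution strip whatever is there and then add exactly one back. A minimal sketch; one_trailing_newline is a made-up name, and note that empty input still yields a single empty line.

```shell
#!/bin/sh
# Hypothetical wrapper: pass stdin through with exactly one trailing newline.
# Command substitution drops all trailing newlines; printf adds one back.
one_trailing_newline() {
    printf '%s\n' "$(cat)"
}

# Both calls print the same single line, with or without a newline on input.
printf '990-799-1234'   | one_trailing_newline
printf '990-799-1234\n' | one_trailing_newline
```

In the script above this would take the place of the separate pe calls, e.g. by piping the xmlstarlet or xmllint command into the wrapper.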

Details for xml2:
Code:
xml2    convert xml documents in a flat format (man)
Path    : /usr/bin/xml2
Version : - ( /usr/bin/xml2, 2012-04-16 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Repo    : Debian 8.6 (jessie)

Best wishes ... cheers, drl

Last edited by drl; 12-24-2016 at 09:41 PM..
# 12  
Old 12-25-2016
Another awk:
Code:
awk -v k=PORTED_NUM '$1==k{print $2}' RS=\< FS=\> file
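
How the one-liner works: with RS set to < every tag starts a new record, and with FS set to > the tag name lands in $1 and the text that follows it in $2. The sketch below prints every such pair so the splitting is visible. xml_leaf_pairs is a made-up name and the sample line is a shortened copy of the data in this thread; like the one-liner, this is not a real XML parser, so comments, CDATA and values that span lines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: list "tag=text" for every opening tag that is
# directly followed by non-blank text.
xml_leaf_pairs() {
    awk 'BEGIN { RS = "<"; FS = ">" }
         # Skip closing tags ($1 starts with "/") and tags with no text after them.
         substr($1, 1, 1) != "/" && $2 ~ /[^ \t\n]/ { print $1 "=" $2 }'
}

# Shortened sample based on the input file shown earlier in the thread.
printf '%s\n' '<?xml version="1.0"?><PORT_RESPONSE><HEADER><ORIGINATOR>XMG</ORIGINATOR></HEADER><LINE_DATA><PORTED_NUM>990-799-1234</PORTED_NUM></LINE_DATA></PORT_RESPONSE>' |
    xml_leaf_pairs
```

For the sample line this should list ORIGINATOR=XMG and PORTED_NUM=990-799-1234, one per line.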




--
On Solaris use /usr/xpg4/bin/awk rather than awk
# 13  
Old 12-30-2016
All worked. Thanks to all
# 14  
Old 12-31-2016
This forum has seen a lot of questions about parsing XML with awk.

I have published some solutions in one place: a short toolset for parsing XML with awk. The basic idea is to convert the XML into a more awk-friendly format.

Enjoy.