Extract Element from XML file


 
# 8  
Old 06-17-2015
Hi.

Using commonly available utilities (input was formatted with xmllint, q.v.):
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate extraction from XML file, xml_grep, xgrep.
# xml_grep: part of xml-twig-tools package, q.v.
# xgrep: http://wohlberg.net/public/software/xml/xgrep/

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
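# Redefine db as a no-op to silence debug output.
db() { : ; }
# Show versions of the utilities if the local "context" helper is installed.
C="$HOME/bin/context" && [ -f "$C" ] && "$C" xgrep xml_grep

FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Results, xgrep:"
xgrep -t -s "Document:corspd_num/.*/" "$FILE"

pl " Results, xml_grep:"
xml_grep -t "corspd_num" "$FILE"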

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
xgrep 0.08 (libxml2,pcre)
xml_grep /usr/bin/xml_grep version 0.7

-----
 Input data file data1:
<?xml version="1.0" encoding="utf-8"?>
<Data>
  <Request>
    <output_option>PCL</output_option>
    <slip_page_indicator>N</slip_page_indicator>
    <corspd_id>0003958228</corspd_id>
    <eff_dt>2015-05-20</eff_dt>
    <pgm_typ_cd>FS</pgm_typ_cd>
    <csld_num/>
    <off_id>000000</off_id>
    <actn_cd>T</actn_cd>
    <cpy_qty>01</cpy_qty>
    <Reason_Codes>
      <Reason_Code>FAS701</Reason_Code>
    </Reason_Codes>
    <Mailing_Addresses>
      <adr_cnt>01</adr_cnt>
      <Mailing_Address>
        <sort_info>954076262Elsa628CF123f3343</sort_info>
        <addresee1>Narah Ornbaun</addresee1>
        <addresee2/>
        <street_adr1>628 Elsa DR</street_adr1>
        <street_adr2/>
        <city>Santa Jose</city>
        <state>CA</state>
        <zip>95407-6262</zip>
        <postnet>*954076262289*</postnet>
        <email>N</email>
        <hrdcpy>Y</hrdcpy>
      </Mailing_Address>
    </Mailing_Addresses>
    <Document><doc_name>ALL_NOA_NO_Budget_Master_Document</doc_name><language>EN</language><corspd_num>CDS 66S-0 (4/00)</corspd_num>

</Document>
  </Request>
</Data>

-----
 Results, xgrep:
<Document><doc_name>ALL_NOA_NO_Budget_Master_Document</doc_name><language>EN</language><corspd_num>CDS 66S-0 (4/00)</corspd_num>  </Document>

-----
 Results, xml_grep:
CDS 66S-0 (4/00)

Best wishes ... cheers, drl
# 9  
Old 06-17-2015
Quote:
Originally Posted by Siva SQL
Don Cragun,

It would be very helpful if you could explain in detail what the command does. Being a newbie, I am finding it difficult to understand the commands used.

On some systems, most UNIX text processing utilities limit lines to approximately 2048 bytes (and you have told us that sed on your system has a limit of 3000 bytes per line), so we have to assume that you can't process this input file in the normal ways. It is confusing, though, that you gave us an 881 byte file and told us that utilities are complaining because it contains a line longer than 3000 bytes.

We can get around awk's line length limitations by setting a record separator that will select records that are shorter than its line length limit. With your sample xml file we can use the > as the record separator and get lots of short records instead of a single 3000+ byte long line.
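To see what that record separator does, here is a minimal sketch that prints every record awk finds, numbered by NR. The one-line fragment is trimmed from the sample data shown earlier in this thread; the names sample and show_records are made up for this illustration.

```shell
# One-line XML fragment trimmed from the thread's sample data.
sample='<Document><language>EN</language><corspd_num>CDS 66S-0 (4/00)</corspd_num></Document>'

# Split the input on ">" instead of newline and print each record
# together with its record number (NR).
# printf '%s' (no trailing newline) keeps a stray final record out of the demo.
show_records() {
    printf '%s' "$sample" | awk -v RS='>' '{ printf "%d: %s\n", NR, $0 }'
}

show_records
```

Each record ends just before a >, so the record holding the wanted value is the one that still carries the text </corspd_num at its end.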

With the data you're trying to capture being of the form:
Code:
...<corspd_num>data</corspd_num>...

everything outside data</corspd_num (the opening <corspd_num tag and whatever surrounds the element) will be in previous or following records. By setting the field separator to the string </corspd_num, any record that has two fields will have the data that you want as the 1st field and an empty 2nd field. Any record that does not contain the field separator string will have either zero fields or one field. So the code:
Code:
awk -F'</corspd_num' -v RS='>' 'NF > 1{print $1}' DRIVER_TagString

  1. invokes awk
  2. with the input field separator set to </corspd_num (specified by -F'</corspd_num'),
  3. the input record separator set to > (specified by -v RS='>'),
  4. the script to be evaluated set to 'NF > 1{print $1}', and
  5. the file to be processed specified by DRIVER_TagString.
And the script NF > 1{print $1} says that for each record found in the input file, if the number of fields found in that record (NF) is more than 1 (NF > 1) then print (print) the 1st field from that record ($1). Other records in the file are skipped over without printing anything.
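As a quick check of the explanation above, the same command can be run against a small test file. This is only a sketch: the temporary file stands in for DRIVER_TagString, its content is a trimmed one-line version of the sample data posted earlier in this thread, and the variable names tmpfile and result are arbitrary.

```shell
# Write a trimmed one-line sample to a temporary file
# (a stand-in for DRIVER_TagString).
tmpfile=$(mktemp)
printf '%s\n' '<Document><doc_name>ALL_NOA_NO_Budget_Master_Document</doc_name><language>EN</language><corspd_num>CDS 66S-0 (4/00)</corspd_num></Document>' > "$tmpfile"

# Same awk command as above: records end at ">", fields split at "</corspd_num",
# and only records with more than one field print their first field.
result=$(awk -F'</corspd_num' -v RS='>' 'NF > 1{print $1}' "$tmpfile")
printf '%s\n' "$result"    # prints: CDS 66S-0 (4/00)

rm -f "$tmpfile"
```

Only the record "CDS 66S-0 (4/00)</corspd_num" contains the field separator, so it is the only one with NF greater than 1.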
 