Extract values from xml file script


 
# 1  
Old 07-01-2018
Extract values from xml file script

Hi, please help with this. I want to extract values from an XML file and print them in a specific format.

<ProjectName> --> appears only once
<StructList> --> is the top node
<Struct> node --> may appear more than once
NameID, STX, STY, PRX, PRY --> each appears only once within each <Struct> node
<PR_Ranges> node --> appears only once, but it may contain more than one <RangesInfo>
I want to extract the children (OD, ODF, ODRangeStart and ODRangeStop) of each <RangesInfo>

I want to print the values for each <Struct> node in a single line with this format

Code:
ProjectName|NameID|STX-STY|PRX-PRY|OD-ODF|ODRangeStart|ODRangeStop

My input XML, my current awk code, and its current (wrong) output are below.

Code:
echo "<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ProjectInfo>
<ProjectName>HY-LKL</ProjectName>
<StructList>
    <Struct>
    <StructData>    
    <NameID>ROPSL</NameID>    
            <STR_VAL>
            <STX>210</STX>
            <STY>21</STY>
            </STR_VAL>
            <PRO_VAL>
            <PRX>62</PRX>
            <PRY>822</PRY>
            </PRO_VAL>
            <PR_Ranges>
                <RangesInfo>
                <ValueRange>
                    <OD>22</OD>
                    <ODF>3199</ODF>
                </ValueRange>
                </RangesInfo>
                <RangesInfo>
                <ValueRange>
                    <OD>22</OD>
                    <ODF>023</ODF>
                    <ODRange>
                    <ODRangeStart>00</ODRangeStart>
                    <ODRangeStop>99</ODRangeStop>
                    </ODRange>
                </ValueRange>
                </RangesInfo>
            </PR_Ranges>      
    </StructData>
    </Struct>  
    <Struct>
    <StructData>
    <NameID>MACLS</NameID>      
            <STR_VAL>
            <STX>210</STX>
            <STY>01</STY>
            </STR_VAL>
            <PRO_VAL>
            <PRX>62</PRX>
            <PRY>816</PRY>
            </PRO_VAL>
            <PR_Ranges>
                <RangesInfo>
                <ValueRange>
                    <OD>22</OD>
                    <ODF>010</ODF>
                    <ODRange>
                    <ODRangeStart>00</ODRangeStart>
                    <ODRangeStop>99</ODRangeStop>
                    </ODRange>
                </ValueRange>
                </RangesInfo>
            </PR_Ranges>              
    </StructData>
    </Struct>
</StructList>   
</ProjectInfo>" | 

awk -F"<|>" '
BEGIN{print "ProjectName|NameID|STX-STY|PRX-PRY|OD-ODF|ODRangeStart|ODRangeStop"}
/ProjectName/{printf "%s",$3}
/NameID/{id=$3}
/STX/{stx=$3}
/STY/{sty=$3}
/PRX/{prx=$3}
/PRY/{pry=$3}
/OD/ {od=$3}
/ODF/{odf=$3}
/ODRangeStart/{rngStart=$3}
/ODRangeStop/ {rngStop=$3
printf "|%s|%s-%s|%s-%s|%s-%s|%s|%s\n",id,stx,sty,prx,pry,od,odf,rngStart,rngStop
stx=sty=prx=pry=od=odf=rngStart=rngStop=""
}
'

My current output (not desired output)
Code:
ProjectName|NameID|STX-STY|PRX-PRY|OD-ODF|ODRangeStart|ODRangeStop
HY-LKL|ROPSL|210-21|62-822|99-023|00|99
|MACLS|210-01|62-816|99-010|00|99


My desired output
Code:
ProjectName|NameID|STX-STY|PRX-PRY|OD-ODF|ODRangeStart|ODRangeStop
HY-LKL|ROPSL|210-21|62-822|22-3199||
||||22-023|00|99
|MACLS|210-01|62-816|22-010|00|99

Thanks in advance for any help.
# 2  
Old 07-01-2018
Hi,

I suggest using an XML tool to parse an XML file. See this thread for some tools:

Extract a value from an xml file

XPath is the name of the query syntax you can use to find values.

Some examples:

Get the ProjectName
Code:
xmllint --xpath "//ProjectName/text()" file.xml

Get all NameIDs
Code:
xmllint --xpath "//NameID/text()" file.xml

Get STX for the section with NameID "MACLS"
Code:
xmllint --xpath "//*[NameID[text()='MACLS']]/STR_VAL/STX/text()" file.xml

That is a lot of xmllint calls. I would use a scripting language that has an XML library. But Bash with xmllint should also be possible, albeit not as fast.
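
To make the cost of this approach concrete, here is a rough sketch of a shell loop built on xmllint. It assumes an xmllint build that supports --xpath with the XPath functions count() and string(). The function name print_structs is made up for illustration, and only three of the fields are printed.

```shell
# print_structs FILE: one line per <Struct>, in the form
# ProjectName|NameID|STX-STY (hypothetical helper, not from the thread).
print_structs() {
  file=$1
  # string() returns the text of the first matching node.
  project=$(xmllint --xpath 'string(//ProjectName)' "$file")
  # count() returns how many <Struct> nodes exist.
  count=$(xmllint --xpath 'count(//Struct)' "$file")
  i=1
  while [ "$i" -le "$count" ]; do
    # Three more xmllint processes for every <Struct>.
    id=$(xmllint --xpath "string((//Struct)[$i]//NameID)" "$file")
    stx=$(xmllint --xpath "string((//Struct)[$i]//STX)" "$file")
    sty=$(xmllint --xpath "string((//Struct)[$i]//STY)" "$file")
    printf '%s|%s|%s-%s\n' "$project" "$id" "$stx" "$sty"
    i=$((i + 1))
  done
}
```

Called as print_structs file.xml, this starts two xmllint processes up front plus three per <Struct>, and the full output format would need several more per <RangesInfo>. That is why a single-pass tool is likely to be faster on large files.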
# 3  
Old 07-01-2018
This appears to work OK for your sample input:

Code:
awk -F"<|>" '
BEGIN{print "ProjectName|NameID|STX-STY|PRX-PRY|OD-ODF|ODRangeStart|ODRangeStop"}
/ProjectName/{printf "%s",$3}
/NameID/{id=$3}
/<STX>/{stx=$3}
/<STY>/{sty=$3}
/<PRX>/{prx=$3}
/<PRY>/{pry=$3}
/<OD>/ {od=$3}
/ODF>/{odf=$3}
/ODRangeStart/{rngStart=$3}
/ODRangeStop/ {rngStop=$3}
/<.ValueRange/ {
printf "|%s|%s|%s|%s|%s|%s\n",id,stx?stx"-"sty:"",prx?prx"-"pry:"",od?od"-"odf:"",rngStart,rngStop
id=stx=sty=prx=pry=od=odf=rngStart=rngStop=""
}
'
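
The key change from the script in post #1 is that the patterns now match whole tags. A small sketch of why that matters: the unanchored pattern /OD/ also matches the ODF, ODRangeStart and ODRangeStop lines, so od keeps being overwritten. The function names loose_od and strict_od are only illustrative.

```shell
# Four lines taken from one <ValueRange> block of the sample input.
sample='<OD>22</OD>
<ODF>023</ODF>
<ODRangeStart>00</ODRangeStart>
<ODRangeStop>99</ODRangeStop>'

# Unanchored: /OD/ also matches ODF, ODRangeStart and ODRangeStop,
# so od ends up holding the value from the last matching line.
loose_od() { awk -F'<|>' '/OD/ {od=$3} END {print od}'; }

# Matching the full opening tag: only the <OD> line sets od.
strict_od() { awk -F'<|>' '/<OD>/ {od=$3} END {print od}'; }

printf '%s\n' "$sample" | loose_od    # prints 99
printf '%s\n' "$sample" | strict_od   # prints 22
```

This is the same effect that produced 99-023 instead of 22-023 in the output of post #1.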

# 4  
Old 07-02-2018
Quote:
Originally Posted by stomp
Hi,

I suggest using an XML tool to parse an XML file. See this thread for some tools:

That is a lot of xmllint calls. I would use a scripting language that has an XML library. But Bash with xmllint should also be possible, albeit not as fast.
Hi stomp, thanks for your answer and suggestion. I'll keep this XML tool in mind, but for now I think I am close to getting the desired output with awk.

---------- Post updated at 11:05 PM ---------- Previous update was at 11:04 PM ----------

Quote:
Originally Posted by Chubler_XL
This appears to work OK for your sample input:
Hi Chubler_XL,

Thanks. It works, but with a real input XML it prints a somewhat different output. This was my fault:
in order to make the sample input shorter, I left out some nodes.

Below is a more representative sample file.
- The nodes STR_VAL and PRO_VAL have the same structure. The issue is that there is a parent node called <GROUP_Ranges> that contains
the children <XB_Ranges>, <PR_Ranges> and <KJ_Ranges>. Each of these children has the same sub-children named OD, ODF, ODRange, etc.
The desired output remains the same: I only want to extract the sub-children of <PR_Ranges>. Since my first sample file was less representative, your
current solution also prints the sub-children of <XB_Ranges> and <KJ_Ranges>. In addition, the values of STX, STY, PRX and PRY are not
printed when the input file is like this second sample file.

Maybe you can help me fix this: how can I print the same output as before, but considering only the values from PR_Ranges?

The input file 2 is:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ProjectInfo>
<ProjectName>ABDFC</ProjectName>
<StructList>
    <Struct>
    <StructData>    
    <NameID>ROPSL</NameID>  
            <GROUP_Ranges>
              <XB_Ranges>
                <RangesInfo>
                  <ValueRange>
                    <OD>534</OD>
                    <ODF>91</ODF>
                    <ODRange>
                      <ODRangeStart>00</ODRangeStart>
                      <ODRangeStop>99</ODRangeStop>
                    </ODRange>
                  </ValueRange>
                </RangesInfo>
              </XB_Ranges>
              <PR_Ranges>
                <RangesInfo>
                  <ValueRange>
                    <OD>534</OD>
                    <ODF>91</ODF>
                    <ODRange>
                      <ODRangeStart>56</ODRangeStart>
                      <ODRangeStop>879</ODRangeStop>
                    </ODRange>
                  </ValueRange>
                </RangesInfo>
                <RangesInfo>
                <ValueRange>
                    <OD>92</OD>
                    <ODF>21</ODF>
                    <ODRange>
                    <ODRangeStart>100</ODRangeStart>
                    <ODRangeStop>299</ODRangeStop>
                    </ODRange>
                </ValueRange>
                </RangesInfo>				
              </PR_Ranges>
              <KJ_Ranges>
                <ValueRange>
                  <OD>534</OD>
                  <ODF>91</ODF>
                  <ODRange>
                    <ODRangeStart>440</ODRangeStart>
                    <ODRangeStop>449</ODRangeStop>
                  </ODRange>
                </ValueRange>
              </KJ_Ranges>
            </GROUP_Ranges>
            <STR_VAL>
              <STX>283</STX>
              <STY>84</STY>
            </STR_VAL>
            <PRO_VAL>
              <PRX>534</PRX>
              <PRY>91</PRY>
            </PRO_VAL>	     
    </StructData>
    </Struct>  
</StructList>   
</ProjectInfo>

The output for this input file 2 would be like this:
Code:
ProjectName|NameID|STX-STY|PRX-PRY|OD-ODF|ODRangeStart|ODRangeStop
ABDFC|ROPSL|283-84|534-91|534-91|56|879
||||92-21|100|299

Thanks
# 5  
Old 07-02-2018
A little confused with this. The <STR_VAL> block for the first output line appears after the <ODRange> block for the 2nd output line.

How do you match up STX-STY values in the XML with their appropriate output lines?
# 6  
Old 07-02-2018
Quote:
Originally Posted by Chubler_XL
A little confused with this. The <STR_VAL> block for the first output line appears after the <ODRange> block for the 2nd output line.

How do you match up STX-STY values in the XML with their appropriate output lines?
Actually, <STR_VAL> and <PRO_VAL> come after <ValueRange>/<ODRange>. In the first sample STR_VAL appears before them for the reason I explained: it was not a very representative sample.


All the values related to each <NameID>, each STX-STY and each PRX-PRY are inside each <Struct> node. I don't know if I answer your doubts.
# 7  
Old 07-02-2018
Quote:
Originally Posted by Ophiuchus
Actually, <STR_VAL> and <PRO_VAL> come after <ValueRange>/<ODRange>. In the first sample STR_VAL appears before them for the reason I explained: it was not a very representative sample.


All the values related to each <NameID>, each STX-STY and each PRX-PRY are inside each <Struct> node. I don't know if I answer your doubts.
No. You have not answered our doubts. You gave us sample input in post #1 and you showed us the output you wanted from that input. And, you were given code that produced that output from that input and then you changed your requirements.

In post 4 you gave us new sample input and you said "The output for this input file 2 would be like this:" and you showed us some output. But with that wording, I don't know if you are saying that that is the output you get from some code that has been suggested (but not what you want), that it is the output you get from some other code that you're using (but not what you want), or if it is the output you want from that new input.

Furthermore, you haven't clearly specified whether the original input you provided in post #1 was valid input that did not include data that was needed to trigger special cases that were missing from your original algorithm (and the code you want should still provide the output you said you want from that input in post #1) or if the input you provided in post #1 was not valid input and everything you said about the output you wanted to be produced from that input should be ignored.
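
Pending those answers, here is a sketch under two assumptions: the output shown in post #4 is the desired output, and each value belongs to the <Struct> that encloses it. Because STX, STY, PRX and PRY can come after the ranges, the rows are buffered and printed only at </Struct>. A flag limits collection to <PR_Ranges>. The function name extract_pr and the variables in_pr, rows and n are made up for this example.

```shell
# extract_pr: read the XML on stdin and print one line per <RangesInfo>
# found inside <PR_Ranges> (hypothetical helper, not from the thread).
extract_pr() {
  awk -F'<|>' '
    BEGIN { print "ProjectName|NameID|STX-STY|PRX-PRY|OD-ODF|ODRangeStart|ODRangeStop" }
    /<ProjectName>/ { project = $3 }
    /<NameID>/      { id = $3 }
    /<STX>/         { stx = $3 }
    /<STY>/         { sty = $3 }
    /<PRX>/         { prx = $3 }
    /<PRY>/         { pry = $3 }
    # Collect range values only between <PR_Ranges> and </PR_Ranges>.
    /<PR_Ranges>/   { in_pr = 1 }
    /<\/PR_Ranges>/ { in_pr = 0 }
    in_pr && /<OD>/           { od = $3 }
    in_pr && /<ODF>/          { odf = $3 }
    in_pr && /<ODRangeStart>/ { rngStart = $3 }
    in_pr && /<ODRangeStop>/  { rngStop = $3 }
    # One buffered row per closed <RangesInfo>.
    in_pr && /<\/RangesInfo>/ {
      rows[++n] = od "-" odf "|" rngStart "|" rngStop
      od = odf = rngStart = rngStop = ""
    }
    # At the end of a <Struct> all of its values are known: flush the rows.
    /<\/Struct>/ {
      for (i = 1; i <= n; i++) {
        if (i == 1)
          printf "%s|%s|%s-%s|%s-%s|%s\n", project, id, stx, sty, prx, pry, rows[i]
        else
          printf "||||%s\n", rows[i]
      }
      project = id = stx = sty = prx = pry = ""
      n = 0
    }
  '
}
```

With input file 2 saved as input2.xml (a file name chosen here), extract_pr < input2.xml should print the two data lines shown in post #4. Run on the input from post #1, it should also give the desired output from that post, since the project name is cleared after the first <Struct>. A <Struct> with no <RangesInfo> inside <PR_Ranges> prints nothing in this sketch.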