Extract TAG name and XPATH from XML file via shellscript


 
# 1  
Old 08-21-2012

Hi,
Here is a sample XML file and the expected output.
I need to extract each element/tag name (not its value) together with its XPath, as shown in Output.txt below.
The main problem is that this sample is small enough to count the elements by eye, but my real XML file has over 500 elements. Is there a way to find the elements automatically and retrieve their XPaths?

sample.xml:

Code:
 
<?xml version = '1.0'?>
<ROWSET>
<ROW num="1">
<EMPNO>7369</EMPNO>
<ENAME>SMITH</ENAME>
<JOB>CLERK</JOB>
<MGR>7902</MGR>
<HIREDATE>12/17/1980 0:0:0</HIREDATE>
<SAL>800</SAL>
<DEPTNO>20</DEPTNO>
</ROW>
<ROW num="2">
<EMPNO>7499</EMPNO>
<ENAME>ALLEN</ENAME>
<JOB>SALESMAN</JOB>
<MGR>7698</MGR>
<HIREDATE>2/20/1981 0:0:0</HIREDATE>
<SAL>1600</SAL>
<COMM>300</COMM>
<DEPTNO>30</DEPTNO>
</ROW>
</ROWSET>

Output.txt :
Code:
 
ROW_NUM /ROWSET/ROW/@num
EMPNO /ROWSET/ROW/EMPNO
ENAME /ROWSET/ROW/ENAME
JOB /ROWSET/ROW/JOB
MGR /ROWSET/ROW/MGR
HIREDATE /ROWSET/ROW/HIREDATE
SAL /ROWSET/ROW/SAL
COMM /ROWSET/ROW/COMM
DEPTNO /ROWSET/ROW/DEPTNO

Note: If an element has an attribute such as "num" whose value changes (1, 2, and so on), does that change anything in the output file?


Thanks,
Bithun

Last edited by BithunC; 08-23-2012 at 07:20 AM.. Reason: code tag
# 2  
Old 08-21-2012
Code:
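awk 'BEGIN { f = 0 }
# Opening <ROW ...> tag: print the value of its first quoted attribute.
/^<ROW / {
	split($0, a, "\"")
	print "ROW_NUM/ROWSET/ROW/@" a[2]
	f = 1
	next
}
# Inside a ROW: stop at </ROW>, otherwise print tag name, path and value.
f == 1 {
	if (/^<\/ROW/) {
		f = 0
	} else {
		split($0, a, "<|>")
		print a[2] "/ROWSET/ROW/" a[3]
	}
}' sample.xml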

The output is:
Code:
ROW_NUM/ROWSET/ROW/@1
EMPNO/ROWSET/ROW/7369
ENAME/ROWSET/ROW/SMITH
JOB/ROWSET/ROW/CLERK
MGR/ROWSET/ROW/7902
HIREDATE/ROWSET/ROW/12/17/1980 0:0:0
SAL/ROWSET/ROW/800
DEPTNO/ROWSET/ROW/20
ROW_NUM/ROWSET/ROW/@2
EMPNO/ROWSET/ROW/7499
ENAME/ROWSET/ROW/ALLEN
JOB/ROWSET/ROW/SALESMAN
MGR/ROWSET/ROW/7698
HIREDATE/ROWSET/ROW/2/20/1981 0:0:0
SAL/ROWSET/ROW/1600
COMM/ROWSET/ROW/300
DEPTNO/ROWSET/ROW/30

Is this the output you required?
# 3  
Old 08-21-2012
Thanks for the suggestion, raj.
I appreciate your help.

But I don't need the element value. As you can see in my expected output, there is a space between the element name and the XPath (so that I can tell the two apart), and the element value does not appear. Otherwise it's fine.
Thanks again.
# 4  
Old 08-21-2012
Code:
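awk 'BEGIN { f = 0 }
# Opening <ROW ...> tag: print the attribute path and start reading children.
/^<ROW / {
	print "ROW_NUM /ROWSET/ROW/@num"
	f = 1
	next
}
# Inside a ROW: stop at </ROW>, otherwise print the tag name and its path.
f == 1 {
	if (/^<\/ROW/) {
		f = 0
	} else {
		split($0, a, "<|>")
		print a[2] " /ROWSET/ROW/" a[2]
	}
}' sample.xml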

The output will be:
Code:
ROW_NUM /ROWSET/ROW/@num
EMPNO /ROWSET/ROW/EMPNO
ENAME /ROWSET/ROW/ENAME
JOB /ROWSET/ROW/JOB
MGR /ROWSET/ROW/MGR
HIREDATE /ROWSET/ROW/HIREDATE
SAL /ROWSET/ROW/SAL
DEPTNO /ROWSET/ROW/DEPTNO
ROW_NUM /ROWSET/ROW/@num
EMPNO /ROWSET/ROW/EMPNO
ENAME /ROWSET/ROW/ENAME
JOB /ROWSET/ROW/JOB
MGR /ROWSET/ROW/MGR
HIREDATE /ROWSET/ROW/HIREDATE
SAL /ROWSET/ROW/SAL
COMM /ROWSET/ROW/COMM
DEPTNO /ROWSET/ROW/DEPTNO
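
Note that the listing above repeats every pair once per ROW, while the requested Output.txt lists each pair only once. A minimal sketch of one way to drop the repeats, assuming the same one-tag-per-line layout as sample.xml; the function name unique_paths and the shortened input file are made up for illustration:

```shell
# Shortened stand-in for sample.xml so the sketch runs on its own.
cat > sample.xml <<'EOF'
<?xml version = '1.0'?>
<ROWSET>
<ROW num="1">
<EMPNO>7369</EMPNO>
<SAL>800</SAL>
</ROW>
<ROW num="2">
<EMPNO>7499</EMPNO>
<COMM>300</COMM>
</ROW>
</ROWSET>
EOF

# unique_paths FILE: same logic as the script above, but each line is
# printed only the first time it is produced.
unique_paths() {
    awk '
    function emit(s) { if (!(s in seen)) { seen[s] = 1; print s } }
    /^<ROW /   { emit("ROW_NUM /ROWSET/ROW/@num"); f = 1; next }
    /^<\/ROW>/ { f = 0; next }
    f == 1     { split($0, a, "<|>"); emit(a[2] " /ROWSET/ROW/" a[2]) }
    ' "$1"
}

unique_paths sample.xml
```

Lines come out in first-seen order, so a tag that appears only in a later ROW (COMM here) lands at the end rather than in its position within that ROW.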

# 5  
Old 08-21-2012
Yes, raj, thanks a lot. The output now looks the way I required.
Just one question: should I save that code as sample.ksh with #!/usr/bin/ksh as the first line?
If so, should I run it like this:

ksh sample.ksh sample.xml

---------- Post updated at 12:54 PM ---------- Previous update was at 12:47 PM ----------

Yes, it works once I start the script with the header line #!/usr/bin/ksh:

ksh sample.ksh > sample.txt

Thanks, Raj.
# 6  
Old 08-21-2012
There is no need to give the file name at run time, as it is already named in the script itself.
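
If you would rather choose the input file at run time, one option is to read it from the first argument and fall back to sample.xml when none is given. This is only a sketch: the function name list_paths and the files demo.xml and demo.txt are hypothetical, and the awk body is a condensed form of the script from post #4.

```shell
# list_paths [FILE]: uses FILE if given, otherwise sample.xml.
list_paths() {
    xml_file=${1:-sample.xml}
    awk '
    /^<ROW /   { print "ROW_NUM /ROWSET/ROW/@num"; f = 1; next }
    /^<\/ROW>/ { f = 0; next }
    f == 1     { split($0, a, "<|>"); print a[2] " /ROWSET/ROW/" a[2] }
    ' "$xml_file"
}

# Tiny stand-in input so the sketch runs on its own.
printf '%s\n' '<ROWSET>' '<ROW num="1">' '<EMPNO>7369</EMPNO>' '</ROW>' '</ROWSET>' > demo.xml

# The file name is now supplied by the caller; output is redirected as before.
list_paths demo.xml > demo.txt
cat demo.txt
```

In a saved script, the last lines would become list_paths "$@", so that ksh sample.ksh sample.xml > sample.txt passes the file name through.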
# 7  
Old 08-21-2012
Raj, you hard-coded ROW/@num, but how can we handle a file with more than one attribute and with nested parent-child elements?

Here is another example:

sample2.xml
Code:
 
<FORMINFO FORMVERSION="12-2009" DOCID="CHS-101228-01340-2" FILENUM="776719" CASE_NO="10660990" FORMNUM="BPOCHASE" VERSION="3.6" VENDOR="UTLSValuations" MAINFORM="BPOCHASE">
<SUBJECT>
 <LOANNUM>100001416010918632</LOANNUM>
 <RELATIONSHIPNUM>UTLS Default Services</RELATIONSHIPNUM>
 <REONUM />
 <ADDR>
   <STREET>32416 N 44TH PL</STREET>
   <CITY>CAVE CREEK</CITY>
   <STATEPROV>AZ</STATEPROV>
   <ZIP>85331</ZIP>
 </ADDR>
 <COUNTY />
 <ASSESPARCEL>
   <NUM>211-34-168</NUM>
 </ASSESPARCEL>
 <BORROWER />
 <SOLDLISTED VALUE="YES" />
 <PRICEREDUCTION NUM="1">
   <DOM />
   <PRICE />
 </PRICEREDUCTION>
 <PRICEREDUCTION NUM="2">
   <DOM />
   <PRICE />
 </PRICEREDUCTION>
 <PRICEREDUCTION NUM="3">
   <DOM />
   <PRICE />
 </PRICEREDUCTION>
 <PROJ>
   <TYPE>SFRD</TYPE>
   <DESCRIPTION>SFRD</DESCRIPTION>
 </PROJ>
 <HOMEOWNERASSNFEE>300</HOMEOWNERASSNFEE>
 <HOMEOWNERASSNRESPONSE>YEARLY</HOMEOWNERASSNRESPONSE>
 <HOA>
   <FEECURRENT>YES</FEECURRENT>
   <DELINQUENCIES />
   <MAINTENANCE>INSURANCE</MAINTENANCE>
   <COMPANY />
   <PHONE>(623) 572-7579</PHONE>
   <LEGALISSUES />
 </HOA>
 <HOMEOWNERASSNDESC />
 <PROJECT>
   <NAME />
 </PROJECT>
 <CURRENTOCCUPANT>VACANT</CURRENTOCCUPANT>
 <CURRENTOWNER>MAINRESIDENCE</CURRENTOWNER>
</SUBJECT>
</FORMINFO>

output2.txt

Code:
 
FORMVERSION /FORMINFO/@FORMVERSION
DOCID /FORMINFO/@DOCID
FILENUM /FORMINFO/@FILENUM
CASE_NO /FORMINFO/@CASE_NO
FORMNUM /FORMINFO/@FORMNUM
VERSION /FORMINFO/@VERSION
VENDOR /FORMINFO/@VENDOR
MAINFORM /FORMINFO/@MAINFORM
LOANNUM /FORMINFO/SUBJECT/LOANNUM
RELATIONSHIPNUM /FORMINFO/SUBJECT/RELATIONSHIPNUM
REONUM /FORMINFO/SUBJECT/REONUM
CITY /FORMINFO/SUBJECT/ADDR/CITY
STATEPROV /FORMINFO/SUBJECT/ADDR/STATEPROV
STREET /FORMINFO/SUBJECT/ADDR/STREET
ZIP /FORMINFO/SUBJECT/ADDR/ZIP
....
....

I ran into problems when a single parent has more than one child tag and when an element has more than one attribute.

Thanks in advance.

Last edited by BithunC; 08-21-2012 at 06:16 AM.. Reason: code tag
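
One possible way to avoid hard-coding any tag or attribute name is to keep a stack of the currently open elements and build each path from it. The sketch below is not a real XML parser. It assumes every tag sits on a single line, attribute values are double-quoted and contain no ">", and there are no CDATA sections or multi-line comments. The function name xml_paths and the file demo2.xml (a cut-down sample2.xml) are made up for illustration. Attributes are printed under their own name (NUM, not PRICEREDUCTION_NUM), and only elements without child elements are listed, as in output2.txt.

```shell
# xml_paths FILE: print "NAME XPATH" once per attribute and per leaf element.
xml_paths() {
    awk '
    # Join the names of the open elements into a path such as /A/B/C.
    function path(   i, p) {
        p = ""
        for (i = 1; i <= depth; i++) p = p "/" stack[i]
        return p
    }
    # Print each NAME/XPATH pair only the first time it is seen.
    function emit(name, xpath) {
        if (!((name, xpath) in seen)) { seen[name, xpath] = 1; print name " " xpath }
    }
    # Leaving an element: it is a leaf if no child element was opened inside it.
    function close_element() {
        if (!haschild[depth]) emit(stack[depth], path())
        depth--
    }
    {
        line = $0
        while (match(line, /<[^>]*>/)) {
            tag = substr(line, RSTART + 1, RLENGTH - 2)   # text between < and >
            line = substr(line, RSTART + RLENGTH)          # rest of the line
            if (tag ~ /^[?!]/) continue                    # declaration, comment, DOCTYPE
            if (tag ~ /^\//) { close_element(); continue } # closing tag
            selfclosing = (tag ~ /\/[ \t]*$/)
            sub(/[ \t]*\/[ \t]*$/, "", tag)
            name = tag
            sub(/[ \t].*$/, "", name)                      # element name only
            attrs = substr(tag, length(name) + 1)
            if (depth > 0) haschild[depth] = 1             # the parent has a child
            stack[++depth] = name
            haschild[depth] = 0
            gsub(/"[^"]*"/, "", attrs)                     # drop the quoted values
            n = split(attrs, part, "=")                    # what remains: name= name= ...
            for (i = 1; i < n; i++) {
                gsub(/[ \t]/, "", part[i])
                if (part[i] != "") emit(part[i], path() "/@" part[i])
            }
            if (selfclosing) close_element()
        }
    }' "$1"
}

# Cut-down version of sample2.xml so the sketch runs on its own.
cat > demo2.xml <<'EOF'
<FORMINFO FORMVERSION="12-2009" VENDOR="UTLSValuations">
<SUBJECT>
 <LOANNUM>100001416010918632</LOANNUM>
 <REONUM />
 <ADDR>
   <STREET>32416 N 44TH PL</STREET>
   <CITY>CAVE CREEK</CITY>
 </ADDR>
 <PRICEREDUCTION NUM="1">
   <DOM />
 </PRICEREDUCTION>
 <PRICEREDUCTION NUM="2">
   <DOM />
 </PRICEREDUCTION>
</SUBJECT>
</FORMINFO>
EOF

xml_paths demo2.xml
```

Entries appear in document order rather than sorted, and the repeated PRICEREDUCTION blocks contribute their NUM and DOM lines only once. For files that break the assumptions above, a dedicated XML tool is likely the safer choice.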