Extract TAG name and XPATH from XML file via shellscript


 
# 8  
Old 08-22-2012
Try this:
Code:
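awk '
# Root element line: print one XPath per attribute, e.g. /FORMINFO/@DOCID
/^<FORMINFO / {
    var = "FORMINFO/"
    for (i = 2; i <= NF; i++) {
        split($i, a, "=")          # a[1] is the attribute name
        print a[1] " /" var "@" a[1]
    }
    f = 1                          # now inside FORMINFO
    next
}
f == 1 {
    split($0, a, "<|>")            # a[2] is the text inside the first <...>
    if (a[2] ~ /^\//) {            # closing tag on its own line
        if (a[2] ~ /FORMINFO/) f = 0
        sub(a[2], "", var)         # remove the first "/NAME" match from the path
        next
    }
    if ($0 !~ /\//) {              # opening tag only: descend one level
        split(a[2], b, FS)
        var = var b[1] "/"
        next
    }
    split(a[2], b, FS)             # leaf element: <NAME>value</NAME>
    print b[1] " /" var b[1]
}' sample.xml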

Output is
Code:
FORMVERSION /FORMINFO/@FORMVERSION
DOCID /FORMINFO/@DOCID
FILENUM /FORMINFO/@FILENUM
CASE_NO /FORMINFO/@CASE_NO
FORMNUM /FORMINFO/@FORMNUM
VERSION /FORMINFO/@VERSION
VENDOR /FORMINFO/@VENDOR
MAINFORM /FORMINFO/@MAINFORM
LOANNUM /FORMINFO/SUBJECT/LOANNUM
RELATIONSHIPNUM /FORMINFO/SUBJECT/RELATIONSHIPNUM
REONUM /FORMINFO/SUBJECT/REONUM
STREET /FORMINFO/SUBJECT/ADDR/STREET
CITY /FORMINFO/SUBJECT/ADDR/CITY
STATEPROV /FORMINFO/SUBJECT/ADDR/STATEPROV
ZIP /FORMINFO/SUBJECT/ADDR/ZIP
COUNTY /FORMINFO/SUBJECT/COUNTY
NUM /FORMINFO/SUBJECT/ASSESPARCEL/NUM
BORROWER /FORMINFO/SUBJECT/BORROWER
SOLDLISTED /FORMINFO/SUBJECT/SOLDLISTED
DOM /FORMINFO/SUBJECT/PRICEREDUCTION/DOM
PRICE /FORMINFO/SUBJECT/PRICEREDUCTION/PRICE
DOM /FORMINFO/SUBJECT/PRICEREDUCTION/DOM
PRICE /FORMINFO/SUBJECT/PRICEREDUCTION/PRICE
DOM /FORMINFO/SUBJECT/PRICEREDUCTION/DOM
PRICE /FORMINFO/SUBJECT/PRICEREDUCTION/PRICE
TYPE /FORMINFO/SUBJECT/PROJ/TYPE
DESCRIPTION /FORMINFO/SUBJECT/PROJ/DESCRIPTION
HOMEOWNERASSNFEE /FORMINFO/SUBJECT/HOMEOWNERASSNFEE
HOMEOWNERASSNRESPONSE /FORMINFO/SUBJECT/HOMEOWNERASSNRESPONSE
FEECURRENT /FORMINFO/SUBJECT/HOA/FEECURRENT
DELINQUENCIES /FORMINFO/SUBJECT/HOA/DELINQUENCIES
MAINTENANCE /FORMINFO/SUBJECT/HOA/MAINTENANCE
COMPANY /FORMINFO/SUBJECT/HOA/COMPANY
PHONE /FORMINFO/SUBJECT/HOA/PHONE
LEGALISSUES /FORMINFO/SUBJECT/HOA/LEGALISSUES
HOMEOWNERASSNDESC /FORMINFO/SUBJECT/HOMEOWNERASSNDESC
NAME /FORMINFO/SUBJECT/PROJECT/NAME
CURRENTOCCUPANT /FORMINFO/SUBJECT/CURRENTOCCUPANT
CURRENTOWNER /FORMINFO/SUBJECT/CURRENTOWNER

Is this the output you required?
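
The output above lists DOM and PRICE three times because PRICEREDUCTION repeats in the sample. If only distinct name/XPath pairs are wanted, one possible sketch is to pipe the result through a small awk filter that keeps the first occurrence of each line. The function name `dedupe` is made up for this example, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# dedupe: hypothetical helper; prints each distinct input line once,
# in the order it was first seen.
dedupe() {
    awk '!seen[$0]++'
}

# Sample lines copied from the output above.
result=$(printf '%s\n' \
    'DOM /FORMINFO/SUBJECT/PRICEREDUCTION/DOM' \
    'PRICE /FORMINFO/SUBJECT/PRICEREDUCTION/PRICE' \
    'DOM /FORMINFO/SUBJECT/PRICEREDUCTION/DOM' \
    'PRICE /FORMINFO/SUBJECT/PRICEREDUCTION/PRICE' | dedupe)
printf '%s\n' "$result"
```

Unlike `sort -u`, this keeps the document order of the elements, which may be easier to compare against the XML by eye.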
This User Gave Thanks to raj_saini20 For This Post:
# 9  
Old 08-22-2012
Yes Raj, thanks for your tremendous help.
This is exactly what I need.
It works perfectly for one XML file, but in our scenario we have a shared folder where almost 10000 XML files will be stored every month.
There I can create one param file containing all the XML file names with their extensions.
Now my question: is it possible to pass that param file into your code above so that it produces one .txt file containing all element names along with their XPaths, as in your output?
There might be 200 elements common to all XML files, and the rest might be new, so we have to build one superset of all elements and write it to one text file along with the XPaths.

Note: we are actually building a SAS mapping script to load all the XML into an Oracle DB. Previously we did manual copy-and-paste work for all the XML files, and it took 10 days to cover almost all elements.
Kindly suggest: can we load all the XML element data into the Oracle DB using a shell script?

Last edited by BithunC; 08-23-2012 at 04:56 AM..
# 10  
Old 08-23-2012
Yes, with a script almost anything can be done.

But please provide an example that matches your scenario.
# 11  
Old 08-23-2012
Quote:
Originally Posted by raj_saini20
Yes, with a script almost anything can be done.

But please provide an example that matches your scenario.

Raj,

I have attached 5 XML files for your reference, but more than 10000 XML files of this type will arrive in our share path every month.
We have to store all the data in an Oracle table (this can be more than one table if required).
At the very beginning we did not have any fixed Oracle table; we created three tables based on the elements we found in the XML files.
Through SAS we successfully loaded all the XML files into different Oracle tables, but while writing that SAS script we had to write every element name along with its XPath manually, and that was truly boring work for all the files.
With a shell script we made one param file containing the names of all the XML files and passed that param file to our SAS code. SAS read the XML files one by one and inserted all the data into the Oracle tables.
In the coming week we will probably not be able to access SAS, and we have to pull all the data from the XML files in the share path into Oracle tables with one shell script.

Please tell me whether this is possible. Check the attached XML files and give me some ideas.
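
For readers following along, the param file described above could be produced roughly like this. This is only a sketch: the function name `make_param_file`, the directory argument, and the list file name are assumptions, since the thread does not show the script that was actually used.

```shell
#!/bin/sh
# Sketch only: the directory and the list file name are assumptions.

# make_param_file DIR LIST: write the path of every *.xml file directly
# inside DIR to LIST, one per line, in the shell's glob (sorted) order.
make_param_file() {
    dir=$1
    list=$2
    for f in "$dir"/*.xml; do
        # If nothing matches, the pattern stays literal; -f filters it out.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > "$list"
}

# Usage (hypothetical paths): make_param_file /path/to/share fileNames.txt
```

A `for` loop over the glob is used instead of `ls *.xml` so that a folder with 10000 files does not run into the command-line length limit.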

Last edited by BithunC; 08-23-2012 at 07:25 AM..
# 12  
Old 08-24-2012
Anyone? Any helping hand with post #11?
https://www.unix.com/302690509-post11.html
# 13  
Old 08-24-2012
Raj,

At least help me with your post #8.
Using your code I can get the required output.
I have attached 5 XML files to post #11.
I can use a param file like fileNames.txt (attached here). By passing this file to your code, how can I generate one output.txt containing all elements and XPaths (like your output) for all 5 files together?
Thanks in advance for your kind help.
--Bithun

Last edited by BithunC; 08-24-2012 at 03:43 AM.. Reason: hyperlink tag
# 14  
Old 08-24-2012
filenames.txt contains those 5 XML file names.

What exactly do you want to do?

If you want the output only for the files named in it, then try:

Code:
cat filenames.txt | xargs -n 1 my_awk_cmd
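
One way to get the single combined file asked for earlier in the thread is sketched below. It swaps `xargs` for a `while read` loop and assumes the awk program from post #8 (the text between the single quotes) has been saved to a file; `xpath.awk`, `build_superset`, and `output.txt` are names invented for this sketch, not something given in the thread.

```shell
#!/bin/sh
# Sketch only: xpath.awk, filenames.txt and output.txt are assumed names.
# xpath.awk is assumed to hold the awk program from post #8.

# build_superset LIST PROGRAM: run PROGRAM on every file named in LIST
# (one name per line) and print each distinct output line once, sorted.
build_superset() {
    list=$1
    prog=$2
    while IFS= read -r xml; do
        [ -n "$xml" ] || continue          # skip blank lines in the list
        if [ -r "$xml" ]; then
            awk -f "$prog" "$xml"
        else
            echo "skipping unreadable file: $xml" >&2
        fi
    done < "$list" | sort -u
}

# Usage: build_superset filenames.txt xpath.awk > output.txt
```

Running awk once per file means every file starts with fresh values of `f` and `var`, and `sort -u` reduces the combined output to the superset of distinct name/XPath pairs.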
