How to get distinct Tags from an XML file?


 
# 1  
Old 09-16-2014

Sample XML file:
Code:
<?xml version="1.0" encoding="UTF-16" ?>
<Provider PROVIDER="xx" SCHEMA_VERSION="2.5">
<Institution UNINUM="xxxx" EXTRACT_DATE="2013-12-31" CUSTOMER_ROW_COUNT="1577" LOAN_ROW_COUNT="3322" BOOK_VALUE_DOLLARS="720163381.46">
	<Customer CIF="dww213">
		<BORROWER_NAME>xxxxxxxxxxxx</BORROWER_NAME>
		<REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>
			<Loan LOAN_NUMBER="xxx">
				<YOUNG_FARMER_FLAG>0</YOUNG_FARMER_FLAG>
				<LOSS_GIVEN_DEFAULT>A</LOSS_GIVEN_DEFAULT>
				<COLL_TYPE>3</COLL_TYPE>
				<STATUS_FLAG>1</STATUS_FLAG>
				<UMBRELLA_NUMBER>0025101101</UMBRELLA_NUMBER>
			</Loan>
	</Customer>
	<Customer CIF="z122321">
		<BORROWER_NAME>xxxxxxxxxxxx</BORROWER_NAME>
		<FIPS_CODE>xxxx</FIPS_CODE>
		<NON_CURR_LIABILITIES>2491022.00</NON_CURR_LIABILITIES>
		<REPAYMENT_SOURCE>1</REPAYMENT_SOURCE>
			<Loan LOAN_NUMBER="xxx">
				<YOUNG_FARMER_FLAG>0</YOUNG_FARMER_FLAG>
				<LOSS_GIVEN_DEFAULT>A</LOSS_GIVEN_DEFAULT>
				<COLL_TYPE>3</COLL_TYPE>
				<STATUS_FLAG>1</STATUS_FLAG>
				<UMBRELLA_NUMBER>0025101101</UMBRELLA_NUMBER>
			</Loan>
			<Loan LOAN_NUMBER="123">
				<YOUNG_FARMER_FLAG>0</YOUNG_FARMER_FLAG>
				<UMBRELLA_NUMBER>0025101101</UMBRELLA_NUMBER>
			</Loan>
	</Customer>
</Institution>
</Provider>


Hello All,
I am using Red Hat Linux, and my requirement is to get only the unique tags. For example, for the above XML file I should get the list of unique tags below.

Code:
	<Provider>
	<Institution>
	<Customer>
	<BORROWER_NAME>
	<REPAYMENT_SOURCE>
	<FIPS_CODE>
	<NON_CURR_LIABILITIES>
	<Loan>
	<YOUNG_FARMER_FLAG>
	<LOSS_GIVEN_DEFAULT>
	<COLL_TYPE>
	<STATUS_FLAG>
	<UMBRELLA_NUMBER>

After I get this list, I need to compare it against a predefined list of tags and raise an error or send an email if a tag is not in that list.
I can do a for loop and compare against the predefined list, but I am stuck on how to get those unique tags from the XML file. Can you please help?

Thank you
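
For the comparison step described above, here is a minimal sketch. It assumes both lists are stored one tag per line in plain files; the file names found_tags.txt and allowed_tags.txt and the function name unknown_tags are hypothetical and do not come from the thread.

```shell
# Print every tag in the first file that is missing from the second file.
# $1: tags found in the XML, one per line (hypothetical file)
# $2: predefined list of allowed tags, one per line (hypothetical file)
unknown_tags() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -vxFf "$2" "$1"
}

# Usage sketch: grep exits 0 only when at least one unknown tag was printed.
# if bad=$(unknown_tags found_tags.txt allowed_tags.txt); then
#     echo "Unexpected tags: $bad" >&2    # or hand the list to a mail command
#     exit 1
# fi
```

This avoids the explicit for loop: grep does the whole-line comparison in one pass, and its exit status tells the caller whether anything unexpected was found.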
# 2  
Old 09-16-2014
Try something like this:-
Code:
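awk -F'[<> ]' '
        {
                sub(/\//, "", $2)                    # strip the slash from closing tags
                if ( $2 != "" && $2 !~ /^\?xml/ )    # skip empty fields and the XML declaration
                        A[$2]                        # record the tag name as an array key
        }
        END {
                for ( k in A )                       # keys are unique; order is unspecified
                        print "<" k ">"
        }
' file.xml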

This User Gave Thanks to Yoda For This Post:
# 3  
Old 09-16-2014
Try

Code:
$ awk -F'[<> ]' '{ $1 = $1 }$2 !~ /^[[:punct:]]/ && !a[$2]++{print "<"$2">"}' file.xml

---------- Post updated at 10:30 PM ---------- Previous update was at 10:24 PM ----------

OR
Code:
$ awk -F'[<> ]' '{ $1 = $1 }$2 !~ /^[[:punct:]]/ && !($2 in a){print "<"$2">"; a[$2]}' file.xml

# 4  
Old 09-16-2014
Quote:
Originally Posted by Akshay Hegde
Try

Code:
$ awk -F'[<> ]' '{ $1 = $1 }$2 !~ /^[[:punct:]]/ && !a[$2]++{print "<"$2">"}' file.xml

---------- Post updated at 10:30 PM ---------- Previous update was at 10:24 PM ----------

OR
Code:
$ awk -F'[<> ]' '{ $1 = $1 }$2 !~ /^[[:punct:]]/ && !($2 in a){print "<"$2">"; a[$2]}' file.xml

For some reason it works fine for the sample file I provided, which is indented properly, but if the file is not indented properly it prints only the tags below.
Code:
<Provider>
<>
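
The one-liners above work line by line, so they depend on how the file is laid out. One possible way around that, sketched here under assumptions rather than taken from the thread, is to make awk split records on "<" so that every tag starts a new record wherever the line breaks fall. The function name distinct_tags is hypothetical. The sketch assumes the file is readable as single-byte or UTF-8 text; if it is really stored as UTF-16, as the declaration in the sample suggests, it would presumably need converting first (for example with iconv -f UTF-16 -t UTF-8). It also does not handle a "<" inside CDATA sections or comments.

```shell
# List distinct element names in first-seen order, independent of
# indentation and line breaks. $1 is the XML file to scan.
distinct_tags() {
    awk 'BEGIN { RS = "<" }
         # Each record begins right after a "<". Closing tags start with "/",
         # the declaration with "?", comments with "!", so none of them match.
         match($0, /^[A-Za-z_][A-Za-z0-9_.:-]*/) {
             name = substr($0, RSTART, RLENGTH)
             if (!(name in seen)) {
                 seen[name]
                 print "<" name ">"
             }
         }' "$1"
}

# Usage sketch:
# distinct_tags file.xml
```

Because the tags are printed the first time they are seen, the output follows document order instead of the arbitrary order of an awk "for (k in array)" loop.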

# 5  
Old 09-16-2014
If you have xmllint, use it to show the structure:-
Code:
echo "du" | xmllint --shell file.xml

Pipe the output to an awk program to format.
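
A sketch of what that awk step might look like. It assumes that the "du" command prints one element name per line, indented by nesting depth, with a "/" line for the document root and "/ > " prompts mixed in; that format is an assumption and should be checked against your xmllint version. The function name format_du is hypothetical.

```shell
# Turn the element tree printed by the xmllint shell into a distinct tag list.
format_du() {
    awk '{
        sub(/^\/ > /, "")                 # drop the shell prompt, if present
        gsub(/^[ \t]+|[ \t]+$/, "")       # trim the depth indentation
        # keep lines that look like an element name, first occurrence only
        if ($0 ~ /^[A-Za-z_][A-Za-z0-9_.:-]*$/ && !seen[$0]++)
            print "<" $0 ">"
    }'
}

# Usage sketch:
# echo "du" | xmllint --shell file.xml | format_du
```

Since xmllint parses the document itself, this route does not depend on indentation or on the file's encoding.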
# 6  
Old 09-16-2014
Quote:
Originally Posted by Ariean
For some reason it works fine for the sample file I provided, which is indented properly, but if the file is not indented properly it prints only the tags below.
Code:
<Provider>
<>

Then attach a sample of the actual xml file you want to process...
# 7  
Old 09-17-2014
Hello Ariean,

The following may help.

Code:
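awk '
{
    # Collect the closing tag (</NAME>) found on this line, if any
    if (match($0, /<\/.*>/))
        a[++i] = substr($0, RSTART, RLENGTH)
}
END {
    # Use the tag text as an array key to drop duplicates
    for (k in a)
        c[a[k]] = k
    # Remove the slash and print each distinct tag (order is unspecified)
    for (u in c) {
        gsub(/\//, "", u)
        print u
    }
}' Input_File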

Output will be as follows.

Code:
<BORROWER_NAME>
<LOSS_GIVEN_DEFAULT>
<NON_CURR_LIABILITIES>
<Provider>
<REPAYMENT_SOURCE>
<YOUNG_FARMER_FLAG>
<Institution>
<UMBRELLA_NUMBER>
<FIPS_CODE>
<Customer>
<STATUS_FLAG>
<Loan>
<COLL_TYPE>

NOTE: This code has been tested on the sample XML above.


Thanks,
R. Singh

Last edited by RavinderSingh13; 09-17-2014 at 03:09 AM.. Reason: Alignment