How to get distinct Tags from an XML file?


 
# 8  
Old 09-17-2014
Quote:
Originally Posted by RavinderSingh13
Hello Ariean,

The following may help.

Code:
awk '{
match($0,/<\/.*>/); 
b=substr($0,RSTART,RLENGTH); 
 if(b)
    {a[++i]=b}
     } 
END{
  {for(k in a)
    {c[a[k]]=k}
 } 
 {for(u in c)
  {gsub(/\//,X,u);print u}
 }
   }' Input_File

Output will be as follows.

Code:
<BORROWER_NAME>
<LOSS_GIVEN_DEFAULT>
<NON_CURR_LIABILITIES>
<Provider>
<REPAYMENT_SOURCE>
<YOUNG_FARMER_FLAG>
<Institution>
<UMBRELLA_NUMBER>
<FIPS_CODE>
<Customer>
<STATUS_FLAG>
<Loan>
<COLL_TYPE>

NOTE: This code has been tested on the sample code.


Thanks,
R. Singh
Could you please explain a little of what your code is doing? I am new to awk. Many thanks.
# 9  
Old 09-17-2014
Hello Ariean,

The following may help.

Code:
awk '{
match($0,/<\/.*>/);                    ##### Match the text that starts with </ and ends with > #####
b=substr($0,RSTART,RLENGTH);           ##### Store the matched text in a variable named b #####
 if(b)                                 ##### If b is not empty, i.e. the line contained a closing tag #####
    {a[++i]=b}                         ##### Append b to array a, indexed by an incrementing counter i #####
     } 
END{
  {for(k in a)                         ##### Loop over the indexes of array a #####
    {c[a[k]]=k}                        ##### Use each value of a as an index of array c, so duplicates collapse into one entry #####
 } 
 {for(u in c)                          ##### Loop over the indexes of array c, i.e. the distinct tags #####
  {gsub(/\//,X,u);print u}             ##### Remove the / (X is an unset variable, so it is an empty string) and print the tag #####
 }
   }' Input_File

Thanks,
R. Singh
# 10  
Old 09-17-2014
Thank you. I just tested it against a 5.8 GB XML file and it has been running for the past hour. Is there any way to tune this? I appreciate your help.
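
One possible way to reduce the work, sketched here under the same assumption as the code above (the closing tag of interest sits on its own line or is the first closing tag on the line): deduplicate while reading instead of storing every match and sorting it out in END. That way the array holds one entry per distinct tag rather than one per line. The names seen, tag and the sample file are made up for illustration, and the pattern uses [^>]* instead of .* so that it stops at the first >.

```shell
# Hypothetical sample input; the real file and its tag names will differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Loan>
<FIPS_CODE>12345</FIPS_CODE>
<COLL_TYPE>RE</COLL_TYPE>
<FIPS_CODE>67890</FIPS_CODE>
</Loan>
EOF

# Print each closing tag the first time it is seen, without the slash.
# Only one small array entry is kept per distinct tag.
distinct_tags=$(awk 'match($0, /<\/[^>]*>/) {
    tag = substr($0, RSTART, RLENGTH)   # e.g. </FIPS_CODE>
    if (!seen[tag]++) {                 # true only on the first occurrence
        sub(/\//, "", tag)              # </FIPS_CODE> becomes <FIPS_CODE>
        print tag
    }
}' "$sample")
printf '%s\n' "$distinct_tags"
rm -f "$sample"
```

Tags are printed in the order they are first met, so no END block is needed.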
# 11  
Old 09-17-2014
If you have more than 1 tag per line something like this may be more accurate:

Code:
awk -F '[> ]' '! /^[/?]/ && length($1) && !h[$1]++ {print RS $1 ">" }' RS=\< infile
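
For anyone new to awk, here is a rough spelled-out sketch of what the one-liner above appears to do. Setting RS to < makes every record start right after a <, and the field separator [> ] makes $1 the tag name. The sample file and the array name seen are hypothetical, chosen only for this illustration.

```shell
# Hypothetical two-line input with several tags on one line.
sample=$(mktemp)
printf '%s\n' '<?xml version="1.0"?>' \
    '<Loan id="1"><FIPS_CODE>12345</FIPS_CODE><FIPS_CODE>67890</FIPS_CODE></Loan>' > "$sample"

tags=$(awk -F '[> ]' -v RS='<' '
    /^[\/?]/    { next }               # skip closing tags (</...) and declarations (<?...)
    !length($1) { next }               # skip empty names, e.g. the text before the first <
    !seen[$1]++ { print "<" $1 ">" }   # print a name only the first time it appears
' "$sample")
printf '%s\n' "$tags"
rm -f "$sample"
```

Because each record is a single tag plus the text after it, nothing large is held in memory apart from the list of distinct names.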


Last edited by Chubler_XL; 09-17-2014 at 04:40 PM.. Reason: Track and removed duplicate tags
# 12  
Old 09-18-2014
Thanks, it ran quite fast. But in the excerpt of my output file below, how do I remove the unwanted entries such as <> and <!--FILE>? As far as I can see, the first one is caused by some junk characters in the input file.
Code:
<>
<ACCEPTABLE_VOL>
<ACCRUED_INTEREST>
<APPRAISAL_DATE_RE>
<APPRAISED_VALUE_RE>
<FACILITY_DESC>
<FACILITY_GROSS_OUTSTANDING>
<FARM_OPS_EXP>
<FARM_PAYMENT_SUPPORT>
<!--FILE>
<FIPS_CODE>
<FUNDS_HELD_BAL>
.
..
...
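
A possible way to drop those two entries, offered as a sketch rather than a tested fix for the real file: only accept names that start with a letter or an underscore. This assumes the tag names are plain ASCII, that the <> comes from non-printing bytes such as a byte-order mark at the start of the file, and that <!--FILE> comes from an XML comment. The sample data and the array name seen are invented for the example.

```shell
# Hypothetical input: a leading byte-order mark (one possible source of the
# stray <>), a comment, and two real elements.
sample=$(mktemp)
printf '\357\273\277<Loan><!--FILE generated--><FIPS_CODE>1</FIPS_CODE></Loan>\n' > "$sample"

clean_tags=$(awk -F '[> ]' -v RS='<' '
    # Names starting with a letter or _ pass; /, ?, !-- and junk bytes do not.
    $1 ~ /^[A-Za-z_]/ && !seen[$1]++ { print "<" $1 ">" }
' "$sample")
printf '%s\n' "$clean_tags"
rm -f "$sample"
```

The single test on the first character also replaces the separate checks for closing tags, declarations and empty names used in the earlier one-liner.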
