Search for a particular tag in XML, collate all lines with the same tag value, and display their count


 
# 1  
Old 07-27-2014

I basically want to do the following. Suppose there is a tag called object1. I want to group the lines by the value of that tag, printing each value as a heading together with the number of XML lines that contain it. Please help.

Code:
 
File:
<xml><object1>house</object1><object2>child</object2>
<xml><object1>book</object1><object2>tree</object2>
<xml><object1>house</object1><object2>roof</object2>
 
Desired output:
 
House: (Count - 2)
<xml><object1>house</object1><object2>child</object2>
<xml><object1>house</object1><object2>roof</object2>
 
Book: (Count - 1)
<xml><object1>book</object1><object2>tree</object2>

# 2  
Old 07-27-2014
Quote:
Originally Posted by srkmish
I basically want to do the following. Suppose there is a tag called object1. I want to group the lines by the value of that tag, printing each value as a heading together with the number of XML lines that contain it.
Your question makes a few assumptions about your input that I would like to verify before I start to suggest anything:

You imply that the values are not spanning several lines, which would be legal in XML:

Code:
<xml><object1>foo bar</object1>
<object1>foo
bar</object1></xml>

Basically, many applications would treat the two object1 elements as equivalent once whitespace is normalized, but maybe (?) not in your requirement.

Furthermore, what about blanks and other whitespace? Is "foo bar" equivalent to "foo   bar" (with several blanks in between)?
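
If whitespace differences should not matter, one option is to normalize the value before using it as an array key. This is only a minimal sketch: it assumes each value sits on one line, that the value is still in field 5, and that runs of blanks or tabs should count as a single space. The helper name norm is made up for illustration.

```shell
# norm() is a hypothetical helper: squeeze blank runs and trim both ends.
keys=$(printf '%s\n' '<xml><object1>foo bar</object1>' \
                     '<xml><object1> foo    bar </object1>' |
  awk -F'[><]' '
  function norm(s) {
    gsub(/[ \t]+/, " ", s)   # squeeze runs of blanks and tabs
    sub(/^ /, "", s)         # drop a leading blank
    sub(/ $/, "", s)         # drop a trailing blank
    return s
  }
  { c[norm($5)]++ }          # count by the normalized value
  END { for (k in c) print k ": " c[k] }')

printf '%s\n' "$keys"
```

With this normalization both sample lines land under the same key.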

I hope this helps.

bakunin
# 3  
Old 07-27-2014
Try something like this

Code:
akshay@nio:/tmp$ cat file
<xml><object1>house</object1><object2>child</object2>
<xml><object1>book</object1><object2>tree</object2>
<xml><object1>house</object1><object2>roof</object2>

Code:
akshay@nio:/tmp$ awk -F'[><]' 'NF{ c[$5]++; d[$5] = d[$5] ? d[$5] ORS $0 : $0}END{for(i in d) print i ": (count - " c[i] ")" RS d[i] RS  }' file

book: (count - 1)
<xml><object1>book</object1><object2>tree</object2>

house: (count - 2)
<xml><object1>house</object1><object2>child</object2>
<xml><object1>house</object1><object2>roof</object2>

# 4  
Old 07-28-2014
Basically, the file will be a collection of a huge number of XML documents, each on its own line, and the tag will not span multiple lines. I actually want a generic method to do this, i.e. the command should scan for the "object1" tag, extract the value between <object1> and </object1>, and display all the XML lines containing this particular value along with their count.

---------- Post updated at 02:25 AM ---------- Previous update was at 02:21 AM ----------

Hey, this works perfectly. Thanks. However, can you suggest a generic method to do this? I want to search for the "object1" tag in the XML, copy its value, and display all lines containing this value along with its count. Can you explain your command a bit so I can understand the code? I want to adapt this command later so that I can search for other tag values and display content accordingly.
# 5  
Old 07-29-2014
Hey guys, I would be really grateful if anyone could explain the code that Akshay wrote.

Code:
awk -F'[><]' 'NF{ c[$5]++; d[$5] = d[$5] ? d[$5] ORS $0 : $0}END{for(i in d) print i ": (count - " c[i] ")" RS d[i] RS  }' file

# 6  
Old 07-29-2014
OK I'll give it a whirl:

Firstly I'll break it into multiple lines for ease of reading:

Code:
awk -F'[><]' '
NF {
  c[$5]++
  d[$5] = d[$5] ? d[$5] ORS $0 : $0
}
END{
   for(i in d)
      print i ": (count - " c[i] ")" RS d[i] RS
}' file

NF { examines only lines that have one or more fields (i.e. non-blank lines).

-F'[><]' This argument to awk sets the field separator to < or >. awk splits each line on these characters and assigns the fields to $1 through $n.

So for <xml><object1>house</object1><object2>child</object2>

we get:
Code:
$1 = ""
$2 = "xml"
$3 = ""
$4 = "object1"
$5 = "house"
$6 = "/object1"

c[$5]++ creates an associative array c[] with field #5 as the key and increments its value (c["house"] = c["house"] + 1), so it counts the number of times each tag value appears.

d[$5] = d[$5] ? d[$5] ORS $0 : $0 if d[$5] is not null/blank, then append ORS (the output record separator, a newline by default) and the whole input line to it; otherwise assign the whole input line to it.

The END block loops over all the keys in the d[] array and prints each key and its count, followed by all input lines that contain that key (the value of the d[] array element). RS, the input record separator, is also a newline by default, so here it simply acts as a line break. The order in which for (i in d) visits the keys is unspecified, which is why book may be printed before house.
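
To check which field number a value ends up in, it can help to print every field of a sample line. A small sketch using the same -F'[><]' separator on the first line from the thread; the variable names line and fields are only for illustration.

```shell
# Sample line from the thread; list every field that -F'[><]' produces.
line='<xml><object1>house</object1><object2>child</object2>'

fields=$(printf '%s\n' "$line" |
  awk -F'[><]' '{ for (i = 1; i <= NF; i++) printf "$%d = \"%s\"\n", i, $i }')

printf '%s\n' "$fields"
```

The listing includes the empty fields between adjacent tags, so the value of object2 shows up as $9 rather than $7.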
# 7  
Old 07-30-2014
Woah chubler, this is fantastic. Thanks. That cleared up things a lot for me.

But what if the <object1> tag value is not necessarily in the $5 position? How can I search for the tag value then and add it to an array?
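
One possible approach, offered as a sketch rather than a definitive answer: instead of relying on a field number, locate the opening and closing tags with index() and cut out the text between them. This assumes the tag has no attributes, appears at most once per line, and does not span lines. The sample file and the variable names tag, otag and ctag are made up for illustration.

```shell
# Hypothetical sample file; the second line has object1 in a different position.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<xml><object1>house</object1><object2>child</object2>
<xml><object2>tree</object2><object1>book</object1>
<xml><object1>house</object1><object2>roof</object2>
EOF

# Pass the tag name with -v so the same command works for other tags.
result=$(awk -v tag="object1" '
BEGIN { otag = "<" tag ">"; ctag = "</" tag ">" }
{
  s = index($0, otag)                    # position of the opening tag
  if (s == 0) next                       # tag not on this line
  rest = substr($0, s + length(otag))    # text after the opening tag
  e = index(rest, ctag)                  # position of the closing tag
  if (e == 0) next                       # no closing tag on this line
  val = substr(rest, 1, e - 1)           # value between the two tags
  if (val in c) d[val] = d[val] ORS $0   # append to an existing group
  else          d[val] = $0              # first line of a new group
  c[val]++
}
END {
  for (v in d) print v ": (count - " c[v] ")" ORS d[v] ORS
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Using index() rather than a regular expression means the tag name is matched literally. To group by another tag, change the value given to -v, for example tag="object2".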