Need an efficient way to search for a tag in an xml file having millions of rows
Post 302603477 by balajesuri on Thursday 1st of March 2012 05:47:39 AM
File 'input' contains 1 million entries of this block:
Code:
<Root>
    <Person>
        <Name>John</Name>
    </Person>
</Root>
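If you want to reproduce this, here's a rough sketch of how such a file could be generated. The helper name `make_input` is my own, and I'm assuming the blocks are simply repeated back to back with the indentation shown above; the original file may have been built differently.

```shell
# Hypothetical helper: write N copies of the sample block to a file.
make_input() {
    blocks=$1   # number of <Root> blocks to write
    out=$2      # output file name
    awk -v n="$blocks" 'BEGIN {
        for (i = 0; i < n; i++) {
            print "<Root>"
            print "    <Person>"
            print "        <Name>John</Name>"
            print "    </Person>"
            print "</Root>"
        }
    }' > "$out"
}

# Small run for a quick check; use 1000000 to get the size discussed here.
make_input 1000 input && wc -l < input
```

Each block is 5 lines, so 1 million blocks means a 5-million-line file.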

And here's a timing comparison of awk, grep, perl and sed counting the <Name> lines:

Code:
[root@host dir]# time awk '/<Name>/' input | wc -l
1000000

real    0m7.802s
user    0m7.766s
sys     0m0.125s
[root@host dir]# time awk '/<Name>/ {i++} END {print i}' input
1000000

real    0m7.559s
user    0m7.485s
sys     0m0.074s
[root@host dir]# time grep -c "Name" input
1000000

real    0m0.158s
user    0m0.121s
sys     0m0.037s
[root@host dir]# time perl -ne '(/<Name>/) && $i++; END {print $i}' input
1000000
real    0m2.968s
user    0m2.928s
sys     0m0.040s
[root@host dir]# time sed -n '/<Name>/p' input | wc -l
1000000

real    0m3.716s
user    0m3.716s
sys     0m0.096s
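One thing to keep in mind when reading these numbers: the grep run searches for the plain string Name, while the other tools match <Name>. On this input the count is the same, because grep -c counts matching lines and every Name line holds both the opening and the closing tag. A like-for-like grep could look like the sketch below; `count_tag_lines` is a made-up helper name, and this variant was not part of the timings above.

```shell
# Hypothetical helper: count lines containing a literal string (no regex).
count_tag_lines() {
    # -F: treat the pattern as a fixed string; -c: print the number of matching lines
    grep -F -c -e "$1" "$2"
}

# Self-check on a throwaway one-block file (not the 1-million-block 'input').
sample=$(mktemp)
printf '%s\n' '<Root>' '    <Person>' '        <Name>John</Name>' '    </Person>' '</Root>' > "$sample"
count_tag_lines '<Name>' "$sample"   # prints 1
count_tag_lines 'Name' "$sample"     # prints 1 as well: same line, counted once
rm -f "$sample"
```

The two patterns would give different counts as soon as the file had other tags containing "Name", such as a <FirstName> element.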

Verdict: grep is the quickest of the utilities tried above for this particular task. Crudely extrapolating the 0.158s result to a file with 1 billion blocks (1000 times larger), grep should take about 158s, a little under 3 minutes.
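The extrapolation is plain linear scaling of the measured wall-clock time. It assumes runtime grows in proportion to the number of blocks and ignores disk I/O and caching effects, which may well matter for a file that large. A small sketch of the arithmetic, with `extrapolate_seconds` being a hypothetical helper name:

```shell
# Hypothetical helper: scale a measured time linearly to a larger input.
# Arguments: measured seconds, measured block count, target block count.
extrapolate_seconds() {
    awk -v t="$1" -v m="$2" -v n="$3" 'BEGIN { printf "%.0f\n", t * n / m }'
}

# grep took 0.158s for 1 million blocks; estimate for 1 billion blocks.
extrapolate_seconds 0.158 1000000 1000000000   # prints 158
```

The same scaling puts the awk runs (about 7.6s per million blocks) at roughly two hours for a billion blocks.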

Last edited by balajesuri; 03-01-2012 at 06:53 AM.
 
