Need an efficient way to search for a tag in an xml file having millions of rows


 
# 8  
Old 03-01-2012
I am using AIX and don't have the option of using xpath.

I was trying this command to format the file:
Code:
 sed 's/\>\</\>\\n\</g' input

But this gives the following output:
Code:
<?xml version="1.0" encoding="UTF-8"?>\n<Root>\n<Person>\n<Name>John</Name>\n</Person>\n<Person>\n<Name>John</Name>\n</Person>\n</Root>

# 9  
Old 03-01-2012
If you escape the backslash of '\n' in the replacement part (\>\\n\<), sed sees a literal backslash followed by 'n', so how do you expect to see a line break?

Code:
$ sed 's/></>\n</g' input
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<Person>
<Name>John</Name>
</Person>
<Person>
<Name>John</Name>
</Person>
</Root>


Last edited by balajesuri; 03-01-2012 at 08:44 AM..
# 10  
Old 03-02-2012
This is what I get from the command you suggested:
Code:
<?xml version="1.0" encoding="UTF-8"?>n<Root>n<Person>n<Name>John</Name>n</Person>n<Person>n<Name>John</Name>n</Person>n</Root>

# 11  
Old 03-02-2012
See, this is what happens when you don't mention which OS and shell you're working on. The solution in post #9 was tried on RHEL with GNU bash and sed version 4.1.5.
# 12  
Old 03-02-2012
I did say AIX (post #8); it seems you missed it, but that's OK. I tried tr, sed and awk, but none of them worked. Please see if you can get me a solution.

---------- Post updated at 11:56 PM ---------- Previous update was at 11:53 PM ----------

Got it:

Code:
 sed "s/></>\\`echo -e '\n\r'`</g" input

Thanks, all, for your efforts.
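
Since the handling of '\n' in a sed replacement differs between the GNU sed used in post #9 and the AIX one, the same split can also be done without sed. The following is only a sketch, assuming a Python 3 interpreter is available on the machine; the name split_tags and the chunk size are made up for illustration. It reads the file in fixed-size chunks, so a single line holding millions of records is never loaded into memory at once.

```python
def split_tags(src, dst, chunk_size=1 << 20):
    """Copy src to dst, inserting a newline between every adjacent '><' pair.

    src and dst are binary file objects. chunk_size is an arbitrary choice.
    """
    pending = b""  # a held-back trailing '>' so pairs straddling two chunks are still split
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = pending + chunk
        if data.endswith(b">"):
            # Defer the final '>' until we know whether the next chunk starts with '<'.
            data, pending = data[:-1], b">"
        else:
            pending = b""
        dst.write(data.replace(b"><", b">\n<"))
    # Flush the last held-back '>' (if any) at end of input.
    dst.write(pending)
```

It could be called as split_tags(sys.stdin.buffer, sys.stdout.buffer) after an import sys, with the input file redirected to standard input.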
# 13  
Old 03-02-2012
Ah, yes! My bad. It's my mistake. Sorry, mate.
# 14  
Old 03-02-2012
You need to use a SAX (streaming) parser instead of DOM, so that the whole document is never held in memory. Here is a Python implementation.

Code:
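#!/usr/bin/env python3
# Count <Name> start tags with expat, a streaming (SAX-style) parser.

import xml.parsers.expat

count = 0

def start_element(name, attrs):
    # Called by expat once for every opening tag.
    global count
    if name == "Name":
        count += 1

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element

# ParseFile needs a binary file object in Python 3.
with open('infile.xml', 'rb') as f:
    p.ParseFile(f)

print(count)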

Running on my little netbook
Code:
$ wc -l infile.xml
1 infile.xml

$ time ./infile.py
100000

real    0m1.187s
user    0m1.168s
sys     0m0.016s

$ grep 'model name' /proc/cpuinfo
model name      : Intel(R) Atom(TM) CPU N270   @ 1.60GHz
model name      : Intel(R) Atom(TM) CPU N270   @ 1.60GHz

Please let us know how long it takes for 1 billion records.
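
For searching for an arbitrary tag rather than a hard-coded "Name", the same expat approach can be wrapped in a function. This is a rough sketch under the assumption of Python 3; count_tag and its parameters are hypothetical names, not part of the original script. It feeds the parser in chunks through Parse instead of ParseFile, which makes the incremental reading explicit.

```python
import xml.parsers.expat


def count_tag(stream, tag, chunk_size=1 << 16):
    """Count start tags named `tag` in a binary stream without loading it whole."""
    count = 0

    def start_element(name, attrs):
        # expat calls this once per opening tag; only matching names are counted.
        nonlocal count
        if name == tag:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse(b"", True)  # signal end of input so truncated documents raise an error
    return count
```

With a file opened in binary mode, count_tag(f, "Name") should return the same number the script above prints; timings on a real billion-record file have not been measured here.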