

Help in parsing xml file (sed/nawk)


 
# 1  
Old 08-11-2011
Help in parsing xml file (sed/nawk)

I have a large xml file as shown below:
Code:
<input>
    <blah>
    <blah>
        <atr="blah blah" value = "">
    <blah>
        <blah>
</input>

...2nd chunk...

...3rd chunk...

...4th chunk...

All lines between <input> and </input> form one 'order', and this 'order' is repeated many times. The first and last lines of every 'order' are the same, i.e. <input> and </input>.

I need each entire 'order' that contains the string value="", i.e. all lines from <input> to </input> of every order containing value="".

The XML has many occurrences of value="", and I need all 'orders' containing value="" written to a separate file.

Restrictions:
1) One 'order' may contain more than one value=""; such an order must appear only once in the output file.

I am using Solaris.
Thanks for helping.

Last edited by fpmurphy; 08-11-2011 at 10:43 AM..
# 2  
Old 08-11-2011
Code:
 
$ nawk -F"\"" ' /atr/ {print $4}' test.xml | sort | uniq
abc
adfasdfas
dafasfas
dafasfasf

test data :

Code:
 
$ cat test.xml
<input>
<blah>
<blah>
<atr="blah blah" value = "adfasdfas">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "abc">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "dafasfasf">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "abc">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "dafasfas">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "abc">
<blah>
<blah>
</input>


Last edited by fpmurphy; 08-11-2011 at 10:42 AM..
# 3  
Old 08-11-2011
Thanks itkamaraj,
but that's not what I need.


Let me explain using your test data (slightly changed):
Code:
<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "dafasfasf">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "abc">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>

The output should be all lines from <input> to </input> of each order where value = "":
Code:
<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>

<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>

<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>


Last edited by fpmurphy; 08-11-2011 at 10:41 AM.. Reason: Code tags please!
# 4  
Old 08-11-2011
Try:
Code:
perl -ln0e 'while (/<input>.*?<\/input>/sg){$x=$&;print "$x\n" if $x=~/atr=\"blah blah\" value = \"\"/}' file.xml

# 5  
Old 08-11-2011
Code:
 
$ nawk 'BEGIN{RS=""; FS="</input>"} {for(i=1;i<=NF;i++){ if ($i~/\"\"/) print $i"</input>"}}' test
<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>
<input>
<blah>
<blah>
<atr="blah blah" value = "">
<blah>
<blah>
</input>

# 6  
Old 08-11-2011
Thanks itkamaraj.

That was awesome. I also checked the case where one order contains value="" twice, and it worked fine.

Can you please explain a bit how it works? I need to make some changes.

Thanks a lot.
# 7  
Old 08-11-2011
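Normally awk uses a newline as the record separator (RS) and whitespace as the field separator (FS). This code overrides both: RS="" switches awk to paragraph mode, where records are separated by blank lines, so a file with no blank lines is read as one single record. FS="</input>" then splits that record into fields at every </input>.

So each field ($1, $2, ...) holds one order, from <input> up to, but not including, its </input>. The closing tag is consumed as the separator, which is why the print statement appends "</input>" again.

For each field we check $i ~ /\"\"/, i.e. whether the field contains two adjacent double quotes ("").

If it does, the field is printed. Each order is a single field and is tested only once, so an order with several value="" is still printed only once.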
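If you need to change the logic, a line-by-line version may be easier to adapt than the RS=""/FS trick. The sketch below is only one possible approach, assuming the <input> and </input> tags are not nested and each tag sits on a single line. It buffers the lines of the current order, remembers whether any of them matches value = "" (spaces around = optional, which is stricter than matching any ""), and prints the buffer once when </input> is reached. Because it does not use paragraph mode, a blank line inside an order does not split that order. The function name extract_empty_orders and the AWK variable are illustrative, not from the thread. On Solaris, set AWK=nawk as in the commands above.

```shell
#!/bin/sh
# Sketch: print every <input>...</input> order that contains value="" (spaces
# around "=" allowed), each matching order exactly once, followed by a blank line.
# extract_empty_orders is a hypothetical name; AWK defaults to awk (use nawk on Solaris).
extract_empty_orders() {
    "${AWK:-awk}" '
        /<input>/ { buf = ""; hit = 0; inside = 1 }   # a new order starts
        inside { buf = buf $0 "\n" }                  # keep every line of the order
        inside && /value *= *""/ { hit = 1 }          # note a match, but do not print yet
        /<\/input>/ {                                 # the order is complete
            if (hit) printf "%s\n", buf               # print it once, plus a blank line
            buf = ""; hit = 0; inside = 0
        }
    ' "$@"
}
```

Usage, writing the matching orders to a separate file: extract_empty_orders test.xml > empty_orders.xml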

